[jira] [Updated] (MESOS-2708) Design doc for the Executor HTTP API

2015-08-26 Thread Anand Mazumdar (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anand Mazumdar updated MESOS-2708:
--
Sprint: Mesosphere Sprint 17

 Design doc for the Executor HTTP API
 

 Key: MESOS-2708
 URL: https://issues.apache.org/jira/browse/MESOS-2708
 Project: Mesos
  Issue Type: Bug
Reporter: Alexander Rojas
Assignee: Anand Mazumdar
  Labels: mesosphere

 This tracks the design of the Executor HTTP API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3319) Mesos will not build when configured with gperftools enabled

2015-08-26 Thread Greg Mann (JIRA)
Greg Mann created MESOS-3319:


 Summary: Mesos will not build when configured with gperftools 
enabled
 Key: MESOS-3319
 URL: https://issues.apache.org/jira/browse/MESOS-3319
 Project: Mesos
  Issue Type: Bug
Reporter: Greg Mann


Mesos configured with {{--enable-perftools}} currently will not build on OSX 
10.10.4 or Ubuntu 14.04, possibly because the bundled gperftools-2.0 is not 
current. The stable release is now 2.4, which builds successfully on both of 
these platforms.

This issue will be resolved when Mesos builds successfully out of the box with 
gperftools enabled. After this ticket is resolved, the libprocess profiler 
should be tested to confirm that it still works and, if not, it should be fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2466) Write documentation for all the LIBPROCESS_* environment variables.

2015-08-26 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715346#comment-14715346
 ] 

Greg Mann commented on MESOS-2466:
--

It seems Mesos currently will not build successfully with gperftools enabled 
(upon which the profiler depends), so I removed that environment variable from 
this ticket and created new issues, MESOS-3319 and MESOS-3320, for fixing the 
gperftools build and documenting the environment variable, respectively.

 Write documentation for all the LIBPROCESS_* environment variables.
 ---

 Key: MESOS-2466
 URL: https://issues.apache.org/jira/browse/MESOS-2466
 Project: Mesos
  Issue Type: Documentation
Reporter: Alexander Rojas
Assignee: Greg Mann
  Labels: documentation, mesosphere

 libprocess uses a set of environment variables to modify its behaviour; 
 however, these variables are not documented anywhere, nor is it defined where 
 the documentation should live.
 What is needed is a decision on where the environment variables should be 
 documented (a new doc file or an existing one), after which the documentation 
 can be added there.
 After searching in the code, these are the variables which need to be 
 documented:
 # {{LIBPROCESS_IP}}
 # {{LIBPROCESS_PORT}}
 # {{LIBPROCESS_ADVERTISE_IP}}
 # {{LIBPROCESS_ADVERTISE_PORT}}
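For reference, these variables are simply set in the environment of the process that embeds libprocess. A minimal sketch with hypothetical values (addresses and ports below are made up for illustration):

```shell
# Hypothetical values; these variables configure which address libprocess
# binds to and which address/port it advertises to other processes.
export LIBPROCESS_IP=10.0.1.5               # interface to bind
export LIBPROCESS_PORT=5051                 # port to listen on
export LIBPROCESS_ADVERTISE_IP=203.0.113.7  # address advertised externally
export LIBPROCESS_ADVERTISE_PORT=5051       # port advertised externally
echo "libprocess will bind ${LIBPROCESS_IP}:${LIBPROCESS_PORT}"
```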



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3319) Mesos will not build when configured with gperftools enabled

2015-08-26 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann reassigned MESOS-3319:


Assignee: Greg Mann

 Mesos will not build when configured with gperftools enabled
 

 Key: MESOS-3319
 URL: https://issues.apache.org/jira/browse/MESOS-3319
 Project: Mesos
  Issue Type: Bug
Reporter: Greg Mann
Assignee: Greg Mann
  Labels: build

 Mesos configured with {{--enable-perftools}} currently will not build on OSX 
 10.10.4 or Ubuntu 14.04, possibly because the bundled gperftools-2.0 is not 
 current. The stable release is now 2.4, which builds successfully on both of 
 these platforms.
 This issue will be resolved when Mesos builds successfully out of the box with 
 gperftools enabled. After this ticket is resolved, the libprocess profiler 
 should be tested to confirm that it still works and, if not, it should be 
 fixed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2015-08-26 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715153#comment-14715153
 ] 

Alexander Rukletsov commented on MESOS-3307:


[~bobrik], you should be able to get the list of endpoints by hitting the 
{{/help}} endpoint.

I think making the history size configurable is also an option; my feeling, 
however, is that we need a more general solution rather than a band-aid. I 
would also like [~jmlvanre] to chime in.

 Configurable size of completed task / framework history
 ---

 Key: MESOS-3307
 URL: https://issues.apache.org/jira/browse/MESOS-3307
 Project: Mesos
  Issue Type: Bug
Reporter: Ian Babrou

 We try to make Mesos work with multiple frameworks and mesos-dns at the same 
 time. The goal is to have a set of frameworks per team / project on a single 
 Mesos cluster.
 At this point our mesos state.json is at 4mb and it takes a while to 
 assemble. 5 mesos-dns instances hit state.json every 5 seconds, effectively 
 pushing mesos-master CPU usage through the roof. It's at 100%+ all the time.
 Here's the problem:
 {noformat}
 mesos λ curl -s http://mesos-master:5050/master/state.json | jq 
 '.frameworks[].completed_tasks[].framework_id' | sort | uniq -c | sort -n
1 20150606-001827-252388362-5050-5982-0003
   16 20150606-001827-252388362-5050-5982-0005
   18 20150606-001827-252388362-5050-5982-0029
   73 20150606-001827-252388362-5050-5982-0007
  141 20150606-001827-252388362-5050-5982-0009
  154 20150820-154817-302720010-5050-15320-
  289 20150606-001827-252388362-5050-5982-0004
  510 20150606-001827-252388362-5050-5982-0012
  666 20150606-001827-252388362-5050-5982-0028
  923 20150116-002612-269165578-5050-32204-0003
 1000 20150606-001827-252388362-5050-5982-0001
 1000 20150606-001827-252388362-5050-5982-0006
 1000 20150606-001827-252388362-5050-5982-0010
 1000 20150606-001827-252388362-5050-5982-0011
 1000 20150606-001827-252388362-5050-5982-0027
 mesos λ fgrep 1000 -r src/master
 src/master/constants.cpp:const size_t MAX_REMOVED_SLAVES = 100000;
 src/master/constants.cpp:const uint32_t MAX_COMPLETED_TASKS_PER_FRAMEWORK = 
 1000;
 {noformat}
 Active tasks are just 6% of state.json response:
 {noformat}
 mesos λ cat ~/temp/mesos-state.json | jq -c . | wc
1   14796 4138942
 mesos λ cat ~/temp/mesos-state.json | jq '.frameworks[].tasks' | jq -c . | wc
   16  37  252774
 {noformat}
 I see four options that can improve the situation:
 1. Add a query string param to exclude completed tasks from state.json and use 
 it in mesos-dns and similar tools. There is no need for mesos-dns to know 
 about completed tasks; they are just extra load on the master and mesos-dns.
 2. Make history size configurable.
 3. Make JSON serialization faster. With 1000s of tasks even without history 
 it would take a lot of time to serialize tasks for mesos-dns. Doing it every 
 60 seconds instead of every 5 seconds isn't really an option.
 4. Create event bus for mesos master. Marathon has it and it'd be nice to 
 have it in Mesos. This way mesos-dns could avoid polling master state and 
 switch to listening for events.
 All can be done independently.
 Note to mesosphere folks: please start distributing debug symbols with your 
 distribution. I was asking for it for a while and it is really helpful: 
 https://github.com/mesosphere/marathon/issues/1497#issuecomment-104182501
 Perf report for leading master: 
 !http://i.imgur.com/iz7C3o0.png!
 I'm on 0.23.0.
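The "6%" figure above can be sanity-checked directly from the byte counts the {{wc}} output shows (4138942 bytes for the full response, 252774 bytes for the active-tasks portion):

```shell
# Integer arithmetic on the byte counts reported by wc in the ticket above.
total=4138942    # full state.json response
active=252774    # frameworks[].tasks portion only
echo "active tasks are $(( active * 100 / total ))% of state.json"   # → 6%
```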



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-2466) Write documentation for all the LIBPROCESS_* environment variables.

2015-08-26 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-2466:
-
Description: 
libprocess uses a set of environment variables to modify its behaviour; 
however, these variables are not documented anywhere, nor is it defined where 
the documentation should live.

What is needed is a decision on where the environment variables should be 
documented (a new doc file or an existing one), after which the documentation 
can be added there.

After searching in the code, these are the variables which need to be 
documented:

# {{LIBPROCESS_IP}}
# {{LIBPROCESS_PORT}}
# {{LIBPROCESS_ADVERTISE_IP}}
# {{LIBPROCESS_ADVERTISE_PORT}}

  was:
libprocess uses a set of environment variables to modify its behaviour; 
however, these variables are not documented anywhere, nor it is defined where 
the documentation should be.

What would be needed is a decision whether the environment variables should be 
documented (a new doc file or reusing an existing one), and then add the 
documentation there.

After searching in the code, these are the variables which need to be 
documented:

# {{LIBPROCESS_ENABLE_PROFILER}}
# {{LIBPROCESS_IP}}
# {{LIBPROCESS_PORT}}
# {{LIBPROCESS_ADVERTISE_IP}}
# {{LIBPROCESS_ADVERTISE_PORT}}


 Write documentation for all the LIBPROCESS_* environment variables.
 ---

 Key: MESOS-2466
 URL: https://issues.apache.org/jira/browse/MESOS-2466
 Project: Mesos
  Issue Type: Documentation
Reporter: Alexander Rojas
Assignee: Greg Mann
  Labels: documentation, mesosphere

 libprocess uses a set of environment variables to modify its behaviour; 
 however, these variables are not documented anywhere, nor is it defined where 
 the documentation should live.
 What is needed is a decision on where the environment variables should be 
 documented (a new doc file or an existing one), after which the documentation 
 can be added there.
 After searching in the code, these are the variables which need to be 
 documented:
 # {{LIBPROCESS_IP}}
 # {{LIBPROCESS_PORT}}
 # {{LIBPROCESS_ADVERTISE_IP}}
 # {{LIBPROCESS_ADVERTISE_PORT}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3320) Document LIBPROCESS_ENABLE_PROFILER environment variable

2015-08-26 Thread Greg Mann (JIRA)
Greg Mann created MESOS-3320:


 Summary: Document LIBPROCESS_ENABLE_PROFILER environment variable
 Key: MESOS-3320
 URL: https://issues.apache.org/jira/browse/MESOS-3320
 Project: Mesos
  Issue Type: Documentation
Reporter: Greg Mann


This environment variable, used to enable the libprocess profiler, needs to be 
documented.
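A sketch of how the variable would be used once documented. The master invocation and profiler endpoint below are assumptions for illustration, not confirmed by this ticket; as noted in MESOS-3319, the gperftools build must be fixed before this works at all:

```shell
# Enable the libprocess profiler before starting the process (assumed usage).
export LIBPROCESS_ENABLE_PROFILER=1
# mesos-master --work_dir=/tmp/mesos &       # hypothetical invocation
# curl http://localhost:5050/profiler/start  # assumed endpoint name
[ "$LIBPROCESS_ENABLE_PROFILER" = "1" ] && echo "profiler enabled"
```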



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3320) Document LIBPROCESS_ENABLE_PROFILER environment variable

2015-08-26 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715352#comment-14715352
 ] 

Greg Mann commented on MESOS-3320:
--

The gperftools build must be fixed, and functionality of the profiler 
confirmed, before this documentation is added.

 Document LIBPROCESS_ENABLE_PROFILER environment variable
 

 Key: MESOS-3320
 URL: https://issues.apache.org/jira/browse/MESOS-3320
 Project: Mesos
  Issue Type: Documentation
Reporter: Greg Mann
  Labels: documentation, libprocess

 This environment variable, used to enable the libprocess profiler, needs to 
 be documented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3321) Spurious fetcher message about extracting an archive

2015-08-26 Thread Kapil Arya (JIRA)
Kapil Arya created MESOS-3321:
-

 Summary: Spurious fetcher message about extracting an archive
 Key: MESOS-3321
 URL: https://issues.apache.org/jira/browse/MESOS-3321
 Project: Mesos
  Issue Type: Bug
  Components: fetcher
Reporter: Kapil Arya


The fetcher emits a spurious log message about not extracting an archive with 
a .tgz extension, even though the tarball is extracted correctly.

{code}
I0826 19:02:08.304914  2109 logging.cpp:172] INFO level logging started!
I0826 19:02:08.305253  2109 fetcher.cpp:413] Fetcher Info: 
{"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/20150826-185716-251662764-5050-1-S0\/root","items":[{"action":"BYPASS_CACHE","uri":{"extract":true,"value":"file:\/\/\/mesos\/sampleflaskapp.tgz"}}],"sandbox_directory":"\/tmp\/mesos\/slaves\/20150826-185716-251662764-5050-1-S0\/frameworks\/20150826-185716-251662764-5050-1-\/executors\/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011\/runs\/e71f50b8-816d-46d5-bcc6-f9850a0402ed","user":"root"}
I0826 19:02:08.306834  2109 fetcher.cpp:368] Fetching URI 
'file:///mesos/sampleflaskapp.tgz'
I0826 19:02:08.306864  2109 fetcher.cpp:242] Fetching directly into the sandbox 
directory
I0826 19:02:08.306884  2109 fetcher.cpp:179] Fetching URI 
'file:///mesos/sampleflaskapp.tgz'
I0826 19:02:08.306900  2109 fetcher.cpp:159] Copying resource with command:cp 
'/mesos/sampleflaskapp.tgz' 
'/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed/sampleflaskapp.tgz'
I0826 19:02:08.309063  2109 fetcher.cpp:76] Extracting with command: tar -C 
'/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed'
 -xf 
'/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed/sampleflaskapp.tgz'
I0826 19:02:08.315313  2109 fetcher.cpp:84] Extracted 
'/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed/sampleflaskapp.tgz'
 into 
'/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed'
W0826 19:02:08.315381  2109 fetcher.cpp:264] Copying instead of extracting 
resource from URI with 'extract' flag, because it does not seem to be an 
archive: file:///mesos/sampleflaskapp.tgz
I0826 19:02:08.315604  2109 fetcher.cpp:445] Fetched 
'file:///mesos/sampleflaskapp.tgz' to 
'/tmp/mesos/slaves/20150826-185716-251662764-5050-1-S0/frameworks/20150826-185716-251662764-5050-1-/executors/sample-flask-app.f222d202-4c24-11e5-a628-0242ac110011/runs/e71f50b8-816d-46d5-bcc6-f9850a0402ed/sampleflaskapp.tgz'
{code}
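A minimal sketch of the extension-based decision the fetcher appears to make. The extension list here is an assumption mirroring typical archive suffixes, not the actual source; the bug in the log above is that the "copying instead of extracting" warning is emitted even though extraction succeeded:

```shell
# Decide whether a URI looks like an extractable archive by its extension
# (assumed logic for illustration, not the fetcher's real implementation).
uri='file:///mesos/sampleflaskapp.tgz'
case "$uri" in
  *.tar|*.tar.gz|*.tgz|*.tar.bz2|*.zip)
    echo "extract: $uri" ;;
  *)
    echo "copy without extracting: $uri" ;;
esac
```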



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3320) Document LIBPROCESS_ENABLE_PROFILER environment variable

2015-08-26 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-3320:
-
Labels: documentation libprocess  (was: )

 Document LIBPROCESS_ENABLE_PROFILER environment variable
 

 Key: MESOS-3320
 URL: https://issues.apache.org/jira/browse/MESOS-3320
 Project: Mesos
  Issue Type: Documentation
Reporter: Greg Mann
  Labels: documentation, libprocess

 This environment variable, used to enable the libprocess profiler, needs to 
 be documented.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3318) Disabling local message passing causes tests to fail

2015-08-26 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-3318:
---

 Summary: Disabling local message passing causes tests to fail
 Key: MESOS-3318
 URL: https://issues.apache.org/jira/browse/MESOS-3318
 Project: Mesos
  Issue Type: Bug
  Components: libprocess
Reporter: Joris Van Remoortere


If we add a flag to disable the shortcut of local message passing in libprocess 
between actors in the same OS process, there are tests that fail.
A patch that implemented this behavior can be found here:
https://reviews.apache.org/r/33315/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1791) Introduce Master / Offer Resource Reservations aka Quota

2015-08-26 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14714507#comment-14714507
 ] 

Alexander Rukletsov commented on MESOS-1791:


[~hbogert], both reserve resources per role. The differences are that dynamic 
reservations are tied to particular agents (slaves) and can be controlled by 
frameworks, while quotas are cluster-wide and managed by operators. I would 
encourage you to take a look at the design doc (MESOS-2936) for more 
information.

 Introduce Master / Offer Resource Reservations aka Quota
 

 Key: MESOS-1791
 URL: https://issues.apache.org/jira/browse/MESOS-1791
 Project: Mesos
  Issue Type: Epic
  Components: allocation, master, replicated log
Reporter: Tom Arnfeld
Assignee: Alexander Rukletsov
  Labels: mesosphere

 Currently Mesos supports the ability to reserve resources (for a given role) 
 on a per-slave basis, as introduced in MESOS-505. This allows you to almost 
 statically partition off a set of resources on a set of machines, to 
 guarantee certain types of frameworks get some resources.
 This is very useful, though it is also very useful to be able to control 
 these reservations through the master (instead of per-slave) for when I don't 
 care which nodes I get on, as long as I get X cpu and Y RAM, or Z sets of 
 (X,Y).
 I'm not sure what structure this could take, but apparently it has already 
 been discussed. Would this be a CLI flag? Could there be a (authenticated) 
 web interface to control these reservations?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky

2015-08-26 Thread Bernd Mathiske (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Mathiske updated MESOS-3235:
--
Comment: was deleted

(was: https://reviews.apache.org/r/37813/)

 FetcherCacheHttpTest.HttpCachedSerialized and 
 FetcherCacheHttpTest.HttpCachedConcurrent are flaky
 -

 Key: MESOS-3235
 URL: https://issues.apache.org/jira/browse/MESOS-3235
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Joseph Wu
Assignee: Bernd Mathiske
  Labels: mesosphere

 On OSX, {{make clean && make -j8 V=0 check}}:
 {code}
 [--] 3 tests from FetcherCacheHttpTest
 [ RUN  ] FetcherCacheHttpTest.HttpCachedSerialized
 HTTP/1.1 200 OK
 Date: Fri, 07 Aug 2015 17:23:05 GMT
 Content-Length: 30
 I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0
 E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave 
 20150807-102305-139395082-52338-52313-S0
 E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 Registered executor on 10.0.79.8
 Starting task 0
 Forked command at 54363
 sh -c './mesos-fetcher-test-cmd 0'
 E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 Command exited with status 0 (pid: 54363)
 E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0
 E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave 
 20150807-102305-139395082-52338-52313-S0
 E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 Registered executor on 10.0.79.8
 Starting task 1
 Forked command at 54411
 sh -c './mesos-fetcher-test-cmd 1'
 E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 Command exited with status 0 (pid: 54411)
 E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 ../../src/tests/fetcher_cache_tests.cpp:860: Failure
 Failed to wait 15secs for awaitFinished(task.get())
 *** Aborted at 1438968214 (unix time) try "date -d @1438968214" if you are 
 using GNU date ***
 [  FAILED  ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms)
 [ RUN  ] FetcherCacheHttpTest.HttpCachedConcurrent
 PC: @0x113723618 process::Owned::get()
 *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: ***
 @ 0x7fff8fcacf1a _sigtramp
 @ 0x7f9bc3109710 (unknown)
 @0x1136f07e2 mesos::internal::slave::Fetcher::fetch()
 @0x113862f9d 
 mesos::internal::slave::MesosContainerizerProcess::fetch()
 @0x1138f1b5d 
 _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcRK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_
 @0x1138f18cf 
 _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERKNS5_11CommandInfoERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcRK6OptionISK_ERKNS5_7SlaveIDES9_SC_SK_SO_SR_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSY_FSW_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_EUlPNS2_11ProcessBaseEE_NSI_IS1G_EEFvS1F_EEclEOS1F_
 @0x1143768cf std::__1::function::operator()()
 @0x11435ca7f process::ProcessBase::visit()
 @0x1143ed6fe process::DispatchEvent::visit()
 @0x11271 process::ProcessBase::serve()
 @0x114343b4e process::ProcessManager::resume()
 @0x1143431ca process::internal::schedule()
 @0x1143da646 _ZNSt3__114__thread_proxyINS_5tupleIJPFvvEEPvS5_
 @ 0x7fff95090268 _pthread_body
 @ 0x7fff950901e5 _pthread_start
 @ 0x7fff9508e41d thread_start
 Failed to synchronize with slave (it's probably exited)
 make[3]: *** [check-local] Segmentation fault: 11
 make[2]: *** [check-am] Error 2
 make[1]: *** [check] Error 2
 make: *** [check-recursive] Error 1
 {code}
 This was encountered just once out of 3+ {{make check}}s.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3235) FetcherCacheHttpTest.HttpCachedSerialized and FetcherCacheHttpTest.HttpCachedConcurrent are flaky

2015-08-26 Thread Bernd Mathiske (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715520#comment-14715520
 ] 

Bernd Mathiske commented on MESOS-3235:
---

I have been unable to reproduce this, so I could not debug it. I have also 
looked at the source code and still could not find what caused this failure. 
The best I can do at the moment is to add additional diagnostic output that 
may help catch the bug once it shows itself again. To this end I have prepared 
a patch that dumps the contents of all task/executor sandboxes in play if a 
fetcher cache test ends prematurely.

https://reviews.apache.org/r/37813/

 FetcherCacheHttpTest.HttpCachedSerialized and 
 FetcherCacheHttpTest.HttpCachedConcurrent are flaky
 -

 Key: MESOS-3235
 URL: https://issues.apache.org/jira/browse/MESOS-3235
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Joseph Wu
Assignee: Bernd Mathiske
  Labels: mesosphere

 On OSX, {{make clean && make -j8 V=0 check}}:
 {code}
 [--] 3 tests from FetcherCacheHttpTest
 [ RUN  ] FetcherCacheHttpTest.HttpCachedSerialized
 HTTP/1.1 200 OK
 Date: Fri, 07 Aug 2015 17:23:05 GMT
 Content-Length: 30
 I0807 10:23:05.673596 2085372672 exec.cpp:133] Version: 0.24.0
 E0807 10:23:05.675884 184373248 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 I0807 10:23:05.675897 182226944 exec.cpp:207] Executor registered on slave 
 20150807-102305-139395082-52338-52313-S0
 E0807 10:23:05.683980 184373248 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 Registered executor on 10.0.79.8
 Starting task 0
 Forked command at 54363
 sh -c './mesos-fetcher-test-cmd 0'
 E0807 10:23:05.694953 184373248 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 Command exited with status 0 (pid: 54363)
 E0807 10:23:05.793927 184373248 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 I0807 10:23:06.590008 2085372672 exec.cpp:133] Version: 0.24.0
 E0807 10:23:06.592244 355938304 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 I0807 10:23:06.592243 353255424 exec.cpp:207] Executor registered on slave 
 20150807-102305-139395082-52338-52313-S0
 E0807 10:23:06.597995 355938304 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 Registered executor on 10.0.79.8
 Starting task 1
 Forked command at 54411
 sh -c './mesos-fetcher-test-cmd 1'
 E0807 10:23:06.608708 355938304 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 Command exited with status 0 (pid: 54411)
 E0807 10:23:06.707649 355938304 socket.hpp:173] Shutdown failed on fd=18: 
 Socket is not connected [57]
 ../../src/tests/fetcher_cache_tests.cpp:860: Failure
 Failed to wait 15secs for awaitFinished(task.get())
 *** Aborted at 1438968214 (unix time) try "date -d @1438968214" if you are 
 using GNU date ***
 [  FAILED  ] FetcherCacheHttpTest.HttpCachedSerialized (28685 ms)
 [ RUN  ] FetcherCacheHttpTest.HttpCachedConcurrent
 PC: @0x113723618 process::Owned::get()
 *** SIGSEGV (@0x0) received by PID 52313 (TID 0x118d59000) stack trace: ***
 @ 0x7fff8fcacf1a _sigtramp
 @ 0x7f9bc3109710 (unknown)
 @0x1136f07e2 mesos::internal::slave::Fetcher::fetch()
 @0x113862f9d 
 mesos::internal::slave::MesosContainerizerProcess::fetch()
 @0x1138f1b5d 
 _ZZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS2_11ContainerIDERKNS2_11CommandInfoERKNSt3__112basic_stringIcNSC_11char_traitsIcEENSC_9allocatorIcRK6OptionISI_ERKNS2_7SlaveIDES6_S9_SI_SM_SP_EENS_6FutureIT_EERKNS_3PIDIT0_EEMSW_FSU_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_ENKUlPNS_11ProcessBaseEE_clES1D_
 @0x1138f18cf 
 _ZNSt3__110__function6__funcIZN7process8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS5_11ContainerIDERKNS5_11CommandInfoERKNS_12basic_stringIcNS_11char_traitsIcEENS_9allocatorIcRK6OptionISK_ERKNS5_7SlaveIDES9_SC_SK_SO_SR_EENS2_6FutureIT_EERKNS2_3PIDIT0_EEMSY_FSW_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_EUlPNS2_11ProcessBaseEE_NSI_IS1G_EEFvS1F_EEclEOS1F_
 @0x1143768cf std::__1::function::operator()()
 @0x11435ca7f process::ProcessBase::visit()
 @0x1143ed6fe process::DispatchEvent::visit()
 @0x11271 process::ProcessBase::serve()
 @0x114343b4e process::ProcessManager::resume()
 @0x1143431ca process::internal::schedule()
 @0x1143da646 _ZNSt3__114__thread_proxyINS_5tupleIJPFvvEEPvS5_
 @ 0x7fff95090268 _pthread_body
 @ 0x7fff950901e5 _pthread_start
 @ 0x7fff9508e41d thread_start
 Failed to synchronize with slave (it's probably exited)
 make[3]: *** [check-local] Segmentation fault: 11
 make[2]: *** [check-am] Error 2
 make[1]: *** [check] Error 2
 make: *** [check-recursive] Error 1
 {code}
 This was encountered just once out of 3+ {{make check}}s.

[jira] [Commented] (MESOS-3310) Support provisioning images specified in volumes.

2015-08-26 Thread Jie Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715548#comment-14715548
 ] 

Jie Yu commented on MESOS-3310:
---

commit 33058278e4839fdfaf65b2adc7785e61d74b6775
Author: Jie Yu yujie@gmail.com
Date:   Mon Aug 24 16:22:08 2015 -0700

Added a filesystem isolator test to test image in volume while the
container root filesystem is also specified.

Review: https://reviews.apache.org/r/37738

commit 40638c5413266f4a4d5117cde225247ad19b2f55
Author: Jie Yu yujie@gmail.com
Date:   Mon Aug 24 15:55:11 2015 -0700

Refactored filesystem isolator tests to allow multiple rootfses.

Review: https://reviews.apache.org/r/37735

commit da2dfab8c77ae583eff1a5ce54f23f4b17831976
Author: Jie Yu yujie@gmail.com
Date:   Mon Aug 24 14:23:27 2015 -0700

Used recursive bind mounts for volumes.

Review: https://reviews.apache.org/r/37734

commit 347d51ceca849cc26b9ada8f1014e4c578eeb47b
Author: Jie Yu yujie@gmail.com
Date:   Mon Aug 24 12:44:12 2015 -0700

Added support for preparing images specified in volumes.

Review: https://reviews.apache.org/r/37726

 Support provisioning images specified in volumes.
 -

 Key: MESOS-3310
 URL: https://issues.apache.org/jira/browse/MESOS-3310
 Project: Mesos
  Issue Type: Task
Reporter: Jie Yu
Assignee: Jie Yu

 This is related to MESOS-3095 and MESOS-3227.
 The idea is that we should allow the command executor to run under the host 
 filesystem and provision the root filesystem for the user. The command 
 executor will then chroot into the user's root filesystem.
 This solves the issue that the command executor is not launchable in the 
 user-specified root filesystem.
 The design doc is here:
 https://docs.google.com/document/d/16hyLVRL0nz-KBts1J5stGyxZPniFPbPbs7R-ZRQVCH4/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2466) Write documentation for all the LIBPROCESS_* environment variables.

2015-08-26 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715670#comment-14715670
 ] 

Greg Mann commented on MESOS-2466:
--

Review here: https://reviews.apache.org/r/37814/

Anybody willing to shepherd this little one? :-) [~vinodkone]? [~nnielsen]?

 Write documentation for all the LIBPROCESS_* environment variables.
 ---

 Key: MESOS-2466
 URL: https://issues.apache.org/jira/browse/MESOS-2466
 Project: Mesos
  Issue Type: Documentation
Reporter: Alexander Rojas
Assignee: Greg Mann
  Labels: documentation, mesosphere

 libprocess uses a set of environment variables to modify its behaviour; 
 however, these variables are not documented anywhere, nor is it defined where 
 the documentation should live.
 What is needed is a decision on where the environment variables should be 
 documented (a new doc file or an existing one), after which the documentation 
 can be added there.
 After searching in the code, these are the variables which need to be 
 documented:
 # {{LIBPROCESS_IP}}
 # {{LIBPROCESS_PORT}}
 # {{LIBPROCESS_ADVERTISE_IP}}
 # {{LIBPROCESS_ADVERTISE_PORT}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3316) provisioner_backend_tests.cpp breaks the build on OSX

2015-08-26 Thread Alexander Rojas (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712661#comment-14712661
 ] 

Alexander Rojas commented on MESOS-3316:


The review [r/37747/|https://reviews.apache.org/r/37747/] introduced the issue. 
Can you [~xujyan] and your shepherd [~jieyu] take a look at it?

 provisioner_backend_tests.cpp breaks the build on OSX
 -

 Key: MESOS-3316
 URL: https://issues.apache.org/jira/browse/MESOS-3316
 Project: Mesos
  Issue Type: Bug
Reporter: Alexander Rojas
Priority: Blocker
  Labels: build-failure

 The test file includes {{linux/fs.hpp}}, which in turn includes 
 {{mntent.h}}, which is only available on Linux.
 Building in OSX leads to:
 {noformat}
 g++ -DPACKAGE_NAME=\"mesos\" -DPACKAGE_TARNAME=\"mesos\" 
 -DPACKAGE_VERSION=\"0.25.0\" -DPACKAGE_STRING=\"mesos\ 0.25.0\" 
 -DPACKAGE_BUGREPORT=\"\" -DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" 
 -DVERSION=\"0.25.0\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 
 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 
 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 
 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\" -DHAVE_PTHREAD_PRIO_INHERIT=1 
 -DHAVE_PTHREAD=1 -DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 
 -DHAVE_LIBAPR_1=1 -DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 
 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -I. 
 -I../../src   -Wall -Werror -DLIBDIR=\/usr/local/lib\ 
 -DPKGLIBEXECDIR=\/usr/local/libexec/mesos\ 
 -DPKGDATADIR=\/usr/local/share/mesos\ -I../../include 
 -I../../3rdparty/libprocess/include 
 -I../../3rdparty/libprocess/3rdparty/stout/include -I../include 
 -I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 
 -I../3rdparty/libprocess/3rdparty/picojson-4f93734 
 -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
 -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
 -I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
 -I../3rdparty/leveldb/include -I../3rdparty/zookeeper-3.4.5/src/c/include 
 -I../3rdparty/zookeeper-3.4.5/src/c/generated 
 -I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
 -DSOURCE_DIR=\/Users/alexander/Documents/workspace/pmesos/build/..\ 
 -DBUILD_DIR=\/Users/alexander/Documents/workspace/pmesos/build\ 
 -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include 
 -I../3rdparty/libprocess/3rdparty/gmock-1.7.0/include  
 -I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include 
 -I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 
 -I/usr/include/apr-1.0  -D_THREAD_SAFE -pthread -g -O0 -std=c++11 
 -stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT 
 tests/containerizer/mesos_tests-provisioner_backend_tests.o -MD -MP -MF 
 tests/containerizer/.deps/mesos_tests-provisioner_backend_tests.Tpo -c -o 
 tests/containerizer/mesos_tests-provisioner_backend_tests.o `test -f 
 'tests/containerizer/provisioner_backend_tests.cpp' || echo 
 '../../src/'`tests/containerizer/provisioner_backend_tests.cpp
 make[3]: Nothing to be done for `../../src/tests/balloon_framework_test.sh'.
 make[3]: Nothing to be done for 
 `../../src/tests/event_call_framework_test.sh'.
 make[3]: Nothing to be done for `../../src/tests/java_exception_test.sh'.
 make[3]: Nothing to be done for `../../src/tests/java_framework_test.sh'.
 make[3]: Nothing to be done for `../../src/tests/java_log_test.sh'.
 make[3]: Nothing to be done for 
 `../../src/tests/no_executor_framework_test.sh'.
 make[3]: Nothing to be done for 
 `../../src/tests/persistent_volume_framework_test.sh'.
 make[3]: Nothing to be done for `../../src/tests/python_framework_test.sh'.
 make[3]: Nothing to be done for `../../src/tests/test_framework_test.sh'.
 In file included from 
 ../../src/tests/containerizer/provisioner_backend_tests.cpp:28:
 ../../src/linux/fs.hpp:23:10: fatal error: 'mntent.h' file not found
 #include <mntent.h>
  ^
 1 error generated.
 make[3]: *** [tests/containerizer/mesos_tests-provisioner_backend_tests.o] 
 Error 1
 make[2]: *** [check-am] Error 2
 make[1]: *** [check] Error 2
 make: *** [check-recursive] Error 1
 {noformat}





[jira] [Commented] (MESOS-3070) Master CHECK failure if a framework uses duplicated task id.

2015-08-26 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712678#comment-14712678
 ] 

Klaus Ma commented on MESOS-3070:
-

Regarding #4.2, framework developers would not need to care about the new field 
(TaskUID) except in such special cases; framework developers would still use 
TaskID as before, while TaskUID is used internally between Master & Slave. 
Personally, my concern is the effort/risk: all TaskIDs would be replaced by 
UUIDs internally. From this point of view, #3 (storing tasks in the master in a 
per-slave map) seems better, because it does not need to change the interaction 
between Master & Slave.

 Master CHECK failure if a framework uses duplicated task id.
 

 Key: MESOS-3070
 URL: https://issues.apache.org/jira/browse/MESOS-3070
 Project: Mesos
  Issue Type: Bug
  Components: master
Affects Versions: 0.22.1
Reporter: Jie Yu
Assignee: Klaus Ma

 We observed this in one of our testing cluster.
 One framework (under development) keeps launching tasks using the same 
 task_id. We don't expect the master to crash even if the framework is not 
 doing what it's supposed to do. However, under the following series of 
 events, this can happen and keeps crashing the master:
 1) frameworkA launches task 'task_id_1' on slaveA
 2) master fails over
 3) slaveA has not re-registered yet
 4) frameworkA re-registers and launches task 'task_id_1' on slaveB
 5) slaveA re-registers and adds task 'task_id_1' to frameworkA
 6) CHECK failure in addTask
 {noformat}
 I0716 21:52:50.759305 28805 master.hpp:159] Adding task 'task_id_1' with 
 resources cpus(*):4; mem(*):32768 on slave 
 20150417-232509-1735470090-5050-48870-S25 (hostname)
 ...
 ...
 F0716 21:52:50.760136 28805 master.hpp:362] Check failed: 
 !tasks.contains(task->task_id()) Duplicate task 'task_id_1' of framework 
 framework_id
 {noformat}





[jira] [Commented] (MESOS-3316) provisioner_backend_tests.cpp breaks the build on OSX

2015-08-26 Thread Yan Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712679#comment-14712679
 ] 

Yan Xu commented on MESOS-3316:
---

Sorry for the oversight. Committed a fix.

{noformat:title=}
commit 5a198ee92c9aa7f14187df7e30d05137fa63b0b3
Author: Jiang Yan Xu y...@jxu.me
Date:   Wed Aug 26 00:39:40 2015 -0700

Fixed provisioner_backend_tests.cpp which included a Linux-only header 
unconditionally.
{noformat}

 provisioner_backend_tests.cpp breaks the build on OSX
 -

 Key: MESOS-3316
 URL: https://issues.apache.org/jira/browse/MESOS-3316
 Project: Mesos
  Issue Type: Bug
Reporter: Alexander Rojas
Priority: Blocker
  Labels: build-failure

 The test file includes {{linux/fs.hpp}}, which in turn includes 
 {{mntent.h}}, a header only available on Linux.
 Building on OSX leads to:
 {noformat}
 In file included from 
 ../../src/tests/containerizer/provisioner_backend_tests.cpp:28:
 ../../src/linux/fs.hpp:23:10: fatal error: 'mntent.h' file not found
 #include <mntent.h>
 1 error generated.
 {noformat}





[jira] [Created] (MESOS-3316) provisioner_backend_tests.cpp breaks the build on OSX

2015-08-26 Thread Alexander Rojas (JIRA)
Alexander Rojas created MESOS-3316:
--

 Summary: provisioner_backend_tests.cpp breaks the build on OSX
 Key: MESOS-3316
 URL: https://issues.apache.org/jira/browse/MESOS-3316
 Project: Mesos
  Issue Type: Bug
Reporter: Alexander Rojas
Priority: Blocker


The test file includes {{linux/fs.hpp}}, which in turn includes 
{{mntent.h}}, a header only available on Linux.


Building on OSX leads to:
{noformat}
In file included from 
../../src/tests/containerizer/provisioner_backend_tests.cpp:28:
../../src/linux/fs.hpp:23:10: fatal error: 'mntent.h' file not found
#include <mntent.h>
1 error generated.
{noformat}





[jira] [Created] (MESOS-3315) make check fails on OSX

2015-08-26 Thread Neil Conway (JIRA)
Neil Conway created MESOS-3315:
--

 Summary: make check fails on OSX
 Key: MESOS-3315
 URL: https://issues.apache.org/jira/browse/MESOS-3315
 Project: Mesos
  Issue Type: Bug
 Environment: OSX 10.10.5
Reporter: Neil Conway
Assignee: Yan Xu
Priority: Minor


{quote}
g++ -DPACKAGE_NAME=\mesos\ -DPACKAGE_TARNAME=\mesos\ 
-DPACKAGE_VERSION=\0.25.0\ -DPACKAGE_STRING=\mesos\ 0.25.0\ 
-DPACKAGE_BUGREPORT=\\ -DPACKAGE_URL=\\ -DPACKAGE=\mesos\ 
-DVERSION=\0.25.0\ -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 
-DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 
-DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 
-DLT_OBJDIR=\.libs/\ -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1 
-DHAVE_LIBZ=1 -DHAVE_LIBCURL=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 
-DHAVE_SVN_VERSION_H=1 -DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 
-DHAVE_LIBSVN_DELTA_1=1 -DHAVE_LIBSASL2=1 -I. -I../../src   -Wall -Werror 
-DLIBDIR=\/usr/local/lib\ -DPKGLIBEXECDIR=\/usr/local/libexec/mesos\ 
-DPKGDATADIR=\/usr/local/share/mesos\ -I../../include 
-I../../3rdparty/libprocess/include 
-I../../3rdparty/libprocess/3rdparty/stout/include -I../include 
-I../include/mesos -I../3rdparty/libprocess/3rdparty/boost-1.53.0 
-I../3rdparty/libprocess/3rdparty/picojson-4f93734 
-I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
-I../3rdparty/libprocess/3rdparty/glog-0.3.3/src 
-I../3rdparty/libprocess/3rdparty/glog-0.3.3/src -I../3rdparty/leveldb/include 
-I../3rdparty/zookeeper-3.4.5/src/c/include 
-I../3rdparty/zookeeper-3.4.5/src/c/generated 
-I../3rdparty/libprocess/3rdparty/protobuf-2.5.0/src 
-DSOURCE_DIR=\/Users/neilc/mesos/build/..\ 
-DBUILD_DIR=\/Users/neilc/mesos/build\ 
-I../3rdparty/libprocess/3rdparty/gmock-1.7.0/gtest/include 
-I../3rdparty/libprocess/3rdparty/gmock-1.7.0/include  
-I/usr/local/opt/openssl/include -I/usr/local/opt/libevent/include 
-I/usr/local/opt/subversion/include/subversion-1 -I/usr/include/apr-1 
-I/usr/include/apr-1.0  -D_THREAD_SAFE -pthread -g1 -O0 -std=c++11 
-stdlib=libc++ -DGTEST_USE_OWN_TR1_TUPLE=1 -DGTEST_LANG_CXX11 -MT 
tests/containerizer/mesos_tests-provisioner_backend_tests.o -MD -MP -MF 
tests/containerizer/.deps/mesos_tests-provisioner_backend_tests.Tpo -c -o 
tests/containerizer/mesos_tests-provisioner_backend_tests.o `test -f 
'tests/containerizer/provisioner_backend_tests.cpp' || echo 
'../../src/'`tests/containerizer/provisioner_backend_tests.cpp
In file included from 
../../src/tests/containerizer/provisioner_backend_tests.cpp:28:
../../src/linux/fs.hpp:23:10: fatal error: 'mntent.h' file not found
#include <mntent.h>
{quote}

Seems like {{provisioner_backend_tests.cpp}} shouldn't unconditionally include 
linux/fs.hpp, as mntent.h is not provided on OSX.





[jira] [Updated] (MESOS-3316) provisioner_backend_tests.cpp breaks the build on OSX

2015-08-26 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-3316:
--
Assignee: Yan Xu

 provisioner_backend_tests.cpp breaks the build on OSX
 -

 Key: MESOS-3316
 URL: https://issues.apache.org/jira/browse/MESOS-3316
 Project: Mesos
  Issue Type: Bug
Reporter: Alexander Rojas
Assignee: Yan Xu
Priority: Blocker
  Labels: build-failure

 The test file includes {{linux/fs.hpp}}, which in turn includes 
 {{mntent.h}}, a header only available on Linux.
 Building on OSX leads to:
 {noformat}
 In file included from 
 ../../src/tests/containerizer/provisioner_backend_tests.cpp:28:
 ../../src/linux/fs.hpp:23:10: fatal error: 'mntent.h' file not found
 #include <mntent.h>
 1 error generated.
 {noformat}





[jira] [Created] (MESOS-3317) URL query string order is undefined

2015-08-26 Thread Jan Schlicht (JIRA)
Jan Schlicht created MESOS-3317:
---

 Summary: URL query string order is undefined
 Key: MESOS-3317
 URL: https://issues.apache.org/jira/browse/MESOS-3317
 Project: Mesos
  Issue Type: Wish
  Components: libprocess
Reporter: Jan Schlicht
Priority: Minor


A `process::http::URL` instance has its query strings stored in a hashmap. 
Stringifying the instance will use the order defined by the hash function to 
order the query strings. This order depends on the concrete implementation of 
the hash function.
A well-defined query string order (e.g. alphabetical) may be important 
for bot detection. If the query strings should be in alphabetical order, 
multiple solutions are possible:
1. Use a map instead of a hashmap for storing query strings in URLs
2. Order the query strings while creating the URL string
3. Provide a custom string hash function that guarantees a certain order





[jira] [Updated] (MESOS-3317) URL query string order is undefined

2015-08-26 Thread Jan Schlicht (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Schlicht updated MESOS-3317:

Labels: easyfix mesosphere newbie  (was: easyfix newbie)

 URL query string order is undefined
 ---

 Key: MESOS-3317
 URL: https://issues.apache.org/jira/browse/MESOS-3317
 Project: Mesos
  Issue Type: Wish
  Components: libprocess
Reporter: Jan Schlicht
Priority: Minor
  Labels: easyfix, mesosphere, newbie

 A `process::http::URL` instance has its query strings stored in a hashmap. 
 Stringifying the instance will use the order defined by the hash function to 
 order the query strings. This order depends on the concrete implementation of 
 the hash function.
 A well-defined query string order (e.g. alphabetical) may be 
 important for bot detection. If the query strings should be in alphabetical 
 order, multiple solutions are possible:
 1. Use a map instead of a hashmap for storing query strings in URLs
 2. Order the query strings while creating the URL string
 3. Provide a custom string hash function that guarantees a certain order





[jira] [Commented] (MESOS-3307) Configurable size of completed task / framework history

2015-08-26 Thread Ian Babrou (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712967#comment-14712967
 ] 

Ian Babrou commented on MESOS-3307:
---

[~alex-mesos] is there a list of mesos endpoints? I wasn't able to find one. 
Having docs for this would be great.

Any feedback on configurable history size? This is the simplest solution so far.

 Configurable size of completed task / framework history
 ---

 Key: MESOS-3307
 URL: https://issues.apache.org/jira/browse/MESOS-3307
 Project: Mesos
  Issue Type: Bug
Reporter: Ian Babrou

 We try to make Mesos work with multiple frameworks and mesos-dns at the same 
 time. The goal is to have a set of frameworks per team / project on a single 
 Mesos cluster.
 At this point our mesos state.json is at 4mb and it takes a while to 
 assemble. 5 mesos-dns instances hit state.json every 5 seconds, effectively 
 pushing mesos-master CPU usage through the roof. It's at 100%+ all the time.
 Here's the problem:
 {noformat}
 mesos λ curl -s http://mesos-master:5050/master/state.json | jq 
 .frameworks[].completed_tasks[].framework_id | sort | uniq -c | sort -n
1 20150606-001827-252388362-5050-5982-0003
   16 20150606-001827-252388362-5050-5982-0005
   18 20150606-001827-252388362-5050-5982-0029
   73 20150606-001827-252388362-5050-5982-0007
  141 20150606-001827-252388362-5050-5982-0009
  154 20150820-154817-302720010-5050-15320-
  289 20150606-001827-252388362-5050-5982-0004
  510 20150606-001827-252388362-5050-5982-0012
  666 20150606-001827-252388362-5050-5982-0028
  923 20150116-002612-269165578-5050-32204-0003
 1000 20150606-001827-252388362-5050-5982-0001
 1000 20150606-001827-252388362-5050-5982-0006
 1000 20150606-001827-252388362-5050-5982-0010
 1000 20150606-001827-252388362-5050-5982-0011
 1000 20150606-001827-252388362-5050-5982-0027
 mesos λ fgrep 1000 -r src/master
 src/master/constants.cpp:const size_t MAX_REMOVED_SLAVES = 10;
 src/master/constants.cpp:const uint32_t MAX_COMPLETED_TASKS_PER_FRAMEWORK = 
 1000;
 {noformat}
 Active tasks are just 6% of state.json response:
 {noformat}
 mesos λ cat ~/temp/mesos-state.json | jq -c . | wc
1   14796 4138942
 mesos λ cat ~/temp/mesos-state.json | jq .frameworks[].tasks | jq -c . | wc
   16  37  252774
 {noformat}
 I see four options that can improve the situation:
 1. Add a query string param to exclude completed tasks from state.json and 
 use it in mesos-dns and similar tools. There is no need for mesos-dns to know 
 about completed tasks; it's just extra load on the master and mesos-dns.
 2. Make the history size configurable.
 3. Make JSON serialization faster. With 1000s of tasks, even without history, 
 it would take a lot of time to serialize tasks for mesos-dns. Doing it every 
 60 seconds instead of every 5 seconds isn't really an option.
 4. Create an event bus for the mesos master. Marathon has it and it'd be nice 
 to have it in Mesos. This way mesos-dns could avoid polling master state and 
 switch to listening for events.
 All can be done independently.
 Note to mesosphere folks: please start distributing debug symbols with your 
 distribution. I was asking for it for a while and it is really helpful: 
 https://github.com/mesosphere/marathon/issues/1497#issuecomment-104182501
 Perf report for leading master: 
 !http://i.imgur.com/iz7C3o0.png!
 I'm on 0.23.0.





[jira] [Commented] (MESOS-2058) Deprecate stats.json endpoints for Master and Slave

2015-08-26 Thread Ian Babrou (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713504#comment-14713504
 ] 

Ian Babrou commented on MESOS-2058:
---

[~dhamon] [~nnielsen] this broke the master's UI home page. It uses 
staged_tasks and friends, and in 0.23.0 you can't see the values.

 Deprecate stats.json endpoints for Master and Slave
 ---

 Key: MESOS-2058
 URL: https://issues.apache.org/jira/browse/MESOS-2058
 Project: Mesos
  Issue Type: Task
  Components: master, slave
Reporter: Dominic Hamon
Assignee: Dominic Hamon
  Labels: twitter
 Fix For: 0.23.0


 With the introduction of the libprocess {{/metrics/snapshot}} endpoint, 
 metrics are now duplicated in the Master and Slave between this and 
 {{stats.json}}. We should deprecate the {{stats.json}} endpoints.
 Manual inspection of {{stats.json}} shows that all metrics are now covered by 
 the new endpoint for Master and Slave.





[jira] [Commented] (MESOS-2684) mesos-slave should not abort when a single task has e.g. a 'mkdir' failure

2015-08-26 Thread Scott D.W. Rankin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713497#comment-14713497
 ] 

Scott D.W. Rankin commented on MESOS-2684:
--

Hi all - I'm seeing this issue as well. We're running Marathon 0.8.2 and Mesos 
0.22.1 on CentOS 6.6, and we are getting errors similar to the one pasted below 
pretty regularly. We can't reproduce it all the time, but it happens when 
initiating a deployment from Marathon.

26 Aug 2015 09:35:01.213  host=mesosnode6-aws-west tag=mesos-slave[30248]:  
F0826 06:35:01.136056 30280 slave.cpp:3354] CHECK_SOME(os::touch(path)): Failed 
to open file: No such file or directory Context
26 Aug 2015 09:35:01.369  host=mesosnode6-aws-west tag=mesos-slave[30248]:  *** 
Check failure stack trace: *** Context
26 Aug 2015 09:35:01.369  host=mesosnode6-aws-west tag=mesos-slave[30248]:  
@   0x3de1e765cd  (unknown) Context
26 Aug 2015 09:35:01.369  host=mesosnode6-aws-west tag=mesos-slave[30248]:  
@   0x3de1e7a5e7  (unknown) Context
26 Aug 2015 09:35:01.369  host=mesosnode6-aws-west tag=mesos-slave[30248]:  
@   0x3de1e78469  (unknown) Context
26 Aug 2015 09:35:01.369  host=mesosnode6-aws-west tag=mesos-slave[30248]:  
@   0x3de1e7876d  (unknown) Context
26 Aug 2015 09:35:01.369  host=mesosnode6-aws-west tag=mesos-slave[30248]:  
@   0x3de17c5696  (unknown) Context
26 Aug 2015 09:35:01.369  host=mesosnode6-aws-west tag=mesos-slave[30248]:  
@   0x3de1a1855a  (unknown) Context
26 Aug 2015 09:35:01.369  host=mesosnode6-aws-west tag=mesos-slave[30248]:  
@   0x3de1a1c0a9  (unknown) Context
26 Aug 2015 09:35:01.369  host=mesosnode6-aws-west tag=mesos-slave[30248]:  
@   0x3de1a510ff  (unknown) Context
26 Aug 2015 09:35:01.369  host=mesosnode6-aws-west tag=mesos-slave[30248]:  
@   0x3de1e18b83  (unknown) Context
26 Aug 2015 09:35:01.369  host=mesosnode6-aws-west tag=mesos-slave[30248]:  
@   0x3de1e1978c  (unknown) Context
26 Aug 2015 09:35:01.369  host=mesosnode6-aws-west tag=mesos-slave[30248]:  
@   0x39d58079d1  (unknown) Context
26 Aug 2015 09:35:01.369  host=mesosnode6-aws-west tag=mesos-slave[30248]:  
@   0x39d54e88fd  (unknown) Context
26 Aug 2015 09:35:01.369  host=mesosnode6-aws-west tag=init:  mesos-slave main 
process (30248) killed by ABRT signal Context
26 Aug 2015 09:35:01.369  host=mesosnode6-aws-west tag=init:  mesos-slave main 
process ended, respawning Context


 mesos-slave should not abort when a single task has e.g. a 'mkdir' failure
 --

 Key: MESOS-2684
 URL: https://issues.apache.org/jira/browse/MESOS-2684
 Project: Mesos
  Issue Type: Bug
  Components: slave
Affects Versions: 0.21.1
Reporter: Steven Schlansker
 Attachments: mesos-slave-restart.txt


 mesos-slave can encounter a variety of problems while attempting to launch a 
 task.  If the task fails, that is unfortunate, but not the end of the world.  
 Other tasks should not be affected.
 However, if the task failure happens to trigger an assertion, the entire 
 slave comes crashing down:
 F0501 19:10:46.095464  1705 paths.hpp:342] CHECK_SOME(mkdir): No space left 
 on device Failed to create executor directory 
 '/mnt/mesos/slaves/20150327-194449-419644938-5050-1649-S71/frameworks/Singularity/executors/pp-gc-eventlog-teamcity.2015.03.31T23.55.14-1430507446029-2-10.70.8.160-us_west_2b/runs/95a54aeb-322c-48e9-9f6f-5b359bccbc01'
 Immediately afterwards, all tasks on this slave were declared TASK_KILLED 
 when mesos-slave restarted.
 Something as simple as a 'mkdir' failing is not worthy of an assertion 
 failure.





[jira] [Commented] (MESOS-1791) Introduce Master / Offer Resource Reservations aka Quota

2015-08-26 Thread Hans van den Bogert (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713626#comment-14713626
 ] 

Hans van den Bogert commented on MESOS-1791:


I see this ticket is related to dynamic reservations, but how exactly are they 
related? Could one say that dynamic reservations are a more restricted form of 
quotas, since the latter does not place reservations on specific 
resources/slaves?

Or are they the same thing?

 Introduce Master / Offer Resource Reservations aka Quota
 

 Key: MESOS-1791
 URL: https://issues.apache.org/jira/browse/MESOS-1791
 Project: Mesos
  Issue Type: Epic
  Components: allocation, master, replicated log
Reporter: Tom Arnfeld
Assignee: Alexander Rukletsov
  Labels: mesosphere

 Currently Mesos supports the ability to reserve resources (for a given role) 
 on a per-slave basis, as introduced in MESOS-505. This allows you to almost 
 statically partition off a set of resources on a set of machines, to 
 guarantee certain types of frameworks get some resources.
 This is very useful, though it is also very useful to be able to control 
 these reservations through the master (instead of per-slave) for when I don't 
 care which nodes I get on, as long as I get X cpu and Y RAM, or Z sets of 
 (X,Y).
 I'm not sure what structure this could take, but apparently it has already 
 been discussed. Would this be a CLI flag? Could there be a (authenticated) 
 web interface to control these reservations?





[jira] [Commented] (MESOS-3158) Libprocess Process: Join runqueue workers during finalization

2015-08-26 Thread Greg Mann (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715792#comment-14715792
 ] 

Greg Mann commented on MESOS-3158:
--

Review here: https://reviews.apache.org/r/37821/

 Libprocess Process: Join runqueue workers during finalization
 -

 Key: MESOS-3158
 URL: https://issues.apache.org/jira/browse/MESOS-3158
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Joris Van Remoortere
Assignee: Greg Mann
  Labels: beginner, libprocess, mesosphere, newbie

 The lack of synchronization between ProcessManager destruction and the thread 
 pool threads running the queued processes means that the shared state that is 
 part of the ProcessManager gets destroyed prematurely.
 Synchronizing the ProcessManager destructor with draining the work queues and 
 stopping the workers will remove the need to leak the shared state in order to 
 avoid use after destruction.





[jira] [Created] (MESOS-3322) Upgrade vendored google-glog to 0.3.4

2015-08-26 Thread Neil Conway (JIRA)
Neil Conway created MESOS-3322:
--

 Summary: Upgrade vendored google-glog to 0.3.4
 Key: MESOS-3322
 URL: https://issues.apache.org/jira/browse/MESOS-3322
 Project: Mesos
  Issue Type: Improvement
Reporter: Neil Conway
Assignee: Neil Conway
Priority: Minor


This brings a few improvements; it should also mean we can drop the patch we 
currently apply to address some glog bugs that likely have been fixed upstream 
(see [#860]).





[jira] [Commented] (MESOS-3322) Upgrade vendored google-glog to 0.3.4

2015-08-26 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715816#comment-14715816
 ] 

Neil Conway commented on MESOS-3322:


https://reviews.apache.org/r/37823/
https://reviews.apache.org/r/37824/


 Upgrade vendored google-glog to 0.3.4
 -

 Key: MESOS-3322
 URL: https://issues.apache.org/jira/browse/MESOS-3322
 Project: Mesos
  Issue Type: Improvement
Reporter: Neil Conway
Assignee: Neil Conway
Priority: Minor

 This brings a few improvements; it should also mean we can drop the patch we 
 currently apply to address some glog bugs that likely have been fixed 
 upstream (see [#860]).





[jira] [Commented] (MESOS-1791) Introduce Master / Offer Resource Reservations aka Quota

2015-08-26 Thread Qian Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715871#comment-14715871
 ] 

Qian Zhang commented on MESOS-1791:
---

[~alex-mesos], if dynamic reservation is per role, then how can we guarantee 
the reserved resources will be re-offered to the stateful framework that made 
the dynamic reservation? For example, suppose both the Cassandra framework and 
the HDFS framework belong to role1, and HDFS dynamically reserves some 
resources on an agent for role1. It may then be possible for the allocator to 
offer those resources to Cassandra, since it also belongs to role1, even 
though HDFS expects to be offered those resources.

 Introduce Master / Offer Resource Reservations aka Quota
 

 Key: MESOS-1791
 URL: https://issues.apache.org/jira/browse/MESOS-1791
 Project: Mesos
  Issue Type: Epic
  Components: allocation, master, replicated log
Reporter: Tom Arnfeld
Assignee: Alexander Rukletsov
  Labels: mesosphere

 Currently Mesos supports the ability to reserve resources (for a given role) 
 on a per-slave basis, as introduced in MESOS-505. This allows you to almost 
 statically partition off a set of resources on a set of machines, to 
 guarantee certain types of frameworks get some resources.
 This is very useful, though it is also very useful to be able to control 
 these reservations through the master (instead of per-slave) for when I don't 
 care which nodes I get on, as long as I get X cpu and Y RAM, or Z sets of 
 (X,Y).
 I'm not sure what structure this could take, but apparently it has already 
 been discussed. Would this be a CLI flag? Could there be a (authenticated) 
 web interface to control these reservations?





[jira] [Commented] (MESOS-3063) Add an example framework using dynamic reservation

2015-08-26 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716016#comment-14716016
 ] 

Klaus Ma commented on MESOS-3063:
-

Update the example with UT scripts; and also add [~mcypark] as reviewer.

 Add an example framework using dynamic reservation
 --

 Key: MESOS-3063
 URL: https://issues.apache.org/jira/browse/MESOS-3063
 Project: Mesos
  Issue Type: Task
Reporter: Michael Park
Assignee: Klaus Ma

 An example framework using dynamic reservation should be added to
 # test dynamic reservations further, and
 # serve as a reference for those who want to use the dynamic reservation 
 feature.





[jira] [Comment Edited] (MESOS-3312) Factor out JSON to repeated protobuf conversion

2015-08-26 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715927#comment-14715927
 ] 

Alexander Rukletsov edited comment on MESOS-3312 at 8/27/15 3:39 AM:
-

https://reviews.apache.org/r/37826/
https://reviews.apache.org/r/37827/
https://reviews.apache.org/r/37830/


was (Author: alex-mesos):
https://reviews.apache.org/r/37826/
https://reviews.apache.org/r/37827/

 Factor out JSON to repeated protobuf conversion
 ---

 Key: MESOS-3312
 URL: https://issues.apache.org/jira/browse/MESOS-3312
 Project: Mesos
  Issue Type: Improvement
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov
  Labels: mesosphere

 In general, we have the collection of protobuf messages as another protobuf 
 message, which makes JSON <-> protobuf conversion straightforward. This is not 
 always the case: for example, the {{Resources}} class is not a protobuf, 
 though it is protobuf-convertible.
 To facilitate conversions like JSON -> {{Resources}} and avoid writing code 
 for each particular case, we propose to introduce {{JSON::Array}} -> 
 {{repeated protobuf}} conversion. With this in place, {{JSON::Array}} -> 
 {{Resources}} boils down to {{JSON::Array}} -> {{repeated Resource}} -> 
 (extra c-tor call) -> {{Resources}}.





[jira] [Commented] (MESOS-3063) Add an example framework using dynamic reservation

2015-08-26 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716017#comment-14716017
 ] 

Klaus Ma commented on MESOS-3063:
-

Update the example with UT scripts; and also add [~mcypark] as reviewer.

 Add an example framework using dynamic reservation
 --

 Key: MESOS-3063
 URL: https://issues.apache.org/jira/browse/MESOS-3063
 Project: Mesos
  Issue Type: Task
Reporter: Michael Park
Assignee: Klaus Ma

 An example framework using dynamic reservation should be added to
 # test dynamic reservations further, and
 # serve as a reference for those who want to use the dynamic reservation 
 feature.





[jira] [Created] (MESOS-3323) Auto-generate protos for stout tests

2015-08-26 Thread Alexander Rukletsov (JIRA)
Alexander Rukletsov created MESOS-3323:
--

 Summary: Auto-generate protos for stout tests
 Key: MESOS-3323
 URL: https://issues.apache.org/jira/browse/MESOS-3323
 Project: Mesos
  Issue Type: Improvement
Reporter: Alexander Rukletsov
Assignee: Kapil Arya
Priority: Minor


Stout protobufs (AFAIK right now it's just a single file, 
{{protobuf_tests.proto}}) are not generated automatically. Including a proto 
generation step in the build would be cleaner and more convenient.





[jira] [Commented] (MESOS-1791) Introduce Master / Offer Resource Reservations aka Quota

2015-08-26 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715905#comment-14715905
 ] 

Alexander Rukletsov commented on MESOS-1791:


[~qianzhang], have a look at the [persistent volumes 
documentation|https://mesos.apache.org/documentation/latest/persistent-volume/].
 There was also a [talk at 
MesosCon|http://mesoscon2015.sched.org/event/7151c36724e5c3bc9de9e452fe4c866a#.Vd5uBtOqqko];
 hopefully the video will be available soon. If your question remains 
unanswered, I would encourage you to continue the discussion on the devlist 
rather than in the quota epic, so that other contributors may chime in.

 Introduce Master / Offer Resource Reservations aka Quota
 

 Key: MESOS-1791
 URL: https://issues.apache.org/jira/browse/MESOS-1791
 Project: Mesos
  Issue Type: Epic
  Components: allocation, master, replicated log
Reporter: Tom Arnfeld
Assignee: Alexander Rukletsov
  Labels: mesosphere

 Currently Mesos supports the ability to reserve resources (for a given role) 
 on a per-slave basis, as introduced in MESOS-505. This allows you to almost 
 statically partition off a set of resources on a set of machines, to 
 guarantee certain types of frameworks get some resources.
 This is very useful, though it is also very useful to be able to control 
 these reservations through the master (instead of per-slave) for when I don't 
 care which nodes I get on, as long as I get X cpu and Y RAM, or Z sets of 
 (X,Y).
 I'm not sure what structure this could take, but apparently it has already 
 been discussed. Would this be a CLI flag? Could there be a (authenticated) 
 web interface to control these reservations?





[jira] [Commented] (MESOS-3312) Factor out JSON to repeated protobuf conversion

2015-08-26 Thread Alexander Rukletsov (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14715927#comment-14715927
 ] 

Alexander Rukletsov commented on MESOS-3312:


https://reviews.apache.org/r/37826/
https://reviews.apache.org/r/37827/

 Factor out JSON to repeated protobuf conversion
 ---

 Key: MESOS-3312
 URL: https://issues.apache.org/jira/browse/MESOS-3312
 Project: Mesos
  Issue Type: Improvement
Reporter: Alexander Rukletsov
Assignee: Alexander Rukletsov
  Labels: mesosphere

 In general, we have the collection of protobuf messages as another protobuf 
 message, which makes JSON <-> protobuf conversion straightforward. This is not 
 always the case: for example, the {{Resources}} class is not a protobuf, 
 though it is protobuf-convertible.
 To facilitate conversions like JSON -> {{Resources}} and avoid writing code 
 for each particular case, we propose to introduce {{JSON::Array}} -> 
 {{repeated protobuf}} conversion. With this in place, {{JSON::Array}} -> 
 {{Resources}} boils down to {{JSON::Array}} -> {{repeated Resource}} -> 
 (extra c-tor call) -> {{Resources}}.





[jira] [Issue Comment Deleted] (MESOS-3063) Add an example framework using dynamic reservation

2015-08-26 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-3063:

Comment: was deleted

(was: Update the example with UT scripts; and also add [~mcypark] as reviewer.)

 Add an example framework using dynamic reservation
 --

 Key: MESOS-3063
 URL: https://issues.apache.org/jira/browse/MESOS-3063
 Project: Mesos
  Issue Type: Task
Reporter: Michael Park
Assignee: Klaus Ma

 An example framework using dynamic reservation should be added to
 # test dynamic reservations further, and
 # serve as a reference for those who want to use the dynamic reservation 
 feature.


