[jira] [Commented] (MESOS-3794) Master should not store arbitrarily sized data in ExecutorInfo

2015-11-20 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018776#comment-15018776
 ] 

Joseph Wu commented on MESOS-3794:
--

That would be appreciated.

> Master should not store arbitrarily sized data in ExecutorInfo
> --
>
> Key: MESOS-3794
> URL: https://issues.apache.org/jira/browse/MESOS-3794
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joseph Wu
>Priority: Critical
>  Labels: mesosphere
>
> From a comment in [MESOS-3771]:
> Master should not be storing the {{data}} fields from {{ExecutorInfo}}.  We 
> currently [store the entire 
> object|https://github.com/apache/mesos/blob/master/src/master/master.hpp#L262-L271],
>  which means master would be at high risk of OOM-ing if a bunch of executors 
> were started with big {{data}} blobs.
> * Master should scrub out unneeded bloat from {{ExecutorInfo}} before storing 
> it.
> * We can use an alternate internal object, like we do for {{TaskInfo}} vs 
> {{Task}}; see 
> [this|https://github.com/apache/mesos/blob/master/src/messages/messages.proto#L39-L41].
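
A minimal sketch of the second bullet, using simplified stand-in structs rather than the real protobuf messages.  The {{ExecutorInfo}} below is a hypothetical plain struct, and {{StoredExecutor}} and {{scrub}} are illustrative names that do not exist in Mesos:

```cpp
#include <string>

// Hypothetical stand-in for the ExecutorInfo protobuf; only the fields
// needed for this illustration are modeled.
struct ExecutorInfo {
  std::string executor_id;
  std::string name;
  std::string data;  // Arbitrary, framework-supplied blob.
};

// Hypothetical internal object kept by the master, analogous to how
// Task is a slimmer counterpart of TaskInfo.
struct StoredExecutor {
  std::string executor_id;
  std::string name;
};

// Copies only the bounded-size fields; the data blob is never retained.
StoredExecutor scrub(const ExecutorInfo& info) {
  return StoredExecutor{info.executor_id, info.name};
}
```

With something of this shape, the memory held per executor no longer depends on the size of {{data}}.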



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3753) Test the HTTP Scheduler library with SSL enabled

2015-11-19 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014458#comment-15014458
 ] 

Joseph Wu commented on MESOS-3753:
--

More test cleanup, discovered while prototyping {{libprocess::finalize}}:
https://reviews.apache.org/r/40501/

> Test the HTTP Scheduler library with SSL enabled
> 
>
> Key: MESOS-3753
> URL: https://issues.apache.org/jira/browse/MESOS-3753
> Project: Mesos
>  Issue Type: Story
>  Components: framework, HTTP API, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere, security
>
> Currently, the HTTP Scheduler library does not support SSL-enabled Mesos.  
> (You can manually test this by spinning up an SSL-enabled master and 
> attempting to run the event-call framework example against it.)
> We need to add tests that check the HTTP Scheduler library against 
> SSL-enabled Mesos:
> * with downgrade support,
> * with required framework/client-side certificates,
> * with/without verification of certificates (master-side),
> * with/without verification of certificates (framework-side),
> * with a custom certificate authority (CA)
> These options should be controlled by the same environment variables found on 
> the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/].
> Note: This issue will be broken down into smaller sub-issues as bugs/problems 
> are discovered.





[jira] [Commented] (MESOS-3753) Test the HTTP Scheduler library with SSL enabled

2015-11-19 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014733#comment-15014733
 ] 

Joseph Wu commented on MESOS-3753:
--

More test cleanup.  This time discovered by cleaning up every process managed 
by libprocess between every test:
https://reviews.apache.org/r/40507/



[jira] [Commented] (MESOS-3885) Remove apply-review.sh script from repository.

2015-11-18 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012070#comment-15012070
 ] 

Joseph Wu commented on MESOS-3885:
--

Before we remove {{apply-review.sh}}, we should probably add an option to 
{{apply-reviews.py}} to apply a single review.

Right now, if you have a chain like:
{code}
1 -> 2 -> 3
{code}

It is non-trivial to go from having review 2 applied to having review 3 
applied, i.e.:
{code}
apply-reviews.py -n -r 2
apply-reviews.py -n -r 3 # Tries to apply review 1 again => bad patch :(
{code}
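
One way to picture the missing option: given the dependency chain and the set of reviews already applied locally, only the unapplied part of the chain needs patching.  The sketch below is illustrative only; {{remainingReviews}} is a hypothetical helper and is not part of {{apply-reviews.py}} (which is a Python script).

```cpp
#include <set>
#include <vector>

// Returns the reviews from `chain` (ordered from oldest dependency to the
// requested review) that are not yet in `applied`, preserving chain order.
std::vector<int> remainingReviews(
    const std::vector<int>& chain,
    const std::set<int>& applied) {
  std::vector<int> remaining;
  for (int review : chain) {
    if (applied.count(review) == 0) {
      remaining.push_back(review);
    }
  }
  return remaining;
}
```

For the chain {{1 -> 2 -> 3}} with reviews 1 and 2 already applied, this yields only review 3.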

> Remove apply-review.sh script from repository.
> --
>
> Key: MESOS-3885
> URL: https://issues.apache.org/jira/browse/MESOS-3885
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>
> Since the apply-reviews.py script was introduced, there is no longer any need 
> to keep apply-review.sh around. apply-review.sh should be removed soon after 
> apply-reviews.py ships.





[jira] [Commented] (MESOS-3753) Test the HTTP Scheduler library with SSL enabled

2015-11-18 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012106#comment-15012106
 ] 

Joseph Wu commented on MESOS-3753:
--

A little test cleanup:
https://reviews.apache.org/r/40453/
https://reviews.apache.org/r/40454/



[jira] [Commented] (MESOS-3935) mesos-master crashes when a scheduler with an unresolvable hostname attempts to connect

2015-11-17 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009165#comment-15009165
 ] 

Joseph Wu commented on MESOS-3935:
--

This is the expected behavior (Mesos tries to "fail fast" whenever 
possible/applicable).  In this case, Mesos found that your {{--ip}} was invalid 
and quit.

If you don't want Mesos to quit in this way, you could set the 
{{--hostname_lookup}} flag to false.  But make sure this is actually what you 
want.

> mesos-master crashes when a scheduler with an unresolvable hostname attempts 
> to connect
> ---
>
> Key: MESOS-3935
> URL: https://issues.apache.org/jira/browse/MESOS-3935
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Hamza Faran
>
> {code}
> $ sudo mesos-master --ip=10.8.0.5 --work_dir=work_dir --authenticate --authenticate_slaves --credentials=credentials --port=5045
> I1117 07:05:15.371150  5852 main.cpp:229] Build: 2015-10-12 21:00:09 by root
> I1117 07:05:15.371314  5852 main.cpp:231] Version: 0.25.0
> I1117 07:05:15.371340  5852 main.cpp:234] Git tag: 0.25.0
> I1117 07:05:15.371366  5852 main.cpp:238] Git SHA: 2dd7f7ee115fe00b8e098b0a10762a4fa8f4600f
> I1117 07:05:15.371439  5852 main.cpp:252] Using 'HierarchicalDRF' allocator
> I1117 07:05:15.373845  5852 leveldb.cpp:176] Opened db in 2.267831ms
> I1117 07:05:15.374606  5852 leveldb.cpp:183] Compacted db in 678911ns
> I1117 07:05:15.374668  5852 leveldb.cpp:198] Created db iterator in 19310ns
> I1117 07:05:15.374775  5852 leveldb.cpp:204] Seeked to beginning of db in 79269ns
> I1117 07:05:15.374882  5852 leveldb.cpp:273] Iterated through 3 keys in the db in 79949ns
> I1117 07:05:15.374953  5852 replica.cpp:744] Replica recovered with log positions 91 -> 92 with 0 holes and 0 unlearned
> I1117 07:05:15.375820  5852 main.cpp:465] Starting Mesos master
> I1117 07:05:15.376049  5856 recover.cpp:449] Starting replica recovery
> I1117 07:05:15.376188  5858 recover.cpp:475] Replica is in VOTING status
> I1117 07:05:15.376332  5858 recover.cpp:464] Recover process terminated
> F1117 07:05:43.398336  5852 master.cpp:330] Failed to get hostname: Temporary failure in name resolution
> *** Check failure stack trace: ***
> @ 0x7f55ebf4273d  google::LogMessage::Fail()

[jira] [Issue Comment Deleted] (MESOS-3935) mesos-master crashes when a scheduler with an unresolvable hostname attempts to connect

2015-11-17 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3935:
-
Comment: was deleted

(was: This is the expected behavior (Mesos tries to "fail fast" whenever 
possible/applicable).  In this case, Mesos found that your {{--ip}} was invalid 
and quit.

If you don't want Mesos to quit in this way, you could set the 
{{--hostname_lookup}} flag to false.  But make sure this is actually what you 
want.)


[jira] [Created] (MESOS-3934) Libprocess: Unify the initialization of the MetricsProcess and ReaperProcess

2015-11-16 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3934:


 Summary: Libprocess: Unify the initialization of the 
MetricsProcess and ReaperProcess
 Key: MESOS-3934
 URL: https://issues.apache.org/jira/browse/MESOS-3934
 Project: Mesos
  Issue Type: Task
  Components: libprocess, test
Reporter: Joseph Wu
Assignee: Joseph Wu


Related to this 
[TODO|https://github.com/apache/mesos/blob/aa0cd7ed4edf1184cbc592b5caa2429a8373e813/3rdparty/libprocess/src/process.cpp#L949-L950].

The {{MetricsProcess}} and {{ReaperProcess}} are global processes (singletons) 
which are initialized upon first use.  The two processes could be initialized 
alongside the {{gc}}, {{help}}, {{logging}}, {{profiler}}, and {{system}} 
(statistics) processes inside {{process::initialize}}.

This is also necessary for libprocess re-initialization.
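
A rough sketch of the idea, assuming simplified stand-ins for the two processes.  {{Globals}}, {{initializeGlobals}}, and {{finalizeGlobals}} are hypothetical names used only for illustration and are not libprocess APIs:

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the two global processes.
struct MetricsProcess {};
struct ReaperProcess {};

// Hypothetical holder for the globals; not the libprocess implementation.
struct Globals {
  std::unique_ptr<MetricsProcess> metrics;
  std::unique_ptr<ReaperProcess> reaper;
};

static Globals globals;

// Creates both processes eagerly, alongside the other global processes,
// instead of on first use.
void initializeGlobals() {
  assert(!globals.metrics && !globals.reaper);
  globals.metrics.reset(new MetricsProcess());
  globals.reaper.reset(new ReaperProcess());
}

// Destroys both processes so that a later initializeGlobals() starts clean.
void finalizeGlobals() {
  globals.metrics.reset();
  globals.reaper.reset();
}
```

Because creation and destruction are both explicit, an initialize/finalize/initialize sequence ends up with fresh instances, which lazily initialized singletons cannot guarantee.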





[jira] [Updated] (MESOS-3863) Investigate the requirements of programmatically re-initializing libprocess

2015-11-16 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3863:
-
Description: 
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

{{process::finalize}} should cover the following components:
* {{__s__}} (the server socket)
** {{delete}} should be sufficient.  This closes the socket and thereby 
prevents any further interaction from it.
* {{process_manager}}
** Related prior work: [MESOS-3158]
** Cleans up the garbage collector, help, logging, profiler, statistics, route 
processes (including [this 
one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
 which currently leaks a pointer).
** Cleans up any other {{spawn}} 'd process.
** Manages the {{EventLoop}}.
* {{Clock}}
** The goal here is to clear any timers so that nothing can dereference 
{{process_manager}} while we're finalizing/finalized.  It's probably not 
important to execute any remaining timers, since we're "shutting down" 
libprocess.  This means:
*** The clock should be {{paused}} and {{settled}} before the clean up of 
{{process_manager}}.
*** Processes, which might interact with the {{Clock}}, should be cleaned up 
next.
*** A new {{Clock::finalize}} method would then clear timers, process-specific 
clocks, and {{tick}} s; and then {{resume}} the clock.
* {{__address__}} (the advertised IP and port)
** Needs to be cleared after {{process_manager}} has been cleaned up.  
Processes use this to communicate events.  If cleared prematurely, 
{{TerminateEvents}} will not be sent correctly, leading to infinite waits.
* {{socket_manager}}
** The idea here is to close all sockets and deallocate any existing 
{{HttpProxy}} or {{Encoder}} objects.
** All sockets are created via {{__s__}}, so cleaning up the server socket 
prior will prevent any new activity.
* {{mime}}
** This is effectively a static map.
** It should be possible to statically initialize it.
* Synchronization atomics {{initialized}} & {{initializing}}.
** Once cleanup is done, these should be reset.

*Summary*:
* Implement {{Clock::finalize}}.  [MESOS-3882]
* Implement {{~SocketManager}}.
* Make sure the {{MetricsProcess}} and {{ReaperProcess}} are reinitialized.  
These are currently initialized separately.
* (Optional) Clean up {{mime}}.
* Wrap everything up in {{process::finalize}}.
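
The {{Clock}} bullets above can be sketched roughly as follows.  {{SketchClock}} is a hypothetical, heavily simplified model written for this note; it is not the libprocess {{Clock}} class, and its member names are assumptions:

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified model of a clock.
class SketchClock {
public:
  void pause() { paused_ = true; }
  void resume() { paused_ = false; }
  bool paused() const { return paused_; }

  void addTimer(std::function<void()> callback) {
    timers_.push_back(std::move(callback));
  }

  void setProcessClock(const std::string& pid, double seconds) {
    processClocks_[pid] = seconds;
  }

  std::size_t pendingTimers() const { return timers_.size(); }
  std::size_t processClocks() const { return processClocks_.size(); }

  // Drops pending timers without running them, forgets per-process
  // clocks, and then resumes so the next initialization starts unpaused.
  void finalize() {
    timers_.clear();
    processClocks_.clear();
    resume();
  }

private:
  bool paused_ = false;
  std::vector<std::function<void()>> timers_;
  std::map<std::string, double> processClocks_;
};
```

The point of the model is only the ordering: pause first, tear down processes, then clear the remaining timer state and resume.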

  was:
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

{{process::finalize}} should cover the following components:
* {{__s__}} (the server socket)
** {{delete}} should be sufficient.  This closes the socket and thereby 
prevents any further interaction from it.
* {{process_manager}}
** Related prior work: [MESOS-3158]
** Cleans up the garbage collector, help, logging, profiler, statistics, route 
processes (including [this 
one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
 which currently leaks a pointer).
** Cleans up any other {{spawn}} 'd process.
** Manages the {{EventLoop}}.
* {{Clock}}
** The goal here is to clear any timers so that nothing can dereference 
{{process_manager}} while we're finalizing/finalized.  It's probably not 
important to execute any remaining timers, since we're "shutting down" 
libprocess.  This means:
*** The clock should be {{paused}} and {{settled}} before the clean up of 
{{process_manager}}.
*** Processes, which might interact with the {{Clock}}, should be cleaned up 
next.
*** A new {{Clock::finalize}} method would then clear timers, process-specific 
clocks, and {{tick}} s; and then {{resume}} the clock.
* {{__address__}} (the advertised IP and port)
** Needs to be cleared after {{process_manager}} has been cleaned up.  
Processes use this to communicate events.  If cleared prematurely, 
{{TerminateEvents}} will not be sent correctly, leading to infinite waits.
* {{socket_manager}}
** The idea here is to close all sockets and deallocate any existing 
{{HttpProxy}} or {{Encoder}} objects.
** All sockets are created via {{__s__}}, so cleaning up the server socket 
prior will prevent any new activity.
* {{mime}}
** This is effectively a static map.
** It should be possible to statically initialize it.
* Synchronization atomics {{initialized}} & {{initializing}}.
** Once cleanup is done, these should be reset.

*Summary*:
* Implement {{Clock::finalize}}.  [MESOS-3882]
* Implement {{~SocketManager}}.
* Clean up {{mime}}.
* Wrap everything up in {{process::finalize}}.


> Investigate the requirements of programmatically 

[jira] [Updated] (MESOS-3863) Investigate the requirements of programmatically re-initializing libprocess

2015-11-16 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3863:
-
Description: 
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

{{process::finalize}} should cover the following components:
* {{__s__}} (the server socket)
** {{delete}} should be sufficient.  This closes the socket and thereby 
prevents any further interaction from it.
* {{process_manager}}
** Related prior work: [MESOS-3158]
** Cleans up the garbage collector, help, logging, profiler, statistics, route 
processes (including [this 
one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
 which currently leaks a pointer).
** Cleans up any other {{spawn}} 'd process.
** Manages the {{EventLoop}}.
* {{Clock}}
** The goal here is to clear any timers so that nothing can dereference 
{{process_manager}} while we're finalizing/finalized.  It's probably not 
important to execute any remaining timers, since we're "shutting down" 
libprocess.  This means:
*** The clock should be {{paused}} and {{settled}} before the clean up of 
{{process_manager}}.
*** Processes, which might interact with the {{Clock}}, should be cleaned up 
next.
*** A new {{Clock::finalize}} method would then clear timers, process-specific 
clocks, and {{tick}} s; and then {{resume}} the clock.
* {{__address__}} (the advertised IP and port)
** Needs to be cleared after {{process_manager}} has been cleaned up.  
Processes use this to communicate events.  If cleared prematurely, 
{{TerminateEvents}} will not be sent correctly, leading to infinite waits.
* {{socket_manager}}
** The idea here is to close all sockets and deallocate any existing 
{{HttpProxy}} or {{Encoder}} objects.
** All sockets are created via {{__s__}}, so cleaning up the server socket 
prior will prevent any new activity.
* {{mime}}
** This is effectively a static map.
** It should be possible to statically initialize it.
* Synchronization atomics {{initialized}} & {{initializing}}.
** Once cleanup is done, these should be reset.

*Summary*:
* Implement {{Clock::finalize}}.  [MESOS-3882]
* Implement {{~SocketManager}}.  [MESOS-3910]
* Make sure the {{MetricsProcess}} and {{ReaperProcess}} are reinitialized.  
[MESOS-3934]
* (Optional) Clean up {{mime}}.
* Wrap everything up in {{process::finalize}}.

  was:
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

{{process::finalize}} should cover the following components:
* {{__s__}} (the server socket)
** {{delete}} should be sufficient.  This closes the socket and thereby 
prevents any further interaction from it.
* {{process_manager}}
** Related prior work: [MESOS-3158]
** Cleans up the garbage collector, help, logging, profiler, statistics, route 
processes (including [this 
one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
 which currently leaks a pointer).
** Cleans up any other {{spawn}} 'd process.
** Manages the {{EventLoop}}.
* {{Clock}}
** The goal here is to clear any timers so that nothing can dereference 
{{process_manager}} while we're finalizing/finalized.  It's probably not 
important to execute any remaining timers, since we're "shutting down" 
libprocess.  This means:
*** The clock should be {{paused}} and {{settled}} before the clean up of 
{{process_manager}}.
*** Processes, which might interact with the {{Clock}}, should be cleaned up 
next.
*** A new {{Clock::finalize}} method would then clear timers, process-specific 
clocks, and {{tick}} s; and then {{resume}} the clock.
* {{__address__}} (the advertised IP and port)
** Needs to be cleared after {{process_manager}} has been cleaned up.  
Processes use this to communicate events.  If cleared prematurely, 
{{TerminateEvents}} will not be sent correctly, leading to infinite waits.
* {{socket_manager}}
** The idea here is to close all sockets and deallocate any existing 
{{HttpProxy}} or {{Encoder}} objects.
** All sockets are created via {{__s__}}, so cleaning up the server socket 
prior will prevent any new activity.
* {{mime}}
** This is effectively a static map.
** It should be possible to statically initialize it.
* Synchronization atomics {{initialized}} & {{initializing}}.
** Once cleanup is done, these should be reset.

*Summary*:
* Implement {{Clock::finalize}}.  [MESOS-3882]
* Implement {{~SocketManager}}.
* Make sure the {{MetricsProcess}} and {{ReaperProcess}} are reinitialized.  
These are currently initialized separately.
* (Optional) Clean up 

[jira] [Commented] (MESOS-3916) Flakey test on Ubuntu Wily: MasterMaintenanceTest.InverseOffersFilters

2015-11-13 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004420#comment-15004420
 ] 

Joseph Wu commented on MESOS-3916:
--

Looks like we have a race:
{code}
I1113 16:44:01.695679  8744 master.cpp:5069] Sending 1 inverse offers to framework d59449fc-5462-43c5-b935-e05563fdd4b6- (default)
I1113 16:44:01.701465  8743 http.cpp:338] HTTP POST for /master/api/v1/scheduler from 10.0.2.15:32768
I1113 16:44:01.701598  8743 master.cpp:3297] Processing DECLINE call for offers: [ d59449fc-5462-43c5-b935-e05563fdd4b6-O4 ] for framework d59449fc-5462-43c5-b935-e05563fdd4b6- (default)
I1113 16:44:02.767982  8742 master.cpp:5069] Sending 2 inverse offers to framework d59449fc-5462-43c5-b935-e05563fdd4b6- (default)
/mesos/src/tests/master_maintenance_tests.cpp:1594: Failure
Value of: event.get().offers().inverse_offers().size()
  Actual: 2
Expected: 1
{code}

I'm guessing this isn't OS-specific (it would be weird if it was), but I 
haven't seen it on any other OS.

> Flakey test on Ubuntu Wily: MasterMaintenanceTest.InverseOffersFilters
> --
>
> Key: MESOS-3916
> URL: https://issues.apache.org/jira/browse/MESOS-3916
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>  Labels: maintenance, mesosphere, tech-debt
> Attachments: wily_maintenance_test_verbose.txt
>
>
> Seems to fail about 10% of the time.





[jira] [Commented] (MESOS-3917) Configure and load SSL key password from configuration file

2015-11-13 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004425#comment-15004425
 ] 

Joseph Wu commented on MESOS-3917:
--

We would likely use this function to implement this fix/feature:
https://www.openssl.org/docs/manmaster/ssl/SSL_CTX_set_default_passwd_cb.html
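
A sketch of what such a callback might look like, written without the OpenSSL headers so it stands alone.  The signature follows OpenSSL's {{pem_password_cb}}; the callback name, the use of a {{std::string}} as user data, and returning {{-1}} on failure are assumptions to check against the OpenSSL version in use.  It would be registered with {{SSL_CTX_set_default_passwd_cb}}, with the password passed through {{SSL_CTX_set_default_passwd_cb_userdata}}:

```cpp
#include <cstring>
#include <string>

// Matches the shape of OpenSSL's pem_password_cb:
//   int cb(char* buf, int size, int rwflag, void* userdata)
// `userdata` is assumed to point at a std::string holding the password,
// e.g. one read from a configuration file (the loading is not shown).
int passwordCallback(char* buf, int size, int /*rwflag*/, void* userdata) {
  if (buf == nullptr || userdata == nullptr || size <= 0) {
    return -1;
  }

  const std::string* password = static_cast<const std::string*>(userdata);

  // Refuse passwords that do not fit rather than silently truncating.
  if (password->size() >= static_cast<std::size_t>(size)) {
    return -1;
  }

  // Copy the password including its terminating NUL.
  std::memcpy(buf, password->c_str(), password->size() + 1);
  return static_cast<int>(password->size());
}
```

Since the callback never reads from stdin, a process that inherits the SSL configuration would not block waiting for input.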

> Configure and load SSL key password from configuration file
> ---
>
> Key: MESOS-3917
> URL: https://issues.apache.org/jira/browse/MESOS-3917
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Timothy Chen
>
> Currently, when SSL is enabled and the SSL key file is password-protected, 
> launching a task fails because the executor inherits the SSL config and 
> blocks on stdin asking for the SSL key password.
> To avoid this, we should allow the password to be loaded from a config 
> file and also pass the file information to executors and other processes that 
> want SSL communication.





[jira] [Commented] (MESOS-3917) Configure and load SSL key password from configuration file

2015-11-13 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004451#comment-15004451
 ] 

Joseph Wu commented on MESOS-3917:
--

Also see this: https://reviews.apache.org/r/40284/#review106388



[jira] [Comment Edited] (MESOS-3863) Investigate the requirements of programmatically re-initializing libprocess

2015-11-12 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15002673#comment-15002673
 ] 

Joseph Wu edited comment on MESOS-3863 at 11/13/15 12:33 AM:
-

Inter-dependency between {{process_manager}} and {{socket_manager}} will 
complicate things:

* {{process_manager}} holds the {{gc}} and various {{HttpProxy}} processes.
* {{socket_manager}} spawns {{HttpProxy}} processes and relies on {{gc}} to 
clean them up.
* {{gc}} relies on {{socket_manager}} links to clean up processes.

See [MESOS-3910]


was (Author: kaysoky):
Inter-dependency between {{process_manager}} and {{socket_manager}} will 
complicate things:

* {{process_manager}} holds the {{gc}} and various {{HttpProxy}} processes.
* {{socket_manager}} spawns {{HttpProxy}} processes and relies on {{gc}} to 
clean them up.
* {{gc}} relies on {{socket_manager}} links to clean up processes.

{{process::finalize}} should:
# Clean up all processes other than {{gc}}.  This will clear all links and 
delete all {{HttpProxy}} s while {{socket_manager}} still exists.
# Close all sockets via {{SocketManager::close}}.  All of {{socket_manager}} 's 
state is cleaned up via {{SocketManager::close}}, including termination of 
{{HttpProxy}} (termination is idempotent, meaning that killing {{HttpProxy}} s 
via {{process_manager}} is safe).
# At this point, {{socket_manager}} should be empty and only the {{gc}} process 
should be running.  (Since we're finalizing, assume there are no threads trying 
to spawn processes.)  {{socket_manager}} can be deleted.
# {{gc}} can be deleted.  This is currently a leaked pointer, so we'll also 
need to track and delete that.
# {{process_manager}} should be devoid of processes, so we can proceed with 
cleanup (join threads, stop the {{EventLoop}}, etc).

> Investigate the requirements of programmatically re-initializing libprocess
> ---
>
> Key: MESOS-3863
> URL: https://issues.apache.org/jira/browse/MESOS-3863
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> This issue is for investigating what needs to be added/changed in 
> {{process::finalize}} such that {{process::initialize}} will start on a clean 
> slate.  Additional issues will be created once done.  Also see [the parent 
> issue|MESOS-3820].
> {{process::finalize}} should cover the following components:
> * {{__s__}} (the server socket)
> ** {{delete}} should be sufficient.  This closes the socket and thereby 
> prevents any further interaction from it.
> * {{process_manager}}
> ** Related prior work: [MESOS-3158]
> ** Cleans up the garbage collector, help, logging, profiler, statistics, 
> route processes (including [this 
> one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
>  which currently leaks a pointer).
> ** Cleans up any other {{spawn}} 'd process.
> ** Manages the {{EventLoop}}.
> * {{Clock}}
> ** The goal here is to clear any timers so that nothing can dereference 
> {{process_manager}} while we're finalizing/finalized.  It's probably not 
> important to execute any remaining timers, since we're "shutting down" 
> libprocess.  This means:
> *** The clock should be {{paused}} and {{settled}} before the clean up of 
> {{process_manager}}.
> *** Processes, which might interact with the {{Clock}}, should be cleaned up 
> next.
> *** A new {{Clock::finalize}} method would then clear timers, 
> process-specific clocks, and {{tick}} s; and then {{resume}} the clock.
> * {{__address__}} (the advertised IP and port)
> ** Needs to be cleared after {{process_manager}} has been cleaned up.  
> Processes use this to communicate events.  If cleared prematurely, 
> {{TerminateEvents}} will not be sent correctly, leading to infinite waits.
> * {{socket_manager}}
> ** The idea here is to close all sockets and deallocate any existing 
> {{HttpProxy}} or {{Encoder}} objects.
> ** All sockets are created via {{__s__}}, so cleaning up the server socket 
> prior will prevent any new activity.
> * {{mime}}
> ** This is effectively a static map.
> ** It should be possible to statically initialize it.
> * Synchronization atomics {{initialized}} & {{initializing}}.
> ** Once cleanup is done, these should be reset.
> *Summary*:
> * Implement {{Clock::finalize}}.  [MESOS-3882]
> * Implement {{~SocketManager}}.
> * Clean up {{mime}}.
> * Wrap everything up in {{process::finalize}}.





[jira] [Created] (MESOS-3910) Libprocess: Implement cleanup of the SocketManager in process::finalize

2015-11-12 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3910:


 Summary: Libprocess: Implement cleanup of the SocketManager in 
process::finalize
 Key: MESOS-3910
 URL: https://issues.apache.org/jira/browse/MESOS-3910
 Project: Mesos
  Issue Type: Task
  Components: libprocess, test
Reporter: Joseph Wu
Assignee: Joseph Wu


The {{socket_manager}} and {{process_manager}} are intricately tied together.  
Currently, only the {{process_manager}} is cleaned up by {{process::finalize}}.

To clean up the {{socket_manager}}, we must close all sockets and deallocate 
any existing {{HttpProxy}} or {{Encoder}} objects.  And we should prevent 
further objects from being created/tracked by the {{socket_manager}}.

*Proposal*
# Clean up all processes other than {{gc}}.  This will clear all links and 
delete all {{HttpProxy}} s while {{socket_manager}} still exists.
# Close all sockets via {{SocketManager::close}}.  All of {{socket_manager}} 's 
state is cleaned up via {{SocketManager::close}}, including termination of 
{{HttpProxy}} (termination is idempotent, meaning that killing {{HttpProxy}} s 
via {{process_manager}} is safe).
# At this point, {{socket_manager}} should be empty and only the {{gc}} process 
should be running.  (Since we're finalizing, assume there are no threads trying 
to spawn processes.)  {{socket_manager}} can be deleted.
# {{gc}} can be deleted.  This is currently a leaked pointer, so we'll also 
need to track and delete that.
# {{process_manager}} should be devoid of processes, so we can proceed with 
cleanup (join threads, stop the {{EventLoop}}, etc).
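
A minimal sketch of the ordering proposed above, using toy stand-ins.  Everything here ({{ToyProcessManager}}, {{ToySocketManager}}, {{finalizeSketch}}) is hypothetical and is not the libprocess API; it only illustrates the sequence of the five steps and the two "should be empty" checks.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT the real libprocess classes.
struct ToySocketManager {
  std::set<int> sockets;               // open socket descriptors
  std::map<int, std::string> proxies;  // socket -> name of its proxy process

  // Closing a socket drops all state tracked for it (safe to repeat).
  void close(int s) {
    proxies.erase(s);
    sockets.erase(s);
  }

  bool empty() const { return sockets.empty() && proxies.empty(); }
};

struct ToyProcessManager {
  std::set<std::string> processes;

  // Terminating a process that is already gone is a no-op.
  void terminate(const std::string& name) { processes.erase(name); }
};

// Runs the five proposed steps and returns the order in which they ran.
std::vector<std::string> finalizeSketch(
    std::unique_ptr<ToyProcessManager>& pm,
    std::unique_ptr<ToySocketManager>& sm) {
  std::vector<std::string> log;

  // 1. Clean up every process except gc.
  const std::set<std::string> snapshot = pm->processes;
  for (const std::string& name : snapshot) {
    if (name != "gc") pm->terminate(name);
  }
  log.push_back("processes");

  // 2. Close all sockets through the socket manager.
  const std::set<int> open = sm->sockets;
  for (int s : open) sm->close(s);
  log.push_back("sockets");

  // 3. The socket manager should now be empty and can be deleted.
  assert(sm->empty());
  sm.reset();
  log.push_back("socket_manager");

  // 4. Delete gc (it has to be tracked for this to be possible).
  pm->terminate("gc");
  log.push_back("gc");

  // 5. No processes remain, so the process manager can be torn down.
  assert(pm->processes.empty());
  pm.reset();
  log.push_back("process_manager");

  return log;
}
```

The toy keeps {{gc}} alive until after the socket manager is gone, which is the one ordering constraint the proposal calls out explicitly.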





[jira] [Commented] (MESOS-3909) isolator module headers depend on picojson headers

2015-11-12 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003279#comment-15003279
 ] 

Joseph Wu commented on MESOS-3909:
--

This has always been a requirement, although I don't think it's very well 
documented (if at all).

Currently, you'll need to copy the header into your include path and then add 
these build flags alongside your other flags:
{code}
-DPICOJSON_USE_INT64
-D__STDC_FORMAT_MACROS
{code}
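
For illustration, a {{-DNAME}} flag has the same effect as a {{#define NAME}} placed before the first include.  The sketch below assumes {{__STDC_FORMAT_MACROS}} is wanted for the usual reason: older C library headers only expose the {{PRId64}}-style macros to C++ when it is defined.  {{formatInt64}} is a hypothetical helper, and {{PICOJSON_USE_INT64}} has no effect in this snippet because picojson itself is not included.

```cpp
// Equivalent to passing -DPICOJSON_USE_INT64 -D__STDC_FORMAT_MACROS
// on the compiler command line.
#define PICOJSON_USE_INT64
#define __STDC_FORMAT_MACROS

#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 64-bit integer using the PRId64 macro,
// which is the kind of macro __STDC_FORMAT_MACROS makes available.
std::string formatInt64(std::int64_t value) {
  char buffer[32];
  std::snprintf(buffer, sizeof(buffer), "%" PRId64, value);
  return std::string(buffer);
}
```

Defining the macros in source only works if it happens before any header pulls in {{<inttypes.h>}}, which is why passing them as build flags is the more robust option.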

> isolator module headers depend on picojson headers
> --
>
> Key: MESOS-3909
> URL: https://issues.apache.org/jira/browse/MESOS-3909
> Project: Mesos
>  Issue Type: Bug
>  Components: c++ api, modules
>Reporter: James Peach
>
> When trying to build an isolator module, stout headers end up depending on 
> {{picojson.hpp}} which is not installed.
> {code}
> In file included from /opt/mesos/include/mesos/module/isolator.hpp:25:
> In file included from /opt/mesos/include/mesos/slave/isolator.hpp:30:
> In file included from /opt/mesos/include/process/dispatch.hpp:22:
> In file included from /opt/mesos/include/process/process.hpp:26:
> In file included from /opt/mesos/include/process/event.hpp:21:
> In file included from /opt/mesos/include/process/http.hpp:39:
> /opt/mesos/include/stout/json.hpp:23:10: fatal error: 'picojson.h' file not 
> found
> #include <picojson.h>
>  ^
> 8 warnings and 1 error generated.
> {code}





[jira] [Created] (MESOS-3882) Libprocess: Implement process::Clock::finalize

2015-11-10 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3882:


 Summary: Libprocess: Implement process::Clock::finalize
 Key: MESOS-3882
 URL: https://issues.apache.org/jira/browse/MESOS-3882
 Project: Mesos
  Issue Type: Task
  Components: libprocess, test
Reporter: Joseph Wu
Assignee: Joseph Wu


Tracks this 
[TODO|https://github.com/apache/mesos/blob/aa0cd7ed4edf1184cbc592b5caa2429a8373e813/3rdparty/libprocess/src/process.cpp#L974-L975].

The {{Clock}} is initialized with a callback that, among other things, will 
dereference the global {{process_manager}} object.

When libprocess is shutting down, the {{process_manager}} is cleaned up.  
Between cleanup and termination of libprocess, there is some chance that a 
{{Timer}} will time out and result in dereferencing {{process_manager}}.

*Proposal* 
* Implement {{Clock::finalize}}.  This would clear:
** existing timers
** process-specific clocks
** ticks
* Change {{process::finalize}}.
*# Resume the clock.  (The clock is only paused during some tests.)  When the 
clock is not paused, the callback does not dereference {{process_manager}}.
*# Clean up {{process_manager}}.  This terminates all the processes that would 
potentially interact with {{Clock}}.
*# Call {{Clock::finalize}}.
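
To make the hazard and the proposed order concrete, here is a toy sketch.  {{ToyClock}}, {{ToyManager}} and {{finalizeSketch}} are hypothetical and far simpler than the real {{Clock}}; the point is only that timers dropped by a finalize step can never run against a deleted manager.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical toy clock; NOT the libprocess Clock.
class ToyClock {
public:
  // Schedule `callback` to fire at time `when`.
  void timer(double when, std::function<void()> callback) {
    timers_[when].push_back(std::move(callback));
  }

  void pause() { paused_ = true; }
  void resume() { paused_ = false; }
  bool paused() const { return paused_; }

  // Fire every timer due at or before `now`.
  void advance(double now) {
    auto end = timers_.upper_bound(now);
    std::vector<std::function<void()>> due;
    for (auto it = timers_.begin(); it != end; ++it) {
      for (auto& callback : it->second) due.push_back(std::move(callback));
    }
    timers_.erase(timers_.begin(), end);
    for (auto& callback : due) callback();
  }

  // Drop pending timers without running them; leave the clock resumed.
  void finalize() {
    timers_.clear();
    paused_ = false;
  }

  std::size_t pending() const {
    std::size_t count = 0;
    for (const auto& entry : timers_) count += entry.second.size();
    return count;
  }

private:
  std::map<double, std::vector<std::function<void()>>> timers_;
  bool paused_ = false;
};

// Stand-in for the global object that timer callbacks dereference.
struct ToyManager {
  int touched = 0;
};

// Mirrors the proposed order: resume, clean up the manager, finalize the clock.
void finalizeSketch(ToyClock& clock, ToyManager*& manager) {
  clock.resume();
  delete manager;
  manager = nullptr;
  clock.finalize();
}
```

After {{finalizeSketch}}, advancing the toy clock past a previously scheduled timer runs nothing, so a callback that captured the manager pointer is never invoked.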





[jira] [Updated] (MESOS-3863) Investigate the requirements of programmatically re-initializing libprocess

2015-11-10 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3863:
-
Description: 
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

{{process::finalize}} should cover the following components:
* {{__s__}} (the server socket)
** {{delete}} should be sufficient.  This closes the socket and thereby 
prevents any further interaction from it.
* {{process_manager}}
** Related prior work: [MESOS-3158]
** Cleans up the garbage collector, help, logging, profiler, statistics, route 
processes (including [this 
one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
 which currently leaks a pointer).
** Cleans up any other {{spawn}} 'd process.
** Manages the {{EventLoop}}.
* {{Clock}}
** The goal here is to clear any timers so that nothing can dereference 
{{process_manager}} while we're finalizing/finalized.  It's probably not 
important to execute any remaining timers, since we're "shutting down" 
libprocess.  This means:
*** The clock should be {{paused}} and {{settled}} before the clean up of 
{{process_manager}}.
*** Processes, which might interact with the {{Clock}}, should be cleaned up 
next.
*** A new {{Clock::finalize}} method would then clear timers, process-specific 
clocks, and {{tick}} s; and then {{resume}} the clock.
* {{__address__}} (the advertised IP and port)
** Needs to be cleared after {{process_manager}} has been cleaned up.  
Processes use this to communicate events.  If cleared prematurely, 
{{TerminateEvents}} will not be sent correctly, leading to infinite waits.
* {{socket_manager}}
** The idea here is to close all sockets and deallocate any existing 
{{HttpProxy}} or {{Encoder}} objects.
** All sockets are created via {{__s__}}, so cleaning up the server socket 
prior will prevent any new activity.
* {{mime}}
** This is effectively a static map.
** It should be possible to statically initialize it.
* Synchronization atomics {{initialized}} & {{initializing}}.
** Once cleanup is done, these should be reset.

*Summary*:
* Implement {{Clock::finalize}}.  [MESOS-3882]
* Implement {{~SocketManager}}.
* Clean up {{mime}}.
* Wrap everything up in {{process::finalize}}.

  was:
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

{{process::finalize}} should cover the following components:
* {{__s__}} (the server socket)
** {{delete}} should be sufficient.  This closes the socket and thereby 
prevents any further interaction from it.
* {{process_manager}}
** Related prior work: [MESOS-3158]
** Cleans up the garbage collector, help, logging, profiler, statistics, route 
processes (including [this 
one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
 which currently leaks a pointer).
** Cleans up any other {{spawn}} 'd process.
** Manages the {{EventLoop}}.
* {{Clock}}
** The goal here is to clear any timers so that nothing can dereference 
{{process_manager}} while we're finalizing/finalized.  It's probably not 
important to execute any remaining timers, since we're "shutting down" 
libprocess.  This means:
*** The clock should be {{paused}} and {{settled}} before the clean up of 
{{process_manager}}.
*** Processes, which might interact with the {{Clock}}, should be cleaned up 
next.
*** A new {{Clock::finalize}} method would then clear timers, process-specific 
clocks, and {{tick}} s; and then {{resume}} the clock.
* {{__address__}} (the advertised IP and port)
** Needs to be cleared after {{process_manager}} has been cleaned up.  
Processes use this to communicate events.  If cleared prematurely, 
{{TerminateEvents}} will not be sent correctly, leading to infinite waits.
* {{socket_manager}}
** The idea here is to close all sockets and deallocate any existing 
{{HttpProxy}} or {{Encoder}} objects.
** All sockets are created via {{__s__}}, so cleaning up the server socket 
prior will prevent any new activity.
* {{mime}}
** This is effectively a static map.
** It should be possible to statically initialize it.
* Synchronization atomics {{initialized}} & {{initializing}}.
** Once cleanup is done, these should be reset.

*Summary*:
* Implement {{Clock::finalize}}.
* Implement {{~SocketManager}}.
* Clean up {{mime}}.
* Wrap everything up in {{process::finalize}}.


> Investigate the requirements of programmatically re-initializing libprocess
> ---
>
> Key: MESOS-3863
>  

[jira] [Updated] (MESOS-2081) Add safety constraints for maintenance primitives.

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-2081:
-
Labels: twitter  (was: mesosphere twitter)

> Add safety constraints for maintenance primitives.
> --
>
> Key: MESOS-2081
> URL: https://issues.apache.org/jira/browse/MESOS-2081
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Klaus Ma
>  Labels: twitter
>
> In order to ensure that the maintenance primitives can be used safely by 
> operators, we want to put a few safety mechanisms in place. Some ideas from 
> the [design 
> doc|https://docs.google.com/a/twitter.com/document/d/16k0lVwpSGVOyxPSyXKmGC-gbNmRlisNEe4p-fAUSojk/]:
> # Prevent bad schedules from being constructed: schedules with more than x% 
> overlap in slaves are rejected.
> # Prevent bad maintenance from proceeding unchecked: if x% of the slaves are 
> not being unscheduled, or are not re-registering, cancel the schedule.
> These will likely be configurable via flags.





[jira] [Updated] (MESOS-2080) Add master metrics for maintenance.

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-2080:
-
Labels: twitter  (was: mesosphere twitter)

> Add master metrics for maintenance.
> ---
>
> Key: MESOS-2080
> URL: https://issues.apache.org/jira/browse/MESOS-2080
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Benjamin Mahler
>Assignee: Yong Qiao Wang
>  Labels: twitter
>
> We'll need metrics in order to gain visibility into the maintenance 
> functionality. This will also allow operators to add alerting on these 
> metrics, in particular:
> # Number of scheduled hosts.
> # Number of active windows.
> # Number of expired windows.
> # Number of successful drains.
> # Number of failed drains.
> As an example of an alert guideline, we would want to know the number of 
> expired windows as a gauge to ensure that it is not growing excessively. This 
> allows alerting to catch when operators are not properly unscheduling 
> maintenance once it is complete.





[jira] [Updated] (MESOS-3820) Refactor libprocess initialization to allow for test-only reinitialization of the server socket

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3820:
-
Issue Type: Story  (was: Task)

> Refactor libprocess initialization to allow for test-only reinitialization of 
> the server socket
> ---
>
> Key: MESOS-3820
> URL: https://issues.apache.org/jira/browse/MESOS-3820
> Project: Mesos
>  Issue Type: Story
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> *Background*
> Libprocess initialization includes the spawning of a variety of global 
> processes and the creation of the server socket which listens for incoming 
> requests.  Some properties of the server socket are configured via 
> environment variables, such as the IP and port or the SSL configuration.
> In the case of tests, libprocess is initialized once per test binary.  This 
> means that testing different configurations (SSL in particular) is cumbersome 
> as a separate process would be needed for every test case.
> *Proposal* (Still under investigation)
> # Augment {{process::finalize}} to completely clean up libprocess.
> #* {{process_manager}}
> #* {{socket_manager}}
> #* {{EventLoop}}
> #* {{Clock}}
> #* {{__s__}}
> #* {{__address__}}
> #* Garbage collector, help, logging, profiler, statistics, route processes 
> (should fall under {{process_manager}}).
> #* {{mime}}
> # Add a test-only {{process::reinitialize}} function, which should be roughly 
> equivalent to a first-time run of {{process::initialize}}.
> -*Proposal to swap out server socket*- (Does not work)
> # Follow the [example of the SSL 
> library|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L280]
>  and allow tests to declare an internal function for re-initializing a 
> portion of libprocess.
> # Move the [existing creation of the server 
> socket|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L852-L856]
>  into a {{reinitialize_server_socket}} function.
> # Add any necessary cleanup for swapping server sockets.
> # Consider whether any additional locking is required in the 
> {{reinitialize_server_socket}} function.





[jira] [Created] (MESOS-3864) Simplify and/or document the libprocess initialization synchronization logic

2015-11-09 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3864:


 Summary: Simplify and/or document the libprocess initialization 
synchronization logic
 Key: MESOS-3864
 URL: https://issues.apache.org/jira/browse/MESOS-3864
 Project: Mesos
  Issue Type: Task
  Components: libprocess
Reporter: Joseph Wu
Assignee: Joseph Wu
Priority: Minor


Tracks this 
[TODO|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L749].

The [synchronization logic of 
libprocess|https://github.com/apache/mesos/commit/cd757cf75637c92c438bf4cd22f21ba1b5be702f#diff-128d3b56fc8c9ec0176fdbadcfd11fc2]
 [predates 
abstractions|https://github.com/apache/mesos/commit/6c3b107e4e02d5ba0673eb3145d71ec9d256a639#diff-0eebc8689450916990abe080d86c2acb]
 like {{process::Once}}, which is used in almost all other one-time 
initialization blocks.  

The logic should be documented.  It can also be simplified (see the [review 
description|https://reviews.apache.org/r/39949/]).  Or it can be replaced with 
{{process::Once}}.
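
As a point of reference, a generic two-flag one-time initialization can be sketched as follows.  The flag names {{initialized}} and {{initializing}} mirror the ones used in libprocess, but the bodies are assumptions and not the actual code; {{initializeSketch}}, {{finalizeSketch}} and {{initCount}} are hypothetical.

```cpp
#include <atomic>
#include <thread>

// Flag names follow the ticket; the logic is an assumption, not libprocess.
static std::atomic<bool> initialized(false);
static std::atomic<bool> initializing(false);
static int initCount = 0;  // how many times the one-time work has run

void initializeSketch() {
  // Fast path: initialization already finished.
  if (initialized.load()) return;

  bool expected = false;
  if (initializing.compare_exchange_strong(expected, true)) {
    // This caller won the race and performs the one-time work.
    ++initCount;
    initialized.store(true);
  } else {
    // Another caller is initializing; wait until it is done.
    while (!initialized.load()) std::this_thread::yield();
  }
}

// Resets both flags so that a later initializeSketch() starts from scratch.
// Assumes no other thread is inside initializeSketch() at this point.
void finalizeSketch() {
  initialized.store(false);
  initializing.store(false);
}
```

A {{std::once_flag}} cannot be reset, which is one reason a test-only re-initialization might keep explicit flags like these rather than switch to a once-style abstraction; whether {{process::Once}} has the same limitation would need to be checked.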





[jira] [Updated] (MESOS-3820) Refactor libprocess initialization to allow for test-only reinitialization of the server socket

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3820:
-
Sprint: Mesosphere Sprint 22

> Refactor libprocess initialization to allow for test-only reinitialization of 
> the server socket
> ---
>
> Key: MESOS-3820
> URL: https://issues.apache.org/jira/browse/MESOS-3820
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> *Background*
> Libprocess initialization includes the spawning of a variety of global 
> processes and the creation of the server socket which listens for incoming 
> requests.  Some properties of the server socket are configured via 
> environment variables, such as the IP and port or the SSL configuration.
> In the case of tests, libprocess is initialized once per test binary.  This 
> means that testing different configurations (SSL in particular) is cumbersome 
> as a separate process would be needed for every test case.
> *Proposal* (Still under investigation)
> # Augment {{process::finalize}} to completely clean up libprocess.
> #* {{process_manager}}
> #* {{socket_manager}}
> #* {{EventLoop}}
> #* {{Clock}}
> #* {{__s__}}
> #* {{__address__}}
> #* Garbage collector, help, logging, profiler, statistics, route processes 
> (should fall under {{process_manager}}).
> #* {{mime}}
> # Add a test-only {{process::reinitialize}} function, which should be roughly 
> equivalent to a first-time run of {{process::initialize}}.
> -*Proposal to swap out server socket*- (Does not work)
> # Follow the [example of the SSL 
> library|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L280]
>  and allow tests to declare an internal function for re-initializing a 
> portion of libprocess.
> # Move the [existing creation of the server 
> socket|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L852-L856]
>  into a {{reinitialize_server_socket}} function.
> # Add any necessary cleanup for swapping server sockets.
> # Consider whether any additional locking is required in the 
> {{reinitialize_server_socket}} function.





[jira] [Updated] (MESOS-3820) Refactor libprocess initialization to allow for test-only reinitialization of the server socket

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3820:
-
Story Points: 8  (was: 5)

> Refactor libprocess initialization to allow for test-only reinitialization of 
> the server socket
> ---
>
> Key: MESOS-3820
> URL: https://issues.apache.org/jira/browse/MESOS-3820
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> *Background*
> Libprocess initialization includes the spawning of a variety of global 
> processes and the creation of the server socket which listens for incoming 
> requests.  Some properties of the server socket are configured via 
> environment variables, such as the IP and port or the SSL configuration.
> In the case of tests, libprocess is initialized once per test binary.  This 
> means that testing different configurations (SSL in particular) is cumbersome 
> as a separate process would be needed for every test case.
> *Proposal* (Still under investigation)
> # Augment {{process::finalize}} to completely clean up libprocess.
> #* {{process_manager}}
> #* {{socket_manager}}
> #* {{EventLoop}}
> #* {{Clock}}
> #* {{__s__}}
> #* {{__address__}}
> #* Garbage collector, help, logging, profiler, statistics, route processes 
> (should fall under {{process_manager}}).
> #* {{mime}}
> # Add a test-only {{process::reinitialize}} function, which should be roughly 
> equivalent to a first-time run of {{process::initialize}}.
> -*Proposal to swap out server socket*- (Does not work)
> # Follow the [example of the SSL 
> library|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L280]
>  and allow tests to declare an internal function for re-initializing a 
> portion of libprocess.
> # Move the [existing creation of the server 
> socket|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L852-L856]
>  into a {{reinitialize_server_socket}} function.
> # Add any necessary cleanup for swapping server sockets.
> # Consider whether any additional locking is required in the 
> {{reinitialize_server_socket}} function.





[jira] [Updated] (MESOS-3183) Documentation images do not load

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3183:
-
Assignee: (was: Joseph Wu)

> Documentation images do not load
> 
>
> Key: MESOS-3183
> URL: https://issues.apache.org/jira/browse/MESOS-3183
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.0
>Reporter: James Mulcahy
>Priority: Minor
>  Labels: mesosphere
> Attachments: rake.patch
>
>
> Any images which are referenced from the generated docs ({{docs/*.md}}) do 
> not show up on the website.  For example:
> * [External 
> Containerizer|http://mesos.apache.org/documentation/latest/external-containerizer/]
> * [Fetcher Cache 
> Internals|http://mesos.apache.org/documentation/latest/fetcher-cache-internals/]
> * [Maintenance|http://mesos.apache.org/documentation/latest/maintenance/] 
> * 
> [Oversubscription|http://mesos.apache.org/documentation/latest/oversubscription/]





[jira] [Commented] (MESOS-3864) Simplify and/or document the libprocess initialization synchronization logic

2015-11-09 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997502#comment-14997502
 ] 

Joseph Wu commented on MESOS-3864:
--

Reviews:
* Remove some stale comments: https://reviews.apache.org/r/39948/
* Document/simplify synchronization code: https://reviews.apache.org/r/39949/

> Simplify and/or document the libprocess initialization synchronization logic
> 
>
> Key: MESOS-3864
> URL: https://issues.apache.org/jira/browse/MESOS-3864
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>Priority: Minor
>  Labels: mesosphere
>
> Tracks this 
> [TODO|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L749].
> The [synchronization logic of 
> libprocess|https://github.com/apache/mesos/commit/cd757cf75637c92c438bf4cd22f21ba1b5be702f#diff-128d3b56fc8c9ec0176fdbadcfd11fc2]
>  [predates 
> abstractions|https://github.com/apache/mesos/commit/6c3b107e4e02d5ba0673eb3145d71ec9d256a639#diff-0eebc8689450916990abe080d86c2acb]
>  like {{process::Once}}, which is used in almost all other one-time 
> initialization blocks.  
> The logic should be documented.  It can also be simplified (see the [review 
> description|https://reviews.apache.org/r/39949/]).  Or it can be replaced 
> with {{process::Once}}.





[jira] [Updated] (MESOS-2222) Add ACLs for the maintenance HTTP endpoints.

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-2222:
-
Labels:   (was: mesosphere)

> Add ACLs for the maintenance HTTP endpoints.
> 
>
> Key: MESOS-2222
> URL: https://issues.apache.org/jira/browse/MESOS-2222
> Project: Mesos
>  Issue Type: Task
>Reporter: Benjamin Mahler
>Assignee: Chen Zhiwei
>
> In order to authorize the HTTP endpoints for maintenance (to be added in 
> MESOS-2067), we will need to add an ACL definition for performing maintenance 
> operations.





[jira] [Updated] (MESOS-3820) Refactor libprocess initialization to allow for test-only reinitialization of the server socket

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3820:
-
Sprint:   (was: Mesosphere Sprint 23)

> Refactor libprocess initialization to allow for test-only reinitialization of 
> the server socket
> ---
>
> Key: MESOS-3820
> URL: https://issues.apache.org/jira/browse/MESOS-3820
> Project: Mesos
>  Issue Type: Story
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> *Background*
> Libprocess initialization includes the spawning of a variety of global 
> processes and the creation of the server socket which listens for incoming 
> requests.  Some properties of the server socket are configured via 
> environment variables, such as the IP and port or the SSL configuration.
> In the case of tests, libprocess is initialized once per test binary.  This 
> means that testing different configurations (SSL in particular) is cumbersome 
> as a separate process would be needed for every test case.
> *Proposal* (Still under investigation)
> # Augment {{process::finalize}} to completely clean up libprocess.
> #* {{process_manager}}
> #* {{socket_manager}}
> #* {{EventLoop}}
> #* {{Clock}}
> #* {{__s__}}
> #* {{__address__}}
> #* Garbage collector, help, logging, profiler, statistics, route processes 
> (should fall under {{process_manager}}).
> #* {{mime}}
> # Add a test-only {{process::reinitialize}} function, which should be roughly 
> equivalent to a first-time run of {{process::initialize}}.
> -*Proposal to swap out server socket*- (Does not work)
> # Follow the [example of the SSL 
> library|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L280]
>  and allow tests to declare an internal function for re-initializing a 
> portion of libprocess.
> # Move the [existing creation of the server 
> socket|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L852-L856]
>  into a {{reinitialize_server_socket}} function.
> # Add any necessary cleanup for swapping server sockets.
> # Consider whether any additional locking is required in the 
> {{reinitialize_server_socket}} function.





[jira] [Created] (MESOS-3863) Investigate the requirements of programmatically re-initializing libprocess

2015-11-09 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3863:


 Summary: Investigate the requirements of programmatically 
re-initializing libprocess
 Key: MESOS-3863
 URL: https://issues.apache.org/jira/browse/MESOS-3863
 Project: Mesos
  Issue Type: Task
  Components: libprocess, test
Reporter: Joseph Wu
Assignee: Joseph Wu


This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

*{{process::initialize}} covers the following components:*
* {{process_manager}}
** Garbage collector, help, logging, profiler, statistics, route processes.
** Any other {{spawn}} 'd process.
* {{socket_manager}}
* {{EventLoop}}
* {{Clock}}
* {{__s__}}
* {{__address__}}
* {{mime}}
** This is effectively a static map.  (It should be possible to clean this up.)

(Note: the list above is still incomplete/under-investigation.)





[jira] [Updated] (MESOS-2077) Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason.

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-2077:
-
Target Version/s:   (was: 0.27.0)

> Ensure that TASK_LOSTs for a hard slave drain (SIGUSR1) include a Reason.
> -
>
> Key: MESOS-2077
> URL: https://issues.apache.org/jira/browse/MESOS-2077
> Project: Mesos
>  Issue Type: Improvement
>  Components: master, slave
>Reporter: Benjamin Mahler
>Assignee: Guangya Liu
>  Labels: twitter
>
> For maintenance, sometimes operators will force the drain of a slave (via 
> SIGUSR1), when deemed safe (e.g. non-critical tasks running) and/or necessary 
> (e.g. bad hardware).
> To eliminate alerting noise, we'd like to add a 'Reason' that expresses the 
> forced drain of the slave, so that these are not considered to be a generic 
> slave removal TASK_LOST.





[jira] [Updated] (MESOS-3041) Decline call does not include an optional "reason", in the Event/Call API

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3041:
-
Labels:   (was: mesosphere)

> Decline call does not include an optional "reason", in the Event/Call API
> -
>
> Key: MESOS-3041
> URL: https://issues.apache.org/jira/browse/MESOS-3041
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Joseph Wu
>Assignee: Guangya Liu
>
> In the Event/Call API, the Decline call is currently used by frameworks to 
> reject resource offers.
> In the case of InverseOffers, the framework could give additional information 
> to the operators and/or allocator, as to why the InverseOffer is declined. 
> For example, suppose a cluster running some consensus algorithm is given an 
> InverseOffer on one of its nodes.  It may decline saying "Too few nodes" (or, 
> more verbosely, "Specified InverseOffer would lower the number of active 
> nodes below quorum").
> This requires the following changes:
> * include/mesos/scheduler/scheduler.proto:
> {code}
> message Call {
>   ...
>   message Decline {
>     repeated OfferID offer_ids = 1;
>     optional Filters filters = 2;
>     // Add this extra string for each OfferID
>     // i.e. reasons[i] is for offer_ids[i]
>     repeated string reasons = 3;
>   }
>   ...
> }
> {code}
> * src/master/master.cpp
> Change Master::decline to either store the reason, or log it.
> * Add a declineOffer overload in the (Mesos)SchedulerDriver with an optional 
> "reason".
> ** Extend the interface in include/mesos/scheduler.hpp
> ** Add/change the declineOffer method in src/sched/sched.cpp
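
A small sketch of how the parallel {{offer_ids}} / {{reasons}} fields proposed above might be consumed.  {{DeclineSketch}} is a hypothetical plain-C++ mirror of the message (the real type would be generated from {{scheduler.proto}}), and {{pairReasons}} is a hypothetical helper; treating a missing reason as an empty string is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mirror of the proposed Decline message.
struct DeclineSketch {
  std::vector<std::string> offer_ids;
  std::vector<std::string> reasons;  // reasons[i] is for offer_ids[i]
};

// Pairs each offer ID with its reason.  Offers without a matching entry in
// `reasons` get an empty reason (an assumption, since the field is optional).
std::vector<std::pair<std::string, std::string>> pairReasons(
    const DeclineSketch& decline) {
  // More reasons than offers would make the index mapping meaningless.
  assert(decline.reasons.size() <= decline.offer_ids.size());

  std::vector<std::pair<std::string, std::string>> result;
  for (std::size_t i = 0; i < decline.offer_ids.size(); ++i) {
    const std::string reason =
        i < decline.reasons.size() ? decline.reasons[i] : std::string();
    result.push_back(std::make_pair(decline.offer_ids[i], reason));
  }
  return result;
}
```

Something like this is what {{Master::decline}} would need in order to log or store one reason per offer; index-aligned repeated fields keep the message compact but put the burden of the length check on the receiver.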





[jira] [Updated] (MESOS-3420) Resolve shutdown semantics for Machine/Down

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3420:
-
Target Version/s:   (was: 0.27.0)

> Resolve shutdown semantics for Machine/Down
> ---
>
> Key: MESOS-3420
> URL: https://issues.apache.org/jira/browse/MESOS-3420
> Project: Mesos
>  Issue Type: Task
>Reporter: Joris Van Remoortere
>Assignee: Klaus Ma
>  Labels: maintenance, mesosphere
>
> When an operator uses the {{machine/down}} endpoint, the master sends a 
> shutdown message to the agent.
> We need to discuss and resolve the semantics that we want regarding the 
> operators and frameworks knowing when their tasks are terminated.
> One option is to explicitly remove the agent from the master which will send 
> the {{TASK_LOST}} updates and {{SlaveLostMessage}} directly from the master. 
> The concern is that during a network partition, or if the agent was down at 
> the time, these tasks could still be running.
> This is a general problem related to task lifetimes being dissociated from 
> the lifetime of the agent.





[jira] [Updated] (MESOS-3820) Refactor libprocess initialization to allow for test-only reinitialization of the server socket

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3820:
-
Description: 
*Background*
Libprocess initialization includes the spawning of a variety of global 
processes and the creation of the server socket which listens for incoming 
requests.  Some properties of the server socket are configured via environment 
variables, such as the IP and port or the SSL configuration.

In the case of tests, libprocess is initialized once per test binary.  This 
means that testing different configurations (SSL in particular) is cumbersome 
as a separate process would be needed for every test case.

*Proposal* (Still under investigation)
# Investigate using {{process::finalize}} to completely clean up libprocess.  
See [MESOS-3863].
# Add a test-only {{process::reinitialize}} function, which should be roughly 
equivalent to a first-time run of {{process::initialize}}.
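
To illustrate why a one-time initialization makes per-test configuration
cumbersome, and what a test-only re-initialization would change, here is a
minimal sketch.  Everything in it is hypothetical: the {{demo}} namespace, the
{{DEMO_PORT}} variable, and both functions are illustrative stand-ins, not
libprocess code or settings.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for a library that reads its configuration once.
// DEMO_PORT is an illustrative variable name, not a libprocess setting.
namespace demo {

static bool initialized = false;
static std::string port;

// Reads DEMO_PORT on the first call only.  Later calls are no-ops, so
// changing the environment afterwards has no effect.
void initialize()
{
  if (initialized) {
    return;
  }
  const char* value = std::getenv("DEMO_PORT");
  port = value != nullptr ? value : "0";
  initialized = true;
}

// Test-only helper in the spirit of the proposed reinitialize():
// drop the cached state, then run first-time initialization again.
void reinitialize()
{
  initialized = false;
  port.clear();
  initialize();
}

} // namespace demo
```

With only {{initialize}}, a second test that exports a different value sees
the configuration of the first test; calling {{reinitialize}} between tests
picks up the new value.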

-*Proposal to swap out server socket*- (Does not work)
# Follow the [example of the SSL 
library|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L280]
 and allow tests to declare an internal function for re-initializing a portion 
of libprocess.
# Move the [existing creation of the server 
socket|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L852-L856]
 into a {{reinitialize_server_socket}} function.
# Add any necessary cleanup for swapping server sockets.
# Consider whether any additional locking is required in the 
{{reinitialize_server_socket}} function.

  was:
*Background*
Libprocess initialization includes the spawning of a variety of global 
processes and the creation of the server socket which listens for incoming 
requests.  Some properties of the server socket are configured via environment 
variables, such as the IP and port or the SSL configuration.

In the case of tests, libprocess is initialized once per test binary.  This 
means that testing different configurations (SSL in particular) is cumbersome 
as a separate process would be needed for every test case.

*Proposal* (Still under investigation)
# Augment {{process::finalize}} to completely clean up libprocess.
#* {{process_manager}}
#* {{socket_manager}}
#* {{EventLoop}}
#* {{Clock}}
#* {{__s__}}
#* {{__address__}}
#* Garbage collector, help, logging, profiler, statistics, route processes 
(should fall under {{process_manager}}).
#* {{mime}}
# Add a test-only {{process::reinitialize}} function, which should be roughly 
equivalent to a first-time run of {{process::initialize}}.

-*Proposal to swap out server socket*- (Does not work)
# Follow the [example of the SSL 
library|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L280]
 and allow tests to declare an internal function for re-initializing a portion 
of libprocess.
# Move the [existing creation of the server 
socket|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L852-L856]
 into a {{reinitialize_server_socket}} function.
# Add any necessary cleanup for swapping server sockets.
# Consider whether any additional locking is required in the 
{{reinitialize_server_socket}} function.


> Refactor libprocess initialization to allow for test-only reinitialization of 
> the server socket
> ---
>
> Key: MESOS-3820
> URL: https://issues.apache.org/jira/browse/MESOS-3820
> Project: Mesos
>  Issue Type: Story
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> *Background*
> Libprocess initialization includes the spawning of a variety of global 
> processes and the creation of the server socket which listens for incoming 
> requests.  Some properties of the server socket are configured via 
> environment variables, such as the IP and port or the SSL configuration.
> In the case of tests, libprocess is initialized once per test binary.  This 
> means that testing different configurations (SSL in particular) is cumbersome 
> as a separate process would be needed for every test case.
> *Proposal* (Still under investigation)
> # Investigate using {{process::finalize}} to completely clean up libprocess.  
> See [MESOS-3863].
> # Add a test-only {{process::reinitialize}} function, which should be roughly 
> equivalent to a first-time run of {{process::initialize}}.
> -*Proposal to swap out server socket*- (Does not work)
> # Follow the [example of the SSL 
> library|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L280]
>  and allow tests to declare an internal function for re-initializing a 
> portion of libprocess.
> # Move the [existing creation of the server 
> 

[jira] [Updated] (MESOS-3863) Investigate the requirements of programmatically re-initializing libprocess

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3863:
-
Description: 
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

*{{process::initialize}} covers the following components:*
* {{process_manager}}
** Related prior work: [MESOS-3158]
** Cleans up the garbage collector, help, logging, profiler, statistics, route 
processes (including [this 
one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
 which currently leaks a pointer).
** Cleans up any other {{spawn}} 'd process.
* {{socket_manager}}
* {{EventLoop}}
* {{Clock}}
* {{__s__}}
** {{delete}} should be sufficient.
* {{__address__}}
** Needs to be cleared after {{process_manager}} has been cleaned up.  
Processes use this to communicate events.  If cleared prematurely, 
{{TerminateEvents}} will not be sent correctly, leading to infinite waits.
* {{mime}}
** This is effectively a static map.
** It should be possible to statically initialize it.
* Synchronization atomics {{initialized}} & {{initializing}}.
** Once cleanup is done, these should be reset.

(Note: the list above is still incomplete/under-investigation.)
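
As a sketch of what "statically initialize it" could look like for a table
such as {{mime}}: the function names and the four entries below are
illustrative assumptions, not the actual libprocess table.

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative subset only; the real table in libprocess is much larger.
// A function-local static const map is built once on first use and never
// mutated, so a finalize() routine has nothing to delete or re-create.
const std::map<std::string, std::string>& mimeTypes()
{
  static const std::map<std::string, std::string> types = {
    {".css", "text/css"},
    {".html", "text/html"},
    {".json", "application/json"},
    {".txt", "text/plain"},
  };
  return types;
}

// Looks up a content type by file extension, with a generic fallback.
std::string contentType(const std::string& extension)
{
  const auto it = mimeTypes().find(extension);
  return it != mimeTypes().end() ? it->second : "application/octet-stream";
}
```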

  was:
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

*{{process::initialize}} covers the following components:*
* {{process_manager}}
** Garbage collector, help, logging, profiler, statistics, route processes 
(including [this 
one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
 which currently leaks a pointer).
** Any other {{spawn}} 'd process.
* {{socket_manager}}
* {{EventLoop}}
* {{Clock}}
* {{__s__}}
* {{__address__}}
* {{mime}}
** This is effectively a static map.  (It should be possible to clean this up.)

(Note: the list above is still incomplete/under-investigation.)


> Investigate the requirements of programmatically re-initializing libprocess
> ---
>
> Key: MESOS-3863
> URL: https://issues.apache.org/jira/browse/MESOS-3863
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> This issue is for investigating what needs to be added/changed in 
> {{process::finalize}} such that {{process::initialize}} will start on a clean 
> slate.  Additional issues will be created once done.  Also see [the parent 
> issue|MESOS-3820].
> *{{process::initialize}} covers the following components:*
> * {{process_manager}}
> ** Related prior work: [MESOS-3158]
> ** Cleans up the garbage collector, help, logging, profiler, statistics, 
> route processes (including [this 
> one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
>  which currently leaks a pointer).
> ** Cleans up any other {{spawn}} 'd process.
> * {{socket_manager}}
> * {{EventLoop}}
> * {{Clock}}
> * {{__s__}}
> ** {{delete}} should be sufficient.
> * {{__address__}}
> ** Needs to be cleared after {{process_manager}} has been cleaned up.  
> Processes use this to communicate events.  If cleared prematurely, 
> {{TerminateEvents}} will not be sent correctly, leading to infinite waits.
> * {{mime}}
> ** This is effectively a static map.
> ** It should be possible to statically initialize it.
> * Synchronization atomics {{initialized}} & {{initializing}}.
> ** Once cleanup is done, these should be reset.
> (Note: the list above is still incomplete/under-investigation.)





[jira] [Updated] (MESOS-3863) Investigate the requirements of programmatically re-initializing libprocess

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3863:
-
Description: 
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

*{{process::initialize}} covers the following components:*
* {{process_manager}}
** Related prior work: [MESOS-3158]
** Cleans up the garbage collector, help, logging, profiler, statistics, route 
processes (including [this 
one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
 which currently leaks a pointer).
** Cleans up any other {{spawn}} 'd process.
* {{socket_manager}}
* {{EventLoop}}
* {{Clock}}
** The goal here is to clear any timers so that nothing can dereference 
{{process_manager}} while we're finalizing/finalized.  It's probably not 
important to execute any remaining timers, since we're "shutting down" 
libprocess.  This means:
*** The clock should be {{paused}} and {{settled}} before the clean up of 
{{process_manager}}.
*** Processes, which might interact with the {{Clock}}, should be cleaned up 
next.
*** A new {{Clock::finalize}} method would then clear timers, process-specific 
clocks, and {{tick}} s; and then {{resume}} the clock.
* {{__s__}}
** {{delete}} should be sufficient.
* {{__address__}}
** Needs to be cleared after {{process_manager}} has been cleaned up.  
Processes use this to communicate events.  If cleared prematurely, 
{{TerminateEvents}} will not be sent correctly, leading to infinite waits.
* {{mime}}
** This is effectively a static map.
** It should be possible to statically initialize it.
* Synchronization atomics {{initialized}} & {{initializing}}.
** Once cleanup is done, these should be reset.

(Note: the list above is still incomplete/under-investigation.)
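
The {{Clock}} steps above (pause, let due timers run, then drop whatever is
left and resume) can be sketched with a toy clock.  {{ToyClock}} and all of
its methods are hypothetical; this is not the libprocess {{Clock}} API, and it
ignores threading entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Toy model of a pausable clock that owns pending timers.  All names here
// are hypothetical; this is not the libprocess Clock API.
class ToyClock {
public:
  void pause() { paused_ = true; }
  void resume() { paused_ = false; }
  bool paused() const { return paused_; }

  // Schedules `callback` to run once the clock reaches time `when`.
  void timer(long when, std::function<void()> callback)
  {
    timers_.emplace(when, std::move(callback));
  }

  // Advances the clock and runs every timer that has come due.
  void advance(long delta)
  {
    now_ += delta;
    while (!timers_.empty() && timers_.begin()->first <= now_) {
      std::function<void()> callback = std::move(timers_.begin()->second);
      timers_.erase(timers_.begin());
      callback();
    }
  }

  // Drops the remaining timers without running them, then resumes the
  // clock so that a later initialization starts from a clean state.
  void finalize()
  {
    timers_.clear();
    resume();
  }

  std::size_t pending() const { return timers_.size(); }

private:
  bool paused_ = false;
  long now_ = 0;
  std::multimap<long, std::function<void()>> timers_;
};
```

The point of clearing rather than firing the leftover timers is that their
callbacks may capture state that no longer exists once the processes are gone.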

  was:
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

*{{process::initialize}} covers the following components:*
* {{process_manager}}
** Related prior work: [MESOS-3158]
** Cleans up the garbage collector, help, logging, profiler, statistics, route 
processes (including [this 
one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
 which currently leaks a pointer).
** Cleans up any other {{spawn}} 'd process.
* {{socket_manager}}
* {{EventLoop}}
* {{Clock}}
* {{__s__}}
** {{delete}} should be sufficient.
* {{__address__}}
** Needs to be cleared after {{process_manager}} has been cleaned up.  
Processes use this to communicate events.  If cleared prematurely, 
{{TerminateEvents}} will not be sent correctly, leading to infinite waits.
* {{mime}}
** This is effectively a static map.
** It should be possible to statically initialize it.
* Synchronization atomics {{initialized}} & {{initializing}}.
** Once cleanup is done, these should be reset.

(Note: the list above is still incomplete/under-investigation.)


> Investigate the requirements of programmatically re-initializing libprocess
> ---
>
> Key: MESOS-3863
> URL: https://issues.apache.org/jira/browse/MESOS-3863
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> This issue is for investigating what needs to be added/changed in 
> {{process::finalize}} such that {{process::initialize}} will start on a clean 
> slate.  Additional issues will be created once done.  Also see [the parent 
> issue|MESOS-3820].
> *{{process::initialize}} covers the following components:*
> * {{process_manager}}
> ** Related prior work: [MESOS-3158]
> ** Cleans up the garbage collector, help, logging, profiler, statistics, 
> route processes (including [this 
> one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
>  which currently leaks a pointer).
> ** Cleans up any other {{spawn}} 'd process.
> * {{socket_manager}}
> * {{EventLoop}}
> * {{Clock}}
> ** The goal here is to clear any timers so that nothing can dereference 
> {{process_manager}} while we're finalizing/finalized.  It's probably not 
> important to execute any remaining timers, since we're "shutting down" 
> libprocess.  This means:
> *** The clock should be {{paused}} and {{settled}} before the clean up of 
> {{process_manager}}.
> *** Processes, which might interact with the {{Clock}}, should be cleaned up 
> next.
> *** A new 

[jira] [Commented] (MESOS-3851) Investigate recent crashes in Command Executor

2015-11-09 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997759#comment-14997759
 ] 

Joseph Wu commented on MESOS-3851:
--

Here's another one ({{<...>}} indicates stuff I truncated):
{code}
[ RUN  ] SlaveTest.HTTPSchedulerSlaveRestart
<...>
I1110 00:36:30.616757  5169 exec.cpp:297] Executor asked to run task 
'a58d3992-429f-4e26-b60c-72ddcce3b804'
I1110 00:36:30.616987  5169 exec.cpp:306] Executor::launchTask took 160701ns
F1110 00:36:30.617060  5166 executor.cpp:184] CHECK_SOME(executorInfo): is NONE 
Starting task a58d3992-429f-4e26-b60c-72ddcce3b804
*** Check failure stack trace: ***
I1110 00:36:30.618422  5163 exec.cpp:210] Executor registered on slave 
cfa4cfcf-5fce-4ce7-89b8-c466ef8c6bc3-S0
@ 0x7f917a1b911e  google::LogMessage::Fail()
I1110 00:36:30.621285  5163 exec.cpp:222] Executor::registered took 399555ns
@ 0x7f917a1b907d  google::LogMessage::SendToLog()
@ 0x7f917a1b8a8e  google::LogMessage::Flush()
@ 0x7f917a1bb7c2  google::LogMessageFatal::~LogMessageFatal()
@   0x48d14a  _CheckFatal::~_CheckFatal()
@   0x49cae7  mesos::internal::CommandExecutorProcess::launchTask()
@   0x4b43d9  
_ZZN7process8dispatchIN5mesos8internal22CommandExecutorProcessEPNS1_14ExecutorDriverERKNS1_8TaskInfoES5_S6_EEvRKNS_3PIDIT_EEMSA_FvT0_T1_ET2_T3_ENKUlPNS_11ProcessBaseEE_clESL_
@   0x4c4d59  
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal22CommandExecutorProcessEPNS5_14ExecutorDriverERKNS5_8TaskInfoES9_SA_EEvRKNS0_3PIDIT_EEMSE_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
@ 0x7f917a13f83f  std::function<>::operator()()
@ 0x7f917a127659  process::ProcessBase::visit()
@ 0x7f917a12b424  process::DispatchEvent::visit()
@   0x48e144  process::ProcessBase::serve()
@ 0x7f917a123a45  process::ProcessManager::resume()
@ 0x7f917a120c76  
_ZZN7process14ProcessManager12init_threadsEvENKUlRKSt11atomic_boolE_clES3_
@ 0x7f917a12ac50  
_ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEE6__callIvIEILm0T_OSt5tupleIIDpT0_EESt12_Index_tupleIIXspT1_EEE
@ 0x7f917a12ac00  
_ZNSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS3_EEEclIIEvEET0_DpOT_
@ 0x7f917a12ab92  
_ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
@ 0x7f917a12aae9  
_ZNSt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS4_EEEvEEclEv
@ 0x7f917a12aa82  
_ZNSt6thread5_ImplISt12_Bind_simpleIFSt5_BindIFZN7process14ProcessManager12init_threadsEvEUlRKSt11atomic_boolE_St17reference_wrapperIS6_EEEvEEE6_M_runEv
@ 0x7f9175c261e0  (unknown)
@ 0x7f9175e7fdf5  start_thread
@ 0x7f917538e1ad  __clone
<...>
../../src/tests/slave_tests.cpp:2829: Failure
Value of: status.get().state()
  Actual: TASK_FAILED
<...>
Expected: TASK_RUNNING
<...>
../../src/tests/slave_tests.cpp:2855: Failure
Failed to wait 15secs for updateFrameworkMessage
<...>
../../3rdparty/libprocess/include/process/gmock.hpp:365: Failure
Actual function call count doesn't match EXPECT_CALL(filter->mock, 
filter(testing::A()))...
Expected args: message matcher (8-byte object <18-02 04-A0 78-7F 00-00>, 
1-byte object <02>, 1-byte object <18>)
 Expected: to be called once
   Actual: never called - unsatisfied and active
[  FAILED  ] SlaveTest.HTTPSchedulerSlaveRestart (15688 ms)
{code}
Full logs: 
https://builds.apache.org/job/Mesos/COMPILER=gcc,CONFIGURATION=--verbose,OS=centos:7,label_exp=docker||Hadoop/1206/consoleFull

> Investigate recent crashes in Command Executor
> --
>
> Key: MESOS-3851
> URL: https://issues.apache.org/jira/browse/MESOS-3851
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Reporter: Anand Mazumdar
>Priority: Blocker
>  Labels: mesosphere
>
> Since https://reviews.apache.org/r/38900 (updating CommandExecutor to 
> support rootfs), some tests have been crashing frequently due to assertion 
> violations.
> {{FetcherCacheTest.SimpleEviction}} failed due to the following log:
> {code}
> I1107 19:36:46.360908 30657 slave.cpp:1793] Sending queued task '3' to 
> executor ''3' of framework 7d94c7fb-8950-4bcf-80c1-46112292dcd6- at 
> executor(1)@172.17.5.200:33871'
> I1107 19:36:46.363682  1236 exec.cpp:297] 
> I1107 19:36:46.373569  1245 exec.cpp:210] Executor registered on slave 
> 7d94c7fb-8950-4bcf-80c1-46112292dcd6-S0
> @ 0x7f9f5a7db3fa  google::LogMessage::Fail()
> I1107 

[jira] [Updated] (MESOS-3863) Investigate the requirements of programmatically re-initializing libprocess

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3863:
-
Description: 
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

*{{process::initialize}} covers the following components:*
* {{process_manager}}
** Garbage collector, help, logging, profiler, statistics, route processes 
(including [this 
one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
 which currently leaks a pointer).
** Any other {{spawn}} 'd process.
* {{socket_manager}}
* {{EventLoop}}
* {{Clock}}
* {{__s__}}
* {{__address__}}
* {{mime}}
** This is effectively a static map.  (It should be possible to clean this up.)

(Note: the list above is still incomplete/under-investigation.)

  was:
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

*{{process::initialize}} covers the following components:*
* {{process_manager}}
** Garbage collector, help, logging, profiler, statistics, route processes.
** Any other {{spawn}} 'd process.
* {{socket_manager}}
* {{EventLoop}}
* {{Clock}}
* {{__s__}}
* {{__address__}}
* {{mime}}
** This is effectively a static map.  (It should be possible to clean this up.)

(Note: the list above is still incomplete/under-investigation.)


> Investigate the requirements of programmatically re-initializing libprocess
> ---
>
> Key: MESOS-3863
> URL: https://issues.apache.org/jira/browse/MESOS-3863
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> This issue is for investigating what needs to be added/changed in 
> {{process::finalize}} such that {{process::initialize}} will start on a clean 
> slate.  Additional issues will be created once done.  Also see [the parent 
> issue|MESOS-3820].
> *{{process::initialize}} covers the following components:*
> * {{process_manager}}
> ** Garbage collector, help, logging, profiler, statistics, route processes 
> (including [this 
> one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
>  which currently leaks a pointer).
> ** Any other {{spawn}} 'd process.
> * {{socket_manager}}
> * {{EventLoop}}
> * {{Clock}}
> * {{__s__}}
> * {{__address__}}
> * {{mime}}
> ** This is effectively a static map.  (It should be possible to clean this 
> up.)
> (Note: the list above is still incomplete/under-investigation.)





[jira] [Updated] (MESOS-3863) Investigate the requirements of programmatically re-initializing libprocess

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3863:
-
Description: 
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

{{process::finalize}} should cover the following components:
* {{__s__}} (the server socket)
** {{delete}} should be sufficient.  This closes the socket and thereby 
prevents any further interaction from it.
* {{process_manager}}
** Related prior work: [MESOS-3158]
** Cleans up the garbage collector, help, logging, profiler, statistics, route 
processes (including [this 
one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
 which currently leaks a pointer).
** Cleans up any other {{spawn}} 'd process.
** Manages the {{EventLoop}}.
* {{Clock}}
** The goal here is to clear any timers so that nothing can dereference 
{{process_manager}} while we're finalizing/finalized.  It's probably not 
important to execute any remaining timers, since we're "shutting down" 
libprocess.  This means:
*** The clock should be {{paused}} and {{settled}} before the clean up of 
{{process_manager}}.
*** Processes, which might interact with the {{Clock}}, should be cleaned up 
next.
*** A new {{Clock::finalize}} method would then clear timers, process-specific 
clocks, and {{tick}} s; and then {{resume}} the clock.
* {{__address__}} (the advertised IP and port)
** Needs to be cleared after {{process_manager}} has been cleaned up.  
Processes use this to communicate events.  If cleared prematurely, 
{{TerminateEvents}} will not be sent correctly, leading to infinite waits.
* {{socket_manager}}
** The idea here is to close all sockets and deallocate any existing 
{{HttpProxy}} or {{Encoder}} objects.
** All sockets are created via {{__s__}}, so cleaning up the server socket 
prior will prevent any new activity.
* {{mime}}
** This is effectively a static map.
** It should be possible to statically initialize it.
* Synchronization atomics {{initialized}} & {{initializing}}.
** Once cleanup is done, these should be reset.

(Note: the list above is still incomplete/under-investigation.)

  was:
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

*{{process::initialize}} covers the following components:*
* {{process_manager}}
** Related prior work: [MESOS-3158]
** Cleans up the garbage collector, help, logging, profiler, statistics, route 
processes (including [this 
one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
 which currently leaks a pointer).
** Cleans up any other {{spawn}} 'd process.
* {{socket_manager}}
* {{EventLoop}}
* {{Clock}}
** The goal here is to clear any timers so that nothing can dereference 
{{process_manager}} while we're finalizing/finalized.  It's probably not 
important to execute any remaining timers, since we're "shutting down" 
libprocess.  This means:
*** The clock should be {{paused}} and {{settled}} before the clean up of 
{{process_manager}}.
*** Processes, which might interact with the {{Clock}}, should be cleaned up 
next.
*** A new {{Clock::finalize}} method would then clear timers, process-specific 
clocks, and {{tick}} s; and then {{resume}} the clock.
* {{__s__}}
** {{delete}} should be sufficient.
* {{__address__}}
** Needs to be cleared after {{process_manager}} has been cleaned up.  
Processes use this to communicate events.  If cleared prematurely, 
{{TerminateEvents}} will not be sent correctly, leading to infinite waits.
* {{mime}}
** This is effectively a static map.
** It should be possible to statically initialize it.
* Synchronization atomics {{initialized}} & {{initializing}}.
** Once cleanup is done, these should be reset.

(Note: the list above is still incomplete/under-investigation.)


> Investigate the requirements of programmatically re-initializing libprocess
> ---
>
> Key: MESOS-3863
> URL: https://issues.apache.org/jira/browse/MESOS-3863
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> This issue is for investigating what needs to be added/changed in 
> {{process::finalize}} such that {{process::initialize}} will start on a clean 
> slate.  Additional issues will be created once done.  Also see [the parent 
> issue|MESOS-3820].
> {{process::finalize}} should 

[jira] [Updated] (MESOS-3863) Investigate the requirements of programmatically re-initializing libprocess

2015-11-09 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3863:
-
Description: 
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

{{process::finalize}} should cover the following components:
* {{__s__}} (the server socket)
** {{delete}} should be sufficient.  This closes the socket and thereby 
prevents any further interaction from it.
* {{process_manager}}
** Related prior work: [MESOS-3158]
** Cleans up the garbage collector, help, logging, profiler, statistics, route 
processes (including [this 
one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
 which currently leaks a pointer).
** Cleans up any other {{spawn}} 'd process.
** Manages the {{EventLoop}}.
* {{Clock}}
** The goal here is to clear any timers so that nothing can dereference 
{{process_manager}} while we're finalizing/finalized.  It's probably not 
important to execute any remaining timers, since we're "shutting down" 
libprocess.  This means:
*** The clock should be {{paused}} and {{settled}} before the clean up of 
{{process_manager}}.
*** Processes, which might interact with the {{Clock}}, should be cleaned up 
next.
*** A new {{Clock::finalize}} method would then clear timers, process-specific 
clocks, and {{tick}} s; and then {{resume}} the clock.
* {{__address__}} (the advertised IP and port)
** Needs to be cleared after {{process_manager}} has been cleaned up.  
Processes use this to communicate events.  If cleared prematurely, 
{{TerminateEvents}} will not be sent correctly, leading to infinite waits.
* {{socket_manager}}
** The idea here is to close all sockets and deallocate any existing 
{{HttpProxy}} or {{Encoder}} objects.
** All sockets are created via {{__s__}}, so cleaning up the server socket 
prior will prevent any new activity.
* {{mime}}
** This is effectively a static map.
** It should be possible to statically initialize it.
* Synchronization atomics {{initialized}} & {{initializing}}.
** Once cleanup is done, these should be reset.

*Summary*:
* Implement {{Clock::finalize}}.
* Implement {{~SocketManager}}.
* Clean up {{mime}}.
* Wrap everything up in {{process::finalize}}.
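
One teardown order that is consistent with the notes above can be sketched as
follows.  This is a rough sketch under assumptions, not the eventual
{{process::finalize}}: each step only records its name, the {{sketch}}
namespace is hypothetical, and the relative order of pausing the clock and
deleting {{__s__}} is a guess, since the description does not fix it.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins: each step only records its name so that the
// teardown order can be inspected.  None of this is the libprocess code.
namespace sketch {

static std::atomic<bool> initialized(false);
static std::atomic<bool> initializing(false);
static std::vector<std::string> steps;

void initialize()
{
  // Only the first caller wins the race and performs the setup.
  bool expected = false;
  if (!initializing.compare_exchange_strong(expected, true)) {
    return;
  }
  steps.clear();
  initialized.store(true);
}

void finalize()
{
  if (!initialized.load()) {
    return;
  }
  steps.push_back("pause and settle Clock");
  steps.push_back("delete __s__");            // No new connections.
  steps.push_back("clean up process_manager");
  steps.push_back("Clock::finalize");
  steps.push_back("clear __address__");       // Only after processes are gone.
  steps.push_back("clean up socket_manager");

  // Reset the guards last, so a later initialize() starts from scratch.
  initialized.store(false);
  initializing.store(false);
}

} // namespace sketch
```

Resetting {{initialized}} and {{initializing}} at the very end is what lets a
second call to {{initialize}} behave like a first-time run.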

  was:
This issue is for investigating what needs to be added/changed in 
{{process::finalize}} such that {{process::initialize}} will start on a clean 
slate.  Additional issues will be created once done.  Also see [the parent 
issue|MESOS-3820].

{{process::finalize}} should cover the following components:
* {{__s__}} (the server socket)
** {{delete}} should be sufficient.  This closes the socket and thereby 
prevents any further interaction from it.
* {{process_manager}}
** Related prior work: [MESOS-3158]
** Cleans up the garbage collector, help, logging, profiler, statistics, route 
processes (including [this 
one|https://github.com/apache/mesos/blob/3bda55da1d0b580a1b7de43babfdc0d30fbc87ea/3rdparty/libprocess/src/process.cpp#L963],
 which currently leaks a pointer).
** Cleans up any other {{spawn}} 'd process.
** Manages the {{EventLoop}}.
* {{Clock}}
** The goal here is to clear any timers so that nothing can dereference 
{{process_manager}} while we're finalizing/finalized.  It's probably not 
important to execute any remaining timers, since we're "shutting down" 
libprocess.  This means:
*** The clock should be {{paused}} and {{settled}} before the clean up of 
{{process_manager}}.
*** Processes, which might interact with the {{Clock}}, should be cleaned up 
next.
*** A new {{Clock::finalize}} method would then clear timers, process-specific 
clocks, and {{tick}} s; and then {{resume}} the clock.
* {{__address__}} (the advertised IP and port)
** Needs to be cleared after {{process_manager}} has been cleaned up.  
Processes use this to communicate events.  If cleared prematurely, 
{{TerminateEvents}} will not be sent correctly, leading to infinite waits.
* {{socket_manager}}
** The idea here is to close all sockets and deallocate any existing 
{{HttpProxy}} or {{Encoder}} objects.
** All sockets are created via {{__s__}}, so cleaning up the server socket 
prior will prevent any new activity.
* {{mime}}
** This is effectively a static map.
** It should be possible to statically initialize it.
* Synchronization atomics {{initialized}} & {{initializing}}.
** Once cleanup is done, these should be reset.

(Note: the list above is still incomplete/under-investigation.)


> Investigate the requirements of programmatically re-initializing libprocess
> ---
>
> Key: MESOS-3863
> URL: https://issues.apache.org/jira/browse/MESOS-3863
> Project: Mesos
>   

[jira] [Created] (MESOS-3848) Refactor Environment::mkdtemp into TemporaryDirectoryTest.

2015-11-06 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3848:


 Summary: Refactor Environment::mkdtemp into TemporaryDirectoryTest.
 Key: MESOS-3848
 URL: https://issues.apache.org/jira/browse/MESOS-3848
 Project: Mesos
  Issue Type: Task
  Components: test
Reporter: Joseph Wu
Assignee: Joseph Wu
Priority: Minor


As part of [MESOS-3762], many tests were changed from one 
{{TemporaryDirectoryTest}} to another {{TemporaryDirectoryTest}}.  One subtle 
difference is that the name of the temporary directory no longer contains the 
name of the test.  In [MESOS-3847], the duplicate {{TemporaryDirectoryTest}} 
was removed.

The original {{TemporaryDirectoryTest}} called 
[{{environment->mkdtemp}}|https://github.com/apache/mesos/blob/master/src/tests/environment.cpp#L494].
  We would like the naming, which is valuable for debugging, to be available 
for a majority of tests.  (A majority of tests inherit from 
{{TemporaryDirectoryTest}} in some way.)

Note:
* Any additional directories created via {{environment->mkdtemp}} are cleaned 
up after the test.
* We don't want mesos-specific logic in Stout, like the {{umount}} shell 
command in {{Environment::TearDown}}.

*Proposed change:*
Move the temporary directory logic from {{Environment::mkdtemp}} to 
{{TemporaryDirectoryTest}}.

*Tests that need to change*
|| File || Test or helper || Notes ||
| {{log_tests.cpp}} | {{LogZooKeeperTest}} | We can change {{ZooKeeperTest}} to inherit from {{TemporaryDirectoryTest}} to get rid of code duplication. |
| {{tests/mesos.cpp}} | {{MesosTest::CreateSlaveFlags}} | {{MesosTest}} already inherits from {{TemporaryDirectoryTest}}. |
| {{tests/script.hpp}} | {{TEST_SCRIPT}} | This is used for the {{ExampleTests}}.  We can define a test class that inherits appropriately. |
| {{docker_tests.cpp}} | {{*}} | Already inherits from {{MesosTest}}. |
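
For illustration, a temporary directory whose name contains the test name could be created roughly as follows.  This is a minimal sketch, assuming POSIX {{mkdtemp}}; the helpers {{createTestDirectory}} and {{removeTestDirectory}} and the {{/tmp/<case>_<test>_XXXXXX}} naming pattern are hypothetical and are not the existing {{Environment::mkdtemp}} code.

```cpp
#include <stdlib.h>  // ::mkdtemp on Linux.
#include <unistd.h>  // ::rmdir (and ::mkdtemp on some platforms).

#include <string>
#include <vector>

// Creates a temporary directory whose name embeds the test case and test
// name.  Returns the path, or an empty string on failure.
// Hypothetical helper, not an existing Mesos or Stout function.
std::string createTestDirectory(
    const std::string& testCase,
    const std::string& testName)
{
  const std::string pattern =
    "/tmp/" + testCase + "_" + testName + "_XXXXXX";

  // mkdtemp rewrites the trailing XXXXXX in place, so it needs a
  // mutable, NUL-terminated buffer.
  std::vector<char> buffer(pattern.begin(), pattern.end());
  buffer.push_back('\0');

  if (::mkdtemp(buffer.data()) == nullptr) {
    return "";
  }

  return std::string(buffer.data());
}

// Removes an (empty) directory created above.  Returns true on success.
bool removeTestDirectory(const std::string& path)
{
  return ::rmdir(path.c_str()) == 0;
}
```

A fixture could call the first helper in {{SetUp}} and the second in {{TearDown}}, so a leftover directory from a failed run still points back at the test that created it.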





[jira] [Commented] (MESOS-3820) Refactor libprocess initialization to allow for test-only reinitialization of the server socket

2015-11-06 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994532#comment-14994532
 ] 

Joseph Wu commented on MESOS-3820:
--

Two patches from some of my initial investigation into the 
{{process::finalize}} approach:
* Remove some stale comments: https://reviews.apache.org/r/39948/
* Update to {{process::initialize}} synchronization code: 
https://reviews.apache.org/r/39949/

> Refactor libprocess initialization to allow for test-only reinitialization of 
> the server socket
> ---
>
> Key: MESOS-3820
> URL: https://issues.apache.org/jira/browse/MESOS-3820
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> *Background*
> Libprocess initialization includes the spawning of a variety of global 
> processes and the creation of the server socket which listens for incoming 
> requests.  Some properties of the server socket are configured via 
> environment variables, such as the IP and port or the SSL configuration.
> In the case of tests, libprocess is initialized once per test binary.  This 
> means that testing different configurations (SSL in particular) is cumbersome 
> as a separate process would be needed for every test case.
> *Proposal* (Still under investigation)
> # Augment {{process::finalize}} to completely clean up libprocess.
> #* {{process_manager}}
> #* {{socket_manager}}
> #* {{EventLoop}}
> #* {{Clock}}
> #* {{__s__}}
> #* {{__address__}}
> #* Garbage collector, help, logging, profiler, statistics, route processes 
> (should fall under {{process_manager}}).
> #* {{mime}}
> # Add a test-only {{process::reinitialize}} function, which should be roughly 
> equivalent to a first-time run of {{process::initialize}}.
> -*Proposal to swap out server socket*- (Does not work)
> # Follow the [example of the SSL 
> library|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L280]
>  and allow tests to declare an internal function for re-initializing a 
> portion of libprocess.
> # Move the [existing creation of the server 
> socket|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L852-L856]
>  into a {{reinitialize_server_socket}} function.
> # Add any necessary cleanup for swapping server sockets.
> # Consider whether any additional locking is required in the 
> {{reinitialize_server_socket}} function.
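
The "finalize, then run a first-time initialize again" idea from the proposal can be sketched with a single flag.  This is a simplified model under stated assumptions: everything lives in a hypothetical {{sketch}} namespace, the counter stands in for the real initialization work, and callers that lose the race simply return instead of waiting for the winner to finish, which the real synchronization code would have to handle.

```cpp
#include <atomic>

namespace sketch {

std::atomic<bool> initialized{false};
int initializations = 0;  // Counts how many times the body actually ran.

// Runs the initialization body at most once until `finalize` is called.
// Returns true if this call performed the initialization.
bool initialize()
{
  bool expected = false;
  if (!initialized.compare_exchange_strong(expected, true)) {
    return false;  // Somebody else already initialized.
  }

  ++initializations;  // Stand-in for creating the server socket, etc.
  return true;
}

// Resets the flag so that the next `initialize` behaves like the first.
void finalize()
{
  initialized.store(false);
}

// Test-only: tear down and run a first-time initialization again.
bool reinitialize()
{
  finalize();
  return initialize();
}

} // namespace sketch
```

The point of the sketch is that {{reinitialize}} adds no new initialization path; it only has to guarantee that {{finalize}} restores the state that the first call to {{initialize}} expects.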





[jira] [Created] (MESOS-3847) Root tests for LinuxFilesystemIsolatorTest are broken

2015-11-06 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3847:


 Summary: Root tests for LinuxFilesystemIsolatorTest are broken
 Key: MESOS-3847
 URL: https://issues.apache.org/jira/browse/MESOS-3847
 Project: Mesos
  Issue Type: Bug
Reporter: Joseph Wu
Assignee: Joseph Wu
Priority: Minor


The refactor in [MESOS-3762] ended up exposing some differences in the 
{{TemporaryDirectoryTest}} classes (one in Stout, one in Mesos-proper).

The tests that broke (during tear down):
{code}
LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithRootFilesystem
LinuxFilesystemIsolatorTest.ROOT_PersistentVolumeWithoutRootFilesystem
LinuxFilesystemIsolatorTest.ROOT_MultipleContainers
{code}

As per an offline discussion between [~jvanremoortere] and [~jieyu], the 
solution is to merge the two {{TemporaryDirectoryTest}} classes and to fix the 
tear down of {{LinuxFilesystemIsolatorTest}}.





[jira] [Updated] (MESOS-3041) Decline call does not include an optional "reason", in the Event/Call API

2015-11-05 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3041:
-
Target Version/s:   (was: 0.26.0)

> Decline call does not include an optional "reason", in the Event/Call API
> -
>
> Key: MESOS-3041
> URL: https://issues.apache.org/jira/browse/MESOS-3041
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Joseph Wu
>Assignee: Guangya Liu
>  Labels: mesosphere
>
> In the Event/Call API, the Decline call is currently used by frameworks to 
> reject resource offers.
> In the case of InverseOffers, the framework could give additional information 
> to the operators and/or allocator, as to why the InverseOffer is declined. 
> For example, suppose a cluster running some consensus algorithm is given an 
> InverseOffer on one of its nodes.  It may decline saying "Too few nodes" (or, 
> more verbosely, "Specified InverseOffer would lower the number of active 
> nodes below quorum").
> This requires the following changes:
> * include/mesos/scheduler/scheduler.proto:
> {code}
> message Call {
>   ...
>   message Decline {
>     repeated OfferID offer_ids = 1;
>     optional Filters filters = 2;
>     // Add this extra string for each OfferID
>     // i.e. reasons[i] is for offer_ids[i]
>     repeated string reasons = 3;
>   }
>   ...
> }
> {code}
> * src/master/master.cpp
> Change Master::decline to either store the reason, or log it.
> * Add a declineOffer overload in the (Mesos)SchedulerDriver with an optional 
> "reason".
> ** Extend the interface in include/mesos/scheduler.hpp
> ** Add/change the declineOffer method in src/sched/sched.cpp
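
On the receiving side, the "reasons[i] is for offer_ids[i]" convention could be consumed roughly like this.  A minimal sketch, assuming plain strings in place of the protobuf types; {{pairReasons}} is a hypothetical helper and not part of {{Master::decline}}, and treating a missing entry as an empty reason is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Pairs each offer ID with its reason, where reasons[i] belongs to
// offerIds[i].  Offers without a matching entry get an empty reason.
// Hypothetical helper; plain strings stand in for OfferID messages.
std::vector<std::pair<std::string, std::string>> pairReasons(
    const std::vector<std::string>& offerIds,
    const std::vector<std::string>& reasons)
{
  std::vector<std::pair<std::string, std::string>> result;

  for (std::size_t i = 0; i < offerIds.size(); ++i) {
    result.emplace_back(
        offerIds[i],
        i < reasons.size() ? reasons[i] : std::string());
  }

  return result;
}
```

Because the two repeated fields are only linked by index, the receiver has to decide what a length mismatch means; the sketch tolerates a shorter {{reasons}} list rather than rejecting the call.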





[jira] [Commented] (MESOS-3820) Refactor libprocess initialization to allow for test-only reinitialization of the server socket

2015-11-03 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988698#comment-14988698
 ] 

Joseph Wu commented on MESOS-3820:
--

I'll investigate that approach.  

It seems like swapping out the server socket does not do the trick ([attempted 
here|https://github.com/kaysoky/mesos/commits/process_reinit]).

We still want to make any re-initialization test-only though.  So most likely, 
I'll need to investigate how to fully {{finalize}} libprocess.  Right now, we 
only clean up the {{process_manager}}.

> Refactor libprocess initialization to allow for test-only reinitialization of 
> the server socket
> ---
>
> Key: MESOS-3820
> URL: https://issues.apache.org/jira/browse/MESOS-3820
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> *Background*
> Libprocess initialization includes the spawning of a variety of global 
> processes and the creation of the server socket which listens for incoming 
> requests.  Some properties of the server socket are configured via 
> environment variables, such as the IP and port or the SSL configuration.
> In the case of tests, libprocess is initialized once per test binary.  This 
> means that testing different configurations (SSL in particular) is cumbersome 
> as a separate process would be needed for every test case.
> *Proposal*
> # Follow the [example of the SSL 
> library|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L280]
>  and allow tests to declare an internal function for re-initializing a 
> portion of libprocess.
> # Move the [existing creation of the server 
> socket|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L852-L856]
>  into a {{reinitialize_server_socket}} function.
> # Add any necessary cleanup for swapping server sockets.
> # Consider whether any additional locking is required in the 
> {{reinitialize_server_socket}} function.





[jira] [Updated] (MESOS-3820) Refactor libprocess initialization to allow for test-only reinitialization of the server socket

2015-11-03 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3820:
-
Story Points: 5  (was: 3)

> Refactor libprocess initialization to allow for test-only reinitialization of 
> the server socket
> ---
>
> Key: MESOS-3820
> URL: https://issues.apache.org/jira/browse/MESOS-3820
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> *Background*
> Libprocess initialization includes the spawning of a variety of global 
> processes and the creation of the server socket which listens for incoming 
> requests.  Some properties of the server socket are configured via 
> environment variables, such as the IP and port or the SSL configuration.
> In the case of tests, libprocess is initialized once per test binary.  This 
> means that testing different configurations (SSL in particular) is cumbersome 
> as a separate process would be needed for every test case.
> *Proposal*
> # Follow the [example of the SSL 
> library|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L280]
>  and allow tests to declare an internal function for re-initializing a 
> portion of libprocess.
> # Move the [existing creation of the server 
> socket|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L852-L856]
>  into a {{reinitialize_server_socket}} function.
> # Add any necessary cleanup for swapping server sockets.
> # Consider whether any additional locking is required in the 
> {{reinitialize_server_socket}} function.





[jira] [Updated] (MESOS-3820) Refactor libprocess initialization to allow for test-only reinitialization of the server socket

2015-11-03 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3820:
-
Description: 
*Background*
Libprocess initialization includes the spawning of a variety of global 
processes and the creation of the server socket which listens for incoming 
requests.  Some properties of the server socket are configured via environment 
variables, such as the IP and port or the SSL configuration.

In the case of tests, libprocess is initialized once per test binary.  This 
means that testing different configurations (SSL in particular) is cumbersome 
as a separate process would be needed for every test case.

*Proposal* (Still under investigation)
# Augment {{process::finalize}} to completely clean up libprocess.
#* {{process_manager}}
#* {{socket_manager}}
#* {{EventLoop}}
#* {{Clock}}
#* {{__s__}}
#* {{__address__}}
#* Garbage collector, help, logging, profiler, statistics, route processes 
(should fall under {{process_manager}}).
#* {{mime}}
# Add a test-only {{process::reinitialize}} function, which should be roughly 
equivalent to a first-time run of {{process::initialize}}.

-*Proposal to swap out server socket*- (Does not work)
# Follow the [example of the SSL 
library|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L280]
 and allow tests to declare an internal function for re-initializing a portion 
of libprocess.
# Move the [existing creation of the server 
socket|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L852-L856]
 into a {{reinitialize_server_socket}} function.
# Add any necessary cleanup for swapping server sockets.
# Consider whether any additional locking is required in the 
{{reinitialize_server_socket}} function.

  was:
*Background*
Libprocess initialization includes the spawning of a variety of global 
processes and the creation of the server socket which listens for incoming 
requests.  Some properties of the server socket are configured via environment 
variables, such as the IP and port or the SSL configuration.

In the case of tests, libprocess is initialized once per test binary.  This 
means that testing different configurations (SSL in particular) is cumbersome 
as a separate process would be needed for every test case.

*Proposal*
# Follow the [example of the SSL 
library|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L280]
 and allow tests to declare an internal function for re-initializing a portion 
of libprocess.
# Move the [existing creation of the server 
socket|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L852-L856]
 into a {{reinitialize_server_socket}} function.
# Add any necessary cleanup for swapping server sockets.
# Consider whether any additional locking is required in the 
{{reinitialize_server_socket}} function.


> Refactor libprocess initialization to allow for test-only reinitialization of 
> the server socket
> ---
>
> Key: MESOS-3820
> URL: https://issues.apache.org/jira/browse/MESOS-3820
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> *Background*
> Libprocess initialization includes the spawning of a variety of global 
> processes and the creation of the server socket which listens for incoming 
> requests.  Some properties of the server socket are configured via 
> environment variables, such as the IP and port or the SSL configuration.
> In the case of tests, libprocess is initialized once per test binary.  This 
> means that testing different configurations (SSL in particular) is cumbersome 
> as a separate process would be needed for every test case.
> *Proposal* (Still under investigation)
> # Augment {{process::finalize}} to completely clean up libprocess.
> #* {{process_manager}}
> #* {{socket_manager}}
> #* {{EventLoop}}
> #* {{Clock}}
> #* {{__s__}}
> #* {{__address__}}
> #* Garbage collector, help, logging, profiler, statistics, route processes 
> (should fall under {{process_manager}}).
> #* {{mime}}
> # Add a test-only {{process::reinitialize}} function, which should be roughly 
> equivalent to a first-time run of {{process::initialize}}.
> -*Proposal to swap out server socket*- (Does not work)
> # Follow the [example of the SSL 
> library|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L280]
>  and allow tests to declare an internal function for re-initializing a 
> portion of libprocess.
> # Move the [existing creation of the server 
> socket|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L852-L856]
>  into a {{reinitialize_server_socket}} function.
> # Add any 

[jira] [Updated] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling

2015-11-02 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3771:
-
Shepherd: Benjamin Mahler

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII 
> handling
> ---
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.1, 0.26.0
>Reporter: Steven Schlansker
>Assignee: Joseph Wu
>Priority: Critical
>  Labels: mesosphere
>
> Spark encodes some binary data into the ExecutorInfo.data field.  This field 
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without 
> any regards to proper character encoding:
> {code}
> 0006b0b0  2e 73 70 61 72 6b 2e 65  78 65 63 75 74 6f 72 2e  |.spark.executor.|
> 0006b0c0  4d 65 73 6f 73 45 78 65  63 75 74 6f 72 42 61 63  |MesosExecutorBac|
> 0006b0d0  6b 65 6e 64 22 7d 2c 22  64 61 74 61 22 3a 22 ac  |kend"},"data":".|
> 0006b0e0  ed 5c 75 30 30 30 30 5c  75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
> 0006b0f0  75 30 30 30 30 5c 75 30  30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
> 0006b100  6c 61 2e 54 75 70 6c 65  32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP api emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout seems 
> to not have any idea of what a byte array is.  I'm guessing that some 
> implicit conversion makes it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specificiation.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here.  Our cluster is currently entirely down -- 
> the frameworks cannot handle parsing the invalid JSON produced (it is not 
> even valid utf-8)
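
One common way to carry arbitrary bytes in a JSON string is to base64-encode them first, so that only ASCII reaches the serializer.  The sketch below shows that technique in isolation; {{base64Encode}} is a hypothetical standalone helper, and nothing here is meant to describe the fix that was or will be applied to {{model(const ExecutorInfo&)}}.

```cpp
#include <cstddef>
#include <string>

// Encodes arbitrary bytes as base64 (RFC 4648, with '=' padding) so the
// result is plain ASCII and safe to embed in a JSON string.
// Hypothetical helper for illustration only.
std::string base64Encode(const std::string& bytes)
{
  static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

  std::string out;
  std::size_t i = 0;

  // Each full 3-byte group becomes 4 output characters.
  for (; i + 2 < bytes.size(); i += 3) {
    const unsigned int n =
      (static_cast<unsigned char>(bytes[i]) << 16) |
      (static_cast<unsigned char>(bytes[i + 1]) << 8) |
      static_cast<unsigned char>(bytes[i + 2]);

    out += alphabet[(n >> 18) & 63];
    out += alphabet[(n >> 12) & 63];
    out += alphabet[(n >> 6) & 63];
    out += alphabet[n & 63];
  }

  // A trailing group of 1 or 2 bytes is padded with '='.
  const std::size_t rest = bytes.size() - i;
  if (rest > 0) {
    unsigned int n = static_cast<unsigned char>(bytes[i]) << 16;
    if (rest == 2) {
      n |= static_cast<unsigned char>(bytes[i + 1]) << 8;
    }

    out += alphabet[(n >> 18) & 63];
    out += alphabet[(n >> 12) & 63];
    out += (rest == 2) ? alphabet[(n >> 6) & 63] : '=';
    out += '=';
  }

  return out;
}
```

For example, the first four bytes of the blob in the hex dump above ({{ac ed 00 05}}) encode to {{rO0ABQ==}}, which any JSON parser accepts.  The trade-off is that clients must decode the field before using it.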





[jira] [Updated] (MESOS-3794) Master should not store arbitrarily sized data in ExecutorInfo

2015-11-02 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3794:
-
Shepherd: Benjamin Mahler

> Master should not store arbitrarily sized data in ExecutorInfo
> --
>
> Key: MESOS-3794
> URL: https://issues.apache.org/jira/browse/MESOS-3794
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>Priority: Critical
>  Labels: mesosphere
>
> From a comment in [MESOS-3771]:
> Master should not be storing the {{data}} fields from {{ExecutorInfo}}.  We 
> currently [store the entire 
> object|https://github.com/apache/mesos/blob/master/src/master/master.hpp#L262-L271],
>  which means master would be at high risk of OOM-ing if a bunch of executors 
> were started with big {{data}} blobs.
> * Master should scrub out unneeded bloat from {{ExecutorInfo}} before storing 
> it.
> * We can use an alternate internal object, like we do for {{TaskInfo}} vs 
> {{Task}}; see 
> [this|https://github.com/apache/mesos/blob/master/src/messages/messages.proto#L39-L41].
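
The "scrub before storing" bullet can be illustrated with a stand-in type.  This is a minimal sketch, assuming a simplified struct in place of the {{ExecutorInfo}} protobuf; {{ExecutorInfoSketch}} and {{scrub}} are hypothetical names, and which fields count as "unneeded bloat" beyond {{data}} is left open.

```cpp
#include <string>

// Simplified stand-in for the ExecutorInfo protobuf.  Only the fields
// needed for the illustration are modeled.
struct ExecutorInfoSketch
{
  std::string executorId;
  std::string name;
  std::string data;  // Arbitrary, possibly large, framework-supplied blob.
};

// Returns a copy suitable for long-term storage, with the blob removed.
// The original is left untouched so it can still be forwarded as-is.
ExecutorInfoSketch scrub(const ExecutorInfoSketch& info)
{
  ExecutorInfoSketch stored = info;
  stored.data.clear();
  return stored;
}
```

Copying and clearing keeps the change local to the place where the object is stored; the alternate internal object mentioned in the second bullet would instead make the absence of {{data}} visible in the type.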





[jira] [Created] (MESOS-3820) Refactor libprocess initialization to allow for test-only reinitialization of the server socket

2015-11-02 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3820:


 Summary: Refactor libprocess initialization to allow for test-only 
reinitialization of the server socket
 Key: MESOS-3820
 URL: https://issues.apache.org/jira/browse/MESOS-3820
 Project: Mesos
  Issue Type: Task
  Components: libprocess, test
Reporter: Joseph Wu
Assignee: Joseph Wu


*Background*
Libprocess initialization includes the spawning of a variety of global 
processes and the creation of the server socket which listens for incoming 
requests.  Some properties of the server socket are configured via environment 
variables, such as the IP and port or the SSL configuration.

In the case of tests, libprocess is initialized once per test binary.  This 
means that testing different configurations (SSL in particular) is cumbersome 
as a separate process would be needed for every test case.

*Proposal*
# Follow the [example of the SSL 
library|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/openssl.cpp#L280]
 and allow tests to declare an internal function for re-initializing a portion 
of libprocess.
# Move the [existing creation of the server 
socket|https://github.com/apache/mesos/blob/master/3rdparty/libprocess/src/process.cpp#L852-L856]
 into a {{reinitialize_server_socket}} function.
# Add any necessary cleanup for swapping server sockets.
# Consider whether any additional locking is required in the 
{{reinitialize_server_socket}} function.





[jira] [Updated] (MESOS-3753) Test the HTTP Scheduler library with SSL enabled

2015-11-01 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3753:
-
Description: 
Currently, the HTTP Scheduler library does not support SSL-enabled Mesos.  
(You can manually test this by spinning up an SSL-enabled master and attempting 
to run the event-call framework example against it.)

We need to add tests that check the HTTP Scheduler library against SSL-enabled 
Mesos:
* with downgrade support,
* with required framework/client-side certificates,
* with/without verification of certificates (master-side),
* with/without verification of certificates (framework-side),
* with a custom certificate authority (CA)

These options should be controlled by the same environment variables found on 
the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/].

Note: This issue will be broken down into smaller sub-issues as bugs/problems 
are discovered.

  was:
Currently, the HTTP Scheduler library does not support SSL-enabled Mesos.  

We need to add tests that check the schedule library against SSL-enabled Mesos 
with SSL:
* with downgrade support,
* with/without verification of certificates (framework-side),
* with required framework/client-side certifications,
* with/without verification of certificates (master-side),
* with a custom certificate authority (CA),

These options should be controlled by the same environment variables found on 
the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/].

Note: This issue will be broken down into smaller sub-issues as bugs/problems 
are discovered.


> Test the HTTP Scheduler library with SSL enabled
> 
>
> Key: MESOS-3753
> URL: https://issues.apache.org/jira/browse/MESOS-3753
> Project: Mesos
>  Issue Type: Story
>  Components: framework, HTTP API, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere, security
>
> Currently, the HTTP Scheduler library does not support SSL-enabled Mesos.  
> (You can manually test this by spinning up an SSL-enabled master and 
> attempting to run the event-call framework example against it.)
> We need to add tests that check the HTTP Scheduler library against 
> SSL-enabled Mesos:
> * with downgrade support,
> * with required framework/client-side certificates,
> * with/without verification of certificates (master-side),
> * with/without verification of certificates (framework-side),
> * with a custom certificate authority (CA)
> These options should be controlled by the same environment variables found on 
> the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/].
> Note: This issue will be broken down into smaller sub-issues as bugs/problems 
> are discovered.





[jira] [Commented] (MESOS-3139) Incorporate CMake into standard documentation

2015-10-30 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983145#comment-14983145
 ] 

Joseph Wu commented on MESOS-3139:
--

Unfortunately, the CMake build system isn't complete yet.  (Follow the 
[epic|https://issues.apache.org/jira/browse/MESOS-898] for progress.)

I think we only build up to libmesos at the moment.

> Incorporate CMake into standard documentation
> -
>
> Key: MESOS-3139
> URL: https://issues.apache.org/jira/browse/MESOS-3139
> Project: Mesos
>  Issue Type: Task
>  Components: cmake
>Reporter: Alex Clemmer
>Assignee: Alex Clemmer
>  Labels: build, cmake, mesosphere
>
> Right now it's anyone's guess how to build with CMake. If we want people to 
> use it, we should put up documentation. The central challenge is that the 
> CMake instructions will be slightly different for different platforms.
> For example, on Linux, the gist of the build is basically the same as 
> autotools; you pull down the system dependencies (like APR, _etc_.), and then:
> ```
> ./bootstrap
> mkdir build-cmake && cd build-cmake
> cmake ..
> make
> ```
> But, on Windows, it will be somewhat more complicated. There is no bootstrap 
> step, for example, because Windows doesn't have bash natively. And even when 
> we put that in, you'll still have to build the glog stuff out-of-band because 
> CMake has no way of booting up Visual Studio and calling "build."
> So practically, we need to:
> * Figure out what our build story is for different platforms.
> * Write specific instructions for our "core" target platforms.





[jira] [Commented] (MESOS-3786) Backticks are not mentioned in Mesos C++ Style Guide

2015-10-23 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971374#comment-14971374
 ] 

Joseph Wu commented on MESOS-3786:
--

This was definitely intentional for the maintenance comments.

> Backticks are not mentioned in Mesos C++ Style Guide
> 
>
> Key: MESOS-3786
> URL: https://issues.apache.org/jira/browse/MESOS-3786
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Greg Mann
>Assignee: Greg Mann
>Priority: Minor
>  Labels: documentation, mesosphere
>
> As far as I can tell, current practice is to quote code excerpts and object 
> names with backticks when writing comments. For example:
> {code}
> // You know, `sadPanda` seems extra sad lately.
> std::string sadPanda;
> sadPanda = "   :'(   ";
> {code}
> However, I don't see this documented in our C++ style guide at all. It should 
> be added.





[jira] [Assigned] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling

2015-10-22 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu reassigned MESOS-3771:


Assignee: Joseph Wu

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII 
> handling
> ---
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.1, 0.26.0
>Reporter: Steven Schlansker
>Assignee: Joseph Wu
>Priority: Critical
>  Labels: mesosphere
>
> Spark encodes some binary data into the ExecutorInfo.data field.  This field 
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without 
> any regards to proper character encoding:
> {code}
> 0006b0b0  2e 73 70 61 72 6b 2e 65  78 65 63 75 74 6f 72 2e  |.spark.executor.|
> 0006b0c0  4d 65 73 6f 73 45 78 65  63 75 74 6f 72 42 61 63  |MesosExecutorBac|
> 0006b0d0  6b 65 6e 64 22 7d 2c 22  64 61 74 61 22 3a 22 ac  |kend"},"data":".|
> 0006b0e0  ed 5c 75 30 30 30 30 5c  75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
> 0006b0f0  75 30 30 30 30 5c 75 30  30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
> 0006b100  6c 61 2e 54 75 70 6c 65  32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP api emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout seems 
> to not have any idea of what a byte array is.  I'm guessing that some 
> implicit conversion makes it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specificiation.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here.  Our cluster is currently entirely down -- 
> the frameworks cannot handle parsing the invalid JSON produced (it is not 
> even valid utf-8)





[jira] [Updated] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling

2015-10-22 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3771:
-
Labels: mesosphere  (was: )

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII 
> handling
> ---
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.1, 0.26.0
>Reporter: Steven Schlansker
>Assignee: Joseph Wu
>Priority: Critical
>  Labels: mesosphere
>
> Spark encodes some binary data into the ExecutorInfo.data field.  This field 
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without 
> any regards to proper character encoding:
> {code}
> 0006b0b0  2e 73 70 61 72 6b 2e 65  78 65 63 75 74 6f 72 2e  |.spark.executor.|
> 0006b0c0  4d 65 73 6f 73 45 78 65  63 75 74 6f 72 42 61 63  |MesosExecutorBac|
> 0006b0d0  6b 65 6e 64 22 7d 2c 22  64 61 74 61 22 3a 22 ac  |kend"},"data":".|
> 0006b0e0  ed 5c 75 30 30 30 30 5c  75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
> 0006b0f0  75 30 30 30 30 5c 75 30  30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
> 0006b100  6c 61 2e 54 75 70 6c 65  32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP api emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout seems 
> to not have any idea of what a byte array is.  I'm guessing that some 
> implicit conversion makes it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specification.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here.  Our cluster is currently entirely down -- 
> the frameworks cannot handle parsing the invalid JSON produced (it is not 
> even valid utf-8)





[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling

2015-10-22 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970102#comment-14970102
 ] 

Joseph Wu commented on MESOS-3771:
--

Created [MESOS-3794] to track point #2 above.  That portion will likely be 
a more involved change.

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII 
> handling
> ---
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.1, 0.26.0
>Reporter: Steven Schlansker
>Assignee: Joseph Wu
>Priority: Critical
>  Labels: mesosphere
>
> Spark encodes some binary data into the ExecutorInfo.data field.  This field 
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without 
> any regards to proper character encoding:
> {code}
> 0006b0b0  2e 73 70 61 72 6b 2e 65  78 65 63 75 74 6f 72 2e  |.spark.executor.|
> 0006b0c0  4d 65 73 6f 73 45 78 65  63 75 74 6f 72 42 61 63  |MesosExecutorBac|
> 0006b0d0  6b 65 6e 64 22 7d 2c 22  64 61 74 61 22 3a 22 ac  |kend"},"data":".|
> 0006b0e0  ed 5c 75 30 30 30 30 5c  75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
> 0006b0f0  75 30 30 30 30 5c 75 30  30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
> 0006b100  6c 61 2e 54 75 70 6c 65  32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP api emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout seems 
> to not have any idea of what a byte array is.  I'm guessing that some 
> implicit conversion makes it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specification.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here.  Our cluster is currently entirely down -- 
> the frameworks cannot handle parsing the invalid JSON produced (it is not 
> even valid utf-8)





[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling

2015-10-22 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970063#comment-14970063
 ] 

Joseph Wu commented on MESOS-3771:
--

Sync'd with BenH and [~bmahler] about this (offline).

The proposed solution is the following:
# None of the state endpoints should be dumping out the binary {{data}} fields 
in the first place.  This includes {{ExecutorInfo}} ([dumped by 
master|https://github.com/apache/mesos/blob/master/src/common/http.cpp#L317]) 
and {{TaskInfo}} ([dumped by 
agent|https://github.com/apache/mesos/blob/master/src/slave/http.cpp#L103]).  
#* These fields should be removed from the output entirely.  Existing 
frameworks should not be relying on this information.  [~stevenschlansker], can 
you confirm this with Spark?
#* We can easily back-port this patch, if absolutely necessary.
# Master should not be storing the {{data}} fields from {{ExecutorInfo}}.  We 
currently [store the entire 
object|https://github.com/apache/mesos/blob/master/src/master/master.hpp#L262-L271],
 which means master would be at high risk of OOM-ing if a bunch of executors 
were started with big {{data}} blobs.
#* Master should scrub out unneeded bloat from {{ExecutorInfo}} before storing 
it.
#* We can use an alternate internal object, like we do for {{TaskInfo}} vs 
{{Task}}; see 
[this|https://github.com/apache/mesos/blob/master/src/messages/messages.proto#L39-L41].
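
Point 1 of the proposal could look roughly like the sketch below. This is not the actual patch: {{ExecutorInfoSketch}} and {{modelWithoutData}} are hypothetical stand-ins for the protobuf message and the {{model()}} helper quoted in the description, and a plain {{std::map}} stands in for {{JSON::Object}}. The only point being illustrated is that the binary {{data}} field is never copied into the endpoint output.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the ExecutorInfo protobuf message; only the
// string-valued fields from the quoted model() function are included.
struct ExecutorInfoSketch
{
  std::string executor_id;
  std::string name;
  std::string framework_id;
  std::string data;  // Arbitrary (possibly non-UTF-8) bytes.
};

// Hypothetical replacement for model(): builds the endpoint output
// without the `data` field, so no raw bytes can reach the JSON writer.
std::map<std::string, std::string> modelWithoutData(
    const ExecutorInfoSketch& executorInfo)
{
  std::map<std::string, std::string> object;
  object["executor_id"] = executorInfo.executor_id;
  object["name"] = executorInfo.name;
  object["framework_id"] = executorInfo.framework_id;
  // Deliberately no object["data"] entry.
  return object;
}
```

With this shape, the output stays valid regardless of what a framework puts into {{data}}, at the cost of frameworks no longer being able to read that field back from the state endpoints.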

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII 
> handling
> ---
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.1, 0.26.0
>Reporter: Steven Schlansker
>Assignee: Joseph Wu
>Priority: Critical
>  Labels: mesosphere
>
> Spark encodes some binary data into the ExecutorInfo.data field.  This field 
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without 
> any regards to proper character encoding:
> {code}
> 0006b0b0  2e 73 70 61 72 6b 2e 65  78 65 63 75 74 6f 72 2e  |.spark.executor.|
> 0006b0c0  4d 65 73 6f 73 45 78 65  63 75 74 6f 72 42 61 63  |MesosExecutorBac|
> 0006b0d0  6b 65 6e 64 22 7d 2c 22  64 61 74 61 22 3a 22 ac  |kend"},"data":".|
> 0006b0e0  ed 5c 75 30 30 30 30 5c  75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
> 0006b0f0  75 30 30 30 30 5c 75 30  30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
> 0006b100  6c 61 2e 54 75 70 6c 65  32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP api emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout seems 
> to not have any idea of what a byte array is.  I'm guessing that some 
> implicit conversion makes it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specification.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here.  Our cluster is currently entirely down -- 
> the frameworks cannot handle parsing the invalid JSON produced (it is not 
> even valid utf-8)





[jira] [Created] (MESOS-3794) Master should not store arbitrarily sized data in ExecutorInfo

2015-10-22 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3794:


 Summary: Master should not store arbitrarily sized data in 
ExecutorInfo
 Key: MESOS-3794
 URL: https://issues.apache.org/jira/browse/MESOS-3794
 Project: Mesos
  Issue Type: Bug
  Components: master
Reporter: Joseph Wu
Assignee: Joseph Wu
Priority: Critical


From a comment in [MESOS-3771]:

Master should not be storing the {{data}} fields from {{ExecutorInfo}}.  We 
currently [store the entire 
object|https://github.com/apache/mesos/blob/master/src/master/master.hpp#L262-L271],
 which means master would be at high risk of OOM-ing if a bunch of executors 
were started with big {{data}} blobs.
* Master should scrub out unneeded bloat from {{ExecutorInfo}} before storing 
it.
* We can use an alternate internal object, like we do for {{TaskInfo}} vs 
{{Task}}; see 
[this|https://github.com/apache/mesos/blob/master/src/messages/messages.proto#L39-L41].
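
The "alternate internal object" idea can be illustrated with a small sketch. The names {{ExecutorInfoSketch}}, {{StoredExecutor}} and {{toStored}} are hypothetical; the real change would presumably be a new message in {{messages.proto}}, analogous to {{Task}}, rather than plain structs. The sketch only shows the conversion step that drops {{data}} before the master keeps anything.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the full message sent by a framework.
struct ExecutorInfoSketch
{
  std::string executor_id;
  std::string framework_id;
  std::string name;
  std::string data;  // Arbitrarily large, framework-defined bytes.
};

// Hypothetical trimmed object that the master would store instead.
// It has no `data` member, so its size does not depend on the blob.
struct StoredExecutor
{
  std::string executor_id;
  std::string framework_id;
  std::string name;
};

// Copies only the fields the master needs; `data` is left behind.
StoredExecutor toStored(const ExecutorInfoSketch& info)
{
  StoredExecutor stored;
  stored.executor_id = info.executor_id;
  stored.framework_id = info.framework_id;
  stored.name = info.name;
  return stored;
}
```

Compared with clearing {{data}} on a copy of {{ExecutorInfo}}, a separate type makes it impossible to store the blob by accident, which is the same trade-off as {{TaskInfo}} vs {{Task}}.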





[jira] [Updated] (MESOS-1478) Replace Master/Slave terminology

2015-10-21 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-1478:
-
Epic Name: Agent Rename

> Replace Master/Slave terminology
> 
>
> Key: MESOS-1478
> URL: https://issues.apache.org/jira/browse/MESOS-1478
> Project: Mesos
>  Issue Type: Epic
>Reporter: Clark Breyman
>Assignee: Benjamin Hindman
>Priority: Minor
>  Labels: mesosphere
>
> Inspired by the comments on this PR:
> https://github.com/django/django/pull/2692
> TL;DR - Computers sharing work should be a good thing. Using the language of 
> human bondage and suffering is inappropriate in this context. It also has the 
> potential to alienate users and community members. 





[jira] [Commented] (MESOS-1607) Introduce optimistic offers.

2015-10-21 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14967410#comment-14967410
 ] 

Joseph Wu commented on MESOS-1607:
--

We plan to release the MVP before the end of this year, so tentatively sometime 
between v0.26.0 and v0.28.0.

> Introduce optimistic offers.
> 
>
> Key: MESOS-1607
> URL: https://issues.apache.org/jira/browse/MESOS-1607
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, framework, master
>Reporter: Benjamin Hindman
>Assignee: Artem Harutyunyan
>  Labels: mesosphere
> Attachments: optimisitic-offers.pdf
>
>
> *Background*
> The current implementation of resource offers only enables a single framework 
> scheduler to make scheduling decisions for some available resources at a 
> time. In some circumstances, this is good, i.e., when we don't want other 
> framework schedulers to have access to some resources. However, in other 
> circumstances, there are advantages to letting multiple framework schedulers 
> attempt to make scheduling decisions for the _same_ allocation of resources 
> in parallel.
> If you think about this from a "concurrency control" perspective, the current 
> implementation of resource offers is _pessimistic_: the resources contained 
> within an offer are _locked_ until the framework scheduler that they were 
> offered to launches tasks with them or declines them. In addition to making 
> pessimistic offers we'd like to give out _optimistic_ offers, where the same 
> resources are offered to multiple framework schedulers at the same time, and 
> framework schedulers "compete" for those resources on a 
> first-come-first-serve basis (i.e., the first to launch a task "wins"). We've 
> always reserved the right to rescind resource offers using the 'rescind' 
> primitive in the API, and a framework scheduler should be prepared to launch 
> tasks and have those tasks go lost because another framework already started 
> to use those resources.
> *Feature*
> We plan to take a step towards optimistic offers, by introducing primitives 
> that allow resources to be offered to multiple frameworks at once.  At first, 
> we will use these primitives to optimistically allocate resources that are 
> reserved for a particular framework/role but have not been allocated by that 
> framework/role.  
> The work with optimistic offers will closely resemble the existing 
> oversubscription feature.  Optimistically offered resources are likely to be 
> considered "revocable resources" (the concept that using resources not 
> reserved for you means you might get those resources revoked).  In effect, we 
> may create something like a "spot" market for unused resources, driving 
> up utilization by letting frameworks that are willing to use revocable 
> resources run tasks.
> *Future Work*
> This ticket tracks the introduction of some aspects of optimistic offers.  
> Taken to the limit, one could imagine always making optimistic resource 
> offers. This bears a striking resemblance to the Google Omega model (an 
> isomorphism even). However, being able to configure what resources should be 
> allocated optimistically and what resources should be allocated 
> pessimistically gives even more control to a datacenter/cluster operator that 
> might want to, for example, never let multiple frameworks (roles) compete for 
> some set of resources.
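
The first-come-first-serve competition described in the background section can be sketched in a few lines. This is purely illustrative and not part of any Mesos API: {{OptimisticOfferSketch}}, {{offer}} and {{launch}} are hypothetical names, and resources and frameworks are reduced to strings. The same resource is offered to several frameworks, the first launch wins, and later launches fail (the "task goes lost" case).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical bookkeeping for optimistically offered resources.
class OptimisticOfferSketch
{
public:
  // Offers `resource` to `framework`; the same resource may be offered
  // to any number of frameworks at the same time.
  void offer(const std::string& resource, const std::string& framework)
  {
    offered[resource].insert(framework);
  }

  // Returns true if `framework` wins the resource. Returns false if the
  // resource was never offered to it, or if another framework already
  // launched on it (the task would be reported as lost).
  bool launch(const std::string& resource, const std::string& framework)
  {
    if (offered[resource].count(framework) == 0) {
      return false;
    }
    if (owner.count(resource) > 0) {
      return false;
    }
    owner[resource] = framework;
    return true;
  }

private:
  std::map<std::string, std::set<std::string>> offered;
  std::map<std::string, std::string> owner;
};
```

A pessimistic offer corresponds to the special case where each resource is only ever offered to one framework at a time.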





[jira] [Updated] (MESOS-1478) Replace Master/Slave terminology

2015-10-21 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-1478:
-
Issue Type: Epic  (was: Wish)

> Replace Master/Slave terminology
> 
>
> Key: MESOS-1478
> URL: https://issues.apache.org/jira/browse/MESOS-1478
> Project: Mesos
>  Issue Type: Epic
>Reporter: Clark Breyman
>Assignee: Benjamin Hindman
>Priority: Minor
>  Labels: mesosphere
>
> Inspired by the comments on this PR:
> https://github.com/django/django/pull/2692
> TL;DR - Computers sharing work should be a good thing. Using the language of 
> human bondage and suffering is inappropriate in this context. It also has the 
> potential to alienate users and community members. 





[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling

2015-10-21 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968275#comment-14968275
 ] 

Joseph Wu commented on MESOS-3771:
--

Looks like our JSON library will never catch this (it's more permissive), which 
is why none of our unit tests have caught this.

I agree that this is a problem though.  I'll see if I can get more eyes on this.
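
For illustration, a stricter check that a unit test could apply to endpoint output might look like the sketch below. {{isValidUtf8}} is a hypothetical helper, not something that exists in stout or picojson as far as this thread shows; it rejects stray continuation bytes, truncated sequences, overlong encodings, surrogates and code points above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true only if `s` is well-formed UTF-8.
bool isValidUtf8(const std::string& s)
{
  // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
  static const std::uint32_t minimum[] = {0x0, 0x80, 0x800, 0x10000};

  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t extra;   // Number of continuation bytes expected.
    std::uint32_t cp;    // Decoded code point.

    if (lead < 0x80) {
      extra = 0; cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      extra = 1; cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
      extra = 2; cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
      extra = 3; cp = lead & 0x07;
    } else {
      return false;  // Stray continuation byte or invalid lead byte.
    }

    if (i + extra >= s.size()) {
      return false;  // Sequence is truncated.
    }

    for (std::size_t k = 1; k <= extra; ++k) {
      const unsigned char c = static_cast<unsigned char>(s[i + k]);
      if ((c & 0xC0) != 0x80) {
        return false;  // Not a continuation byte.
      }
      cp = (cp << 6) | (c & 0x3F);
    }

    if (cp < minimum[extra]) return false;              // Overlong.
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;     // Surrogate.
    if (cp > 0x10FFFF) return false;                    // Out of range.

    i += extra + 1;
  }
  return true;
}
```

The bytes {{ac ed}} from the hexdump in the description fail this check immediately, because {{0xac}} is a continuation byte with no lead byte before it.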

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII 
> handling
> ---
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.1, 0.26.0
>Reporter: Steven Schlansker
>Priority: Critical
>
> Spark encodes some binary data into the ExecutorInfo.data field.  This field 
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without 
> any regards to proper character encoding:
> {code}
> 0006b0b0  2e 73 70 61 72 6b 2e 65  78 65 63 75 74 6f 72 2e  |.spark.executor.|
> 0006b0c0  4d 65 73 6f 73 45 78 65  63 75 74 6f 72 42 61 63  |MesosExecutorBac|
> 0006b0d0  6b 65 6e 64 22 7d 2c 22  64 61 74 61 22 3a 22 ac  |kend"},"data":".|
> 0006b0e0  ed 5c 75 30 30 30 30 5c  75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
> 0006b0f0  75 30 30 30 30 5c 75 30  30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
> 0006b100  6c 61 2e 54 75 70 6c 65  32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP api emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout seems 
> to not have any idea of what a byte array is.  I'm guessing that some 
> implicit conversion makes it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specification.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here.  Our cluster is currently entirely down -- 
> the frameworks cannot handle parsing the invalid JSON produced (it is not 
> even valid utf-8)





[jira] [Comment Edited] (MESOS-3762) Refactor SSLTest fixture such that MesosTest can use the same helpers.

2015-10-21 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965883#comment-14965883
 ] 

Joseph Wu edited comment on MESOS-3762 at 10/22/15 12:06 AM:
-

Reviews for:
Step 1)
https://reviews.apache.org/r/39498/
https://reviews.apache.org/r/39499/
Step 2 & 3)
https://reviews.apache.org/r/39501/
Step 4) 
https://reviews.apache.org/r/39533/
https://reviews.apache.org/r/39534/


was (Author: kaysoky):
Reviews for:
Step 1)
https://reviews.apache.org/r/39498/
https://reviews.apache.org/r/39499/
Step 2 & 3)
https://reviews.apache.org/r/39501/

> Refactor SSLTest fixture such that MesosTest can use the same helpers.
> --
>
> Key: MESOS-3762
> URL: https://issues.apache.org/jira/browse/MESOS-3762
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> In order to write tests that exercise SSL with other components of Mesos, 
> such as the HTTP scheduler library, we need to use the setup/teardown logic 
> found in the {{SSLTest}} fixture.
> Currently, the test fixtures have separate inheritance structures like this:
> {code}
> SSLTest <- ::testing::Test
> MesosTest <- TemporaryDirectoryTest <- ::testing::Test
> {code}
> where {{::testing::Test}} is a gtest class.
> The plan is the following:
> # Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}.  This will 
> require moving the setup (generation of keys and certs) from 
> {{SetUpTestCase}} to {{SetUp}}.  At the same time, *some* of the cleanup 
> logic in the SSLTest will not be needed.
> # Move the logic of generating keys/certs into helpers, so that individual 
> tests can call them when needed, much like {{MesosTest}}.
> # Write a child class of {{SSLTest}} which has the same functionality as the 
> existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} 
> or the {{RegistryClientTest}}.
> # Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during 
> the refactor).  If Mesos is not compiled with {{--enable-ssl}}, then 
> {{SSLTest}} could be {{#ifdef}}'d into an empty class.
> The resulting structure should be like:
> {code}
> MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test
> ChildOfSSLTest /
> {code}
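
The target inheritance structure can be sketched without gtest as follows. Everything here is a simplified assumption: {{TestBase}} stands in for {{::testing::Test}}, the {{log}} member and the {{enableSSL}} method are hypothetical and exist only to make the call order visible, and the real fixtures do considerably more work in {{SetUp}}.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for ::testing::Test so the sketch compiles without gtest.
class TestBase
{
public:
  virtual ~TestBase() {}
  virtual void SetUp() {}
  std::vector<std::string> log;  // Records what each fixture did.
};

class TemporaryDirectoryTest : public TestBase
{
public:
  void SetUp() override
  {
    TestBase::SetUp();
    log.push_back("create temporary directory");
  }
};

// Steps 1 and 2: SSLTest inherits from TemporaryDirectoryTest, and key
// and certificate generation becomes a helper instead of fixed setup.
class SSLTest : public TemporaryDirectoryTest
{
protected:
  void generateKeysAndCerts() { log.push_back("generate keys and certs"); }
};

// Step 3: keeps the old behavior of always generating keys and certs.
class ChildOfSSLTest : public SSLTest
{
public:
  void SetUp() override
  {
    SSLTest::SetUp();
    generateKeysAndCerts();
  }
};

// Step 4: MesosTest gains the helpers but only uses them on demand.
class MesosTest : public SSLTest
{
public:
  void enableSSL() { generateKeysAndCerts(); }
};
```

Because key generation moves out of the shared {{SetUp}}, tests that never touch SSL do not pay for it, while the existing SSL tests keep their behavior through {{ChildOfSSLTest}}.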





[jira] [Updated] (MESOS-3759) Document messages.proto

2015-10-20 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3759:
-
Labels: docathon documentation mesosphere  (was: documentation mesosphere)

> Document messages.proto
> ---
>
> Key: MESOS-3759
> URL: https://issues.apache.org/jira/browse/MESOS-3759
> Project: Mesos
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: docathon, documentation, mesosphere
>
> The messages we pass between Mesos components are largely undocumented.  See 
> this 
> [TODO|https://github.com/apache/mesos/blob/19f14d06bac269b635657960d8ea8b2928b7830c/src/messages/messages.proto#L23].





[jira] [Commented] (MESOS-3581) License headers show up all over doxygen documentation.

2015-10-20 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965341#comment-14965341
 ] 

Joseph Wu commented on MESOS-3581:
--

IMO, more importantly, we should actually update the Doxygen docs.  They were 
last updated 13 months ago.  (See linked issues)

Also, we can easily get rid of the license headers by actually documenting the 
classes.
For example, the [Watcher 
class|http://mesos.apache.org/api/latest/c++/classWatcher.html] has proper 
documentation *and* a license.

> License headers show up all over doxygen documentation.
> ---
>
> Key: MESOS-3581
> URL: https://issues.apache.org/jira/browse/MESOS-3581
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>
> Currently license headers are commented in something resembling Javadoc style,
> {code}
> /**
> * Licensed ...
> {code}
> Since we use Javadoc-style comment blocks for doxygen documentation all 
> license headers appear in the generated documentation, potentially and likely 
> hiding the actual documentation.
> Using {{/*}} to start the comment blocks would be enough to hide them from 
> doxygen, but would likely also result in a largish (though mostly 
> uninteresting) patch.





[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling

2015-10-20 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965808#comment-14965808
 ] 

Joseph Wu commented on MESOS-3771:
--

^ That's actually what would (sort of) fix your issue.  There's an old TODO 
[here|https://github.com/apache/mesos/blob/master/src/master/http.cpp#L118-L119]
 to make the change.

We do actually encode {{bytes}} in base64, but only when they are transformed 
into JSON from Protobuf.  However, some of the endpoints (the ones which must 
be backwards compatible) appear to treat {{bytes}} as ASCII strings.  

If you have more control over your version of Spark, you could base64 encode 
from Spark:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala#L47
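
To make the base64 suggestion concrete, here is a minimal sketch of the encoding itself. {{encodeBase64}} is a hypothetical free function written for this example (it is neither the stout helper nor the Spark-side code, which would be Scala); it implements the standard alphabet with {{=}} padding, so the result is plain ASCII and safe to place in a JSON string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: standard base64 (RFC 4648 alphabet, with padding).
std::string encodeBase64(const std::string& bytes)
{
  static const char alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

  std::string out;
  out.reserve(((bytes.size() + 2) / 3) * 4);

  std::size_t i = 0;

  // Each full group of 3 input bytes becomes 4 output characters.
  for (; i + 2 < bytes.size(); i += 3) {
    const std::uint32_t n =
      (static_cast<std::uint8_t>(bytes[i]) << 16) |
      (static_cast<std::uint8_t>(bytes[i + 1]) << 8) |
      static_cast<std::uint8_t>(bytes[i + 2]);
    out += alphabet[(n >> 18) & 63];
    out += alphabet[(n >> 12) & 63];
    out += alphabet[(n >> 6) & 63];
    out += alphabet[n & 63];
  }

  // A trailing group of 1 or 2 bytes is padded with '='.
  const std::size_t rest = bytes.size() - i;
  if (rest == 1) {
    const std::uint32_t n = static_cast<std::uint8_t>(bytes[i]) << 16;
    out += alphabet[(n >> 18) & 63];
    out += alphabet[(n >> 12) & 63];
    out += "==";
  } else if (rest == 2) {
    const std::uint32_t n =
      (static_cast<std::uint8_t>(bytes[i]) << 16) |
      (static_cast<std::uint8_t>(bytes[i + 1]) << 8);
    out += alphabet[(n >> 18) & 63];
    out += alphabet[(n >> 12) & 63];
    out += alphabet[(n >> 6) & 63];
    out += '=';
  }

  return out;
}
```

Any consumer of the field then has to decode it again, which is why doing this on the Spark side is a workaround rather than a fix for the endpoints.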

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII 
> handling
> ---
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.1
>Reporter: Steven Schlansker
>Priority: Critical
>
> Spark encodes some binary data into the ExecutorInfo.data field.  This field 
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without 
> any regards to proper character encoding:
> {code}
> 0006b0b0  2e 73 70 61 72 6b 2e 65  78 65 63 75 74 6f 72 2e  |.spark.executor.|
> 0006b0c0  4d 65 73 6f 73 45 78 65  63 75 74 6f 72 42 61 63  |MesosExecutorBac|
> 0006b0d0  6b 65 6e 64 22 7d 2c 22  64 61 74 61 22 3a 22 ac  |kend"},"data":".|
> 0006b0e0  ed 5c 75 30 30 30 30 5c  75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
> 0006b0f0  75 30 30 30 30 5c 75 30  30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
> 0006b100  6c 61 2e 54 75 70 6c 65  32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP api emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout seems 
> to not have any idea of what a byte array is.  I'm guessing that some 
> implicit conversion makes it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specification.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here.  Our cluster is currently entirely down -- 
> the frameworks cannot handle parsing the invalid JSON produced (it is not 
> even valid utf-8)





[jira] [Updated] (MESOS-3762) Refactor SSLTest fixture such that MesosTest can use the same helpers.

2015-10-20 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3762:
-
Description: 
In order to write tests that exercise SSL with other components of Mesos, such 
as the HTTP scheduler library, we need to use the setup/teardown logic found in 
the {{SSLTest}} fixture.

Currently, the test fixtures have separate inheritance structures like this:
{code}
SSLTest <- ::testing::Test
MesosTest <- TemporaryDirectoryTest <- ::testing::Test
{code}
where {{::testing::Test}} is a gtest class.

The plan is the following:
# Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}.  This will 
require moving the setup (generation of keys and certs) from {{SetUpTestCase}} 
to {{SetUp}}.  At the same time, *some* of the cleanup logic in the SSLTest 
will not be needed.
# Move the logic of generating keys/certs into helpers, so that individual 
tests can call them when needed, much like {{MesosTest}}.
# Write a child class of {{SSLTest}} which has the same functionality as the 
existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} or 
the {{RegistryClientTest}}.
# Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during 
the refactor).  If Mesos is not compiled with {{--enable-ssl}}, then 
{{SSLTest}} could be {{#ifdef}}'d into an empty class.

The resulting structure should be like:
{code}
MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test
ChildOfSSLTest /
{code}

  was:
In order to write tests that exercise SSL with other components of Mesos, such 
as the HTTP scheduler library, we need to use the setup/teardown logic found in 
the {{SSLTest}} fixture.

Currently, the test fixtures have separate inheritance structures like this:
{code}
SSLTest <- ::testing::Test
MesosTest <- TemporaryDirectoryTest <- ::testing::Test
{code}
where {{::testing::Test}} is a gtest class.

The plan is the following:
1) Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}.  This will 
require moving the setup (generation of keys and certs) from {{SetUpTestCase}} 
to {{SetUp}}.  At the same time, *some* of the cleanup logic in the SSLTest 
will not be needed.
2) Move the logic of generating keys/certs into helpers, so that individual 
tests can call them when needed, much like {{MesosTest}}.
3) Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during 
the refactor).  If Mesos is not compiled with {{--enable-ssl}}, then 
{{SSLTest}} could be {{#ifdef}}'d into an empty class.
4) Write a child class of {{SSLTest}} which has the same functionality as the 
existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} or 
the {{RegistryClientTest}}.

The resulting structure should be like:
{code}
MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test
ChildOfSSLTest /
{code}


> Refactor SSLTest fixture such that MesosTest can use the same helpers.
> --
>
> Key: MESOS-3762
> URL: https://issues.apache.org/jira/browse/MESOS-3762
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> In order to write tests that exercise SSL with other components of Mesos, 
> such as the HTTP scheduler library, we need to use the setup/teardown logic 
> found in the {{SSLTest}} fixture.
> Currently, the test fixtures have separate inheritance structures like this:
> {code}
> SSLTest <- ::testing::Test
> MesosTest <- TemporaryDirectoryTest <- ::testing::Test
> {code}
> where {{::testing::Test}} is a gtest class.
> The plan is the following:
> # Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}.  This will 
> require moving the setup (generation of keys and certs) from 
> {{SetUpTestCase}} to {{SetUp}}.  At the same time, *some* of the cleanup 
> logic in the SSLTest will not be needed.
> # Move the logic of generating keys/certs into helpers, so that individual 
> tests can call them when needed, much like {{MesosTest}}.
> # Write a child class of {{SSLTest}} which has the same functionality as the 
> existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} 
> or the {{RegistryClientTest}}.
> # Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during 
> the refactor).  If Mesos is not compiled with {{--enable-ssl}}, then 
> {{SSLTest}} could be {{#ifdef}}'d into an empty class.
> The resulting structure should be like:
> {code}
> MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test
> ChildOfSSLTest /
> {code}





[jira] [Commented] (MESOS-3762) Refactor SSLTest fixture such that MesosTest can use the same helpers.

2015-10-20 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965877#comment-14965877
 ] 

Joseph Wu commented on MESOS-3762:
--

Found and wrote a fix for an SSL-related test cleanup bug: 
https://reviews.apache.org/r/39495/

> Refactor SSLTest fixture such that MesosTest can use the same helpers.
> --
>
> Key: MESOS-3762
> URL: https://issues.apache.org/jira/browse/MESOS-3762
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> In order to write tests that exercise SSL with other components of Mesos, 
> such as the HTTP scheduler library, we need to use the setup/teardown logic 
> found in the {{SSLTest}} fixture.
> Currently, the test fixtures have separate inheritance structures like this:
> {code}
> SSLTest <- ::testing::Test
> MesosTest <- TemporaryDirectoryTest <- ::testing::Test
> {code}
> where {{::testing::Test}} is a gtest class.
> The plan is the following:
> # Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}.  This will 
> require moving the setup (generation of keys and certs) from 
> {{SetUpTestCase}} to {{SetUp}}.  At the same time, *some* of the cleanup 
> logic in the SSLTest will not be needed.
> # Move the logic of generating keys/certs into helpers, so that individual 
> tests can call them when needed, much like {{MesosTest}}.
> # Write a child class of {{SSLTest}} which has the same functionality as the 
> existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} 
> or the {{RegistryClientTest}}.
> # Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during 
> the refactor).  If Mesos is not compiled with {{--enable-ssl}}, then 
> {{SSLTest}} could be {{#ifdef}}'d into an empty class.
> The resulting structure should be like:
> {code}
> MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test
> ChildOfSSLTest /
> {code}





[jira] [Comment Edited] (MESOS-3762) Refactor SSLTest fixture such that MesosTest can use the same helpers.

2015-10-20 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14965883#comment-14965883
 ] 

Joseph Wu edited comment on MESOS-3762 at 10/20/15 11:30 PM:
-

Reviews for:
Step 1)
https://reviews.apache.org/r/39498/
https://reviews.apache.org/r/39499/
Step 2 & 3)
https://reviews.apache.org/r/39501/


was (Author: kaysoky):
Reviews for step 1)
https://reviews.apache.org/r/39498/
https://reviews.apache.org/r/39499/

> Refactor SSLTest fixture such that MesosTest can use the same helpers.
> --
>
> Key: MESOS-3762
> URL: https://issues.apache.org/jira/browse/MESOS-3762
> Project: Mesos
>  Issue Type: Task
>  Components: test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> In order to write tests that exercise SSL with other components of Mesos, 
> such as the HTTP scheduler library, we need to use the setup/teardown logic 
> found in the {{SSLTest}} fixture.
> Currently, the test fixtures have separate inheritance structures like this:
> {code}
> SSLTest <- ::testing::Test
> MesosTest <- TemporaryDirectoryTest <- ::testing::Test
> {code}
> where {{::testing::Test}} is a gtest class.
> The plan is the following:
> # Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}.  This will 
> require moving the setup (generation of keys and certs) from 
> {{SetUpTestCase}} to {{SetUp}}.  At the same time, *some* of the cleanup 
> logic in the SSLTest will not be needed.
> # Move the logic of generating keys/certs into helpers, so that individual 
> tests can call them when needed, much like {{MesosTest}}.
> # Write a child class of {{SSLTest}} which has the same functionality as the 
> existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} 
> or the {{RegistryClientTest}}.
> # Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during 
> the refactor).  If Mesos is not compiled with {{--enable-ssl}}, then 
> {{SSLTest}} could be {{#ifdef}}'d into an empty class.
> The resulting structure should be like:
> {code}
> MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test
> ChildOfSSLTest /
> {code}





[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling

2015-10-20 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14966036#comment-14966036
 ] 

Joseph Wu commented on MESOS-3771:
--

Actually, I can't repro this behavior in a unit test.  ([Attempted 
here|https://github.com/kaysoky/mesos/commit/d8869f0aa1fdcf38072b45a6238b191c67b7e0f7]).

I've constructed an {{ExecutorInfo}} with a {{data}} field holding the same 
data you have above.  Fetching the same {{ExecutorInfo}} from the {{/state}} 
endpoint also gives valid JSON.

Am I doing something differently?

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII 
> handling
> ---
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.1
>Reporter: Steven Schlansker
>Priority: Critical
>
> Spark encodes some binary data into the ExecutorInfo.data field.  This field 
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without 
> any regard for proper character encoding:
> {code}
> 0006b0b0  2e 73 70 61 72 6b 2e 65  78 65 63 75 74 6f 72 2e  |.spark.executor.|
> 0006b0c0  4d 65 73 6f 73 45 78 65  63 75 74 6f 72 42 61 63  |MesosExecutorBac|
> 0006b0d0  6b 65 6e 64 22 7d 2c 22  64 61 74 61 22 3a 22 ac  |kend"},"data":".|
> 0006b0e0  ed 5c 75 30 30 30 30 5c  75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
> 0006b0f0  75 30 30 30 30 5c 75 30  30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
> 0006b100  6c 61 2e 54 75 70 6c 65  32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP api emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout seems 
> to not have any idea of what a byte array is.  I'm guessing that some 
> implicit conversion makes it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specificiation.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here.  Our cluster is currently entirely down -- 
> the frameworks cannot parse the invalid JSON produced (it is not 
> even valid UTF-8).
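One way to keep such a field from breaking the output is to escape every byte that is not plain printable ASCII before it is written. The snippet below is only a sketch of that idea: escapeBytesAsJsonString is a hypothetical helper, not part of stout or picojson, and it assumes consumers agree to read each \u00XX escape back as a single byte. Base64-encoding the field would be an equally reasonable choice.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not a stout or picojson API): renders arbitrary
// bytes as an ASCII-only JSON string literal. Quotes, backslashes, control
// characters, DEL and every byte >= 0x80 become a \u00XX escape, so the
// output is always valid JSON regardless of the input encoding.
std::string escapeBytesAsJsonString(const std::string& bytes)
{
  std::string out = "\"";
  for (unsigned char c : bytes) {
    if (c == '"' || c == '\\' || c < 0x20 || c >= 0x7f) {
      char buffer[7]; // "\uXXXX" plus the terminating NUL.
      std::snprintf(
          buffer, sizeof(buffer), "\\u%04x", static_cast<unsigned>(c));
      out += buffer;
    } else {
      out += static_cast<char>(c);
    }
  }
  out += '"';
  return out;
}
```

Fed the first bytes of the {{data}} blob from the hexdump (ac ed 00 05 75 72), this yields "\u00ac\u00ed\u0000\u0005ur", which any JSON parser should accept.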





[jira] [Created] (MESOS-3759) Document messages.proto

2015-10-19 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3759:


 Summary: Document messages.proto
 Key: MESOS-3759
 URL: https://issues.apache.org/jira/browse/MESOS-3759
 Project: Mesos
  Issue Type: Improvement
  Components: documentation
Reporter: Joseph Wu
Assignee: Joseph Wu


The messages we pass between Mesos components are largely undocumented.  See 
this 
[TODO|https://github.com/apache/mesos/blob/master/src/messages/messages.proto#L23].





[jira] [Updated] (MESOS-3759) Document messages.proto

2015-10-19 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3759:
-
Shepherd: Joris Van Remoortere

> Document messages.proto
> ---
>
> Key: MESOS-3759
> URL: https://issues.apache.org/jira/browse/MESOS-3759
> Project: Mesos
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: documentation, mesosphere
>
> The messages we pass between Mesos components are largely undocumented.  See 
> this 
> [TODO|https://github.com/apache/mesos/blob/master/src/messages/messages.proto#L23].





[jira] [Created] (MESOS-3762) Refactor SSLTest fixture such that MesosTest can use the same helpers.

2015-10-19 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3762:


 Summary: Refactor SSLTest fixture such that MesosTest can use the 
same helpers.
 Key: MESOS-3762
 URL: https://issues.apache.org/jira/browse/MESOS-3762
 Project: Mesos
  Issue Type: Task
  Components: test
Reporter: Joseph Wu
Assignee: Joseph Wu


In order to write tests that exercise SSL with other components of Mesos, such 
as the HTTP scheduler library, we need to use the setup/teardown logic found in 
the {{SSLTest}} fixture.

Currently, the test fixtures have separate inheritance structures like this:
{code}
SSLTest <- ::testing::Test
MesosTest <- TemporaryDirectoryTest <- ::testing::Test
{code}
where {{::testing::Test}} is a gtest class.

The plan is the following:
1) Change {{SSLTest}} to inherit from {{TemporaryDirectoryTest}}.  This will 
require moving the setup (generation of keys and certs) from {{SetUpTestCase}} 
to {{SetUp}}.  At the same time, *some* of the cleanup logic in the SSLTest 
will not be needed.
2) Move the logic of generating keys/certs into helpers, so that individual 
tests can call them when needed, much like {{MesosTest}}.
3) Have {{MesosTest}} inherit from {{SSLTest}} (which might be renamed during 
the refactor).  If Mesos is not compiled with {{--enable-ssl}}, then 
{{SSLTest}} could be {{#ifdef}}'d into an empty class.
4) Write a child class of {{SSLTest}} which has the same functionality as the 
existing {{SSLTest}}, for use by the existing tests that rely on {{SSLTest}} or 
the {{RegistryClientTest}}.

The resulting structure should be like:
{code}
MesosTest <- SSLTest <- TemporaryDirectoryTest <- ::testing::Test
ChildOfSSLTest /
{code}
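
The target hierarchy can be sketched roughly as follows. This is only an illustration: every class in namespace sketch is a stand-in (including a dummy Test base in place of {{::testing::Test}}, so the snippet compiles without gtest), generateKeysAndCerts is a hypothetical helper name, and the sandbox path is a placeholder rather than a real temporary directory.

```cpp
#include <string>
#include <vector>

namespace sketch {

// Stand-in for ::testing::Test so the sketch compiles without gtest.
class Test
{
public:
  virtual ~Test() = default;
  virtual void SetUp() {}
  virtual void TearDown() {}
};

// Stand-in for TemporaryDirectoryTest: records a per-test sandbox path.
// (Placeholder string only; no directory is actually created here.)
class TemporaryDirectoryTest : public Test
{
public:
  void SetUp() override { sandbox = "/tmp/sketch-sandbox"; }
  void TearDown() override { sandbox.clear(); }

protected:
  std::string sandbox;
};

// SSLTest now sits on top of TemporaryDirectoryTest and exposes key/cert
// generation as a helper instead of doing it once in SetUpTestCase.
class SSLTest : public TemporaryDirectoryTest
{
public:
  // Hypothetical helper; real code would generate keys and certificates.
  std::vector<std::string> generateKeysAndCerts()
  {
    return {sandbox + "/key.pem", sandbox + "/cert.pem"};
  }
};

// Child class that keeps the old behaviour: always generate in SetUp.
class ChildOfSSLTest : public SSLTest
{
public:
  void SetUp() override
  {
    SSLTest::SetUp();
    files = generateKeysAndCerts();
  }

  std::vector<std::string> files;
};

// MesosTest inherits the helpers but only calls them on demand.
class MesosTest : public SSLTest {};

} // namespace sketch
```

Because generation moves from {{SetUpTestCase}} to a per-test helper, each test gets fresh files inside its own sandbox, which is why some of the old cleanup logic becomes unnecessary.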





[jira] [Updated] (MESOS-3753) Test the HTTP Scheduler library with SSL enabled

2015-10-19 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3753:
-
Sprint:   (was: Mesosphere Sprint 21)

> Test the HTTP Scheduler library with SSL enabled
> 
>
> Key: MESOS-3753
> URL: https://issues.apache.org/jira/browse/MESOS-3753
> Project: Mesos
>  Issue Type: Story
>  Components: framework, HTTP API, test
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere, security
>
> Currently, the HTTP Scheduler library does not support SSL-enabled Mesos.  
> We need to add tests that check the scheduler library against SSL-enabled 
> Mesos:
> * with downgrade support,
> * with/without verification of certificates (framework-side),
> * with required framework/client-side certificates,
> * with/without verification of certificates (master-side),
> * with a custom certificate authority (CA).
> These options should be controlled by the same environment variables found on 
> the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/].
> Note: This issue will be broken down into smaller sub-issues as bugs/problems 
> are discovered.





[jira] [Updated] (MESOS-3751) MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with --executor_environmnent_variables

2015-10-16 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3751:
-
Summary: MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with 
--executor_environmnent_variables  (was: MESOS_NATIVE_JAVA_LIBRARY not set on 
MesosContainerizre tasks with --executor_environmnent_variables)

> MESOS_NATIVE_JAVA_LIBRARY not set on MesosContainerize tasks with 
> --executor_environmnent_variables
> ---
>
> Key: MESOS-3751
> URL: https://issues.apache.org/jira/browse/MESOS-3751
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization
>Affects Versions: 0.24.1, 0.25.0
>Reporter: Cody Maloney
>Assignee: Gilbert Song
>  Labels: mesosphere, newbie
>
> When using --executor_environment_variables, and having 
> MESOS_NATIVE_JAVA_LIBRARY in the environment of mesos-slave, the mesos 
> containerizer does not set MESOS_NATIVE_JAVA_LIBRARY itself.
> Relevant code: 
> https://github.com/apache/mesos/blob/14f7967ef307f3d98e3a4b93d92d6b3a56399b20/src/slave/containerizer/containerizer.cpp#L281
> It checks whether the variable is in the mesos-slave's own environment 
> (os::getenv), rather than whether it is set in the executor's environment 
> variable set.
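The intended check can be sketched as below. This is an assumption-laden illustration, not the code at the link above: withNativeJavaLibrary and defaultPath are hypothetical names, and executorEnv stands for the variables supplied through --executor_environment_variables. The point is only that the decision consults the executor's variable set, not the slave's process environment.

```cpp
#include <map>
#include <string>

// Hypothetical helper: fills in MESOS_NATIVE_JAVA_LIBRARY when the
// executor's own environment set lacks it. `executorEnv` is assumed to be
// the map built from --executor_environment_variables; the slave's process
// environment (os::getenv) is deliberately not consulted.
std::map<std::string, std::string> withNativeJavaLibrary(
    std::map<std::string, std::string> executorEnv,
    const std::string& defaultPath)
{
  const std::string key = "MESOS_NATIVE_JAVA_LIBRARY";
  if (executorEnv.count(key) == 0) {
    executorEnv[key] = defaultPath;
  }
  return executorEnv;
}
```

A value already present in the executor's set is left untouched, so an operator-supplied path still wins over the default.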





[jira] [Created] (MESOS-3753) Test the HTTP Scheduler library with SSL enabled

2015-10-16 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3753:


 Summary: Test the HTTP Scheduler library with SSL enabled
 Key: MESOS-3753
 URL: https://issues.apache.org/jira/browse/MESOS-3753
 Project: Mesos
  Issue Type: Story
  Components: framework, HTTP API, test
Reporter: Joseph Wu
Assignee: Joseph Wu


Currently, the HTTP Scheduler library does not support SSL-enabled Mesos.  

We need to add tests that check the scheduler library against SSL-enabled 
Mesos:
* with downgrade support,
* with/without verification of certificates (framework-side),
* with required framework/client-side certificates,
* with/without verification of certificates (master-side),
* with a custom certificate authority (CA).

These options should be controlled by the same environment variables found on 
the [SSL user doc|http://mesos.apache.org/documentation/latest/ssl/].

Note: This issue will be broken down into smaller sub-issues as bugs/problems 
are discovered.





[jira] [Updated] (MESOS-3748) HTTP scheduler library does not gracefully parse invalid resource identifiers

2015-10-15 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3748:
-
Description: 
If you pass a nonsense string for "master" into a framework using the HTTP 
scheduler library, the framework segfaults.

For example, using the example frameworks:

{code}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

Using the HTTP API:
{code}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}

  was:
If you pass a nonsense string for "master" into a framework using the HTTP API, 
the framework segfaults.

For example, using the example frameworks:

{code}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

Using the HTTP API:
{code}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}


> HTTP scheduler library does not gracefully parse invalid resource identifiers
> -
>
> Key: MESOS-3748
> URL: https://issues.apache.org/jira/browse/MESOS-3748
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API
>Affects Versions: 0.25.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> If you pass a nonsense string for "master" into a framework using the HTTP 
> scheduler library, the framework segfaults.
> For example, using the example frameworks:
> {code}
> build/src/test-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in:
> {code}
> Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to 
> parse 'asdf://127.0.0.1:5050'
> {code}
> Using the HTTP API:
> {code}
> export DEFAULT_PRINCIPAL=root
> build/src/event-call-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in
> {code}
> I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
> Segmentation fault: 11
> {code}





[jira] [Updated] (MESOS-3748) HTTP scheduler library does not gracefully parse invalid resource identifiers

2015-10-15 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3748:
-
Summary: HTTP scheduler library does not gracefully parse invalid resource 
identifiers  (was: HTTP API does not gracefully parse invalid resource 
identifiers)

> HTTP scheduler library does not gracefully parse invalid resource identifiers
> -
>
> Key: MESOS-3748
> URL: https://issues.apache.org/jira/browse/MESOS-3748
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API
>Affects Versions: 0.25.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> If you pass a nonsense string for "master" into a framework using the HTTP 
> API, the framework segfaults.
> For example, using the example frameworks:
> {code}
> build/src/test-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in:
> {code}
> Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to 
> parse 'asdf://127.0.0.1:5050'
> {code}
> Using the HTTP API:
> {code}
> export DEFAULT_PRINCIPAL=root
> build/src/event-call-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in
> {code}
> I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
> Segmentation fault: 11
> {code}





[jira] [Updated] (MESOS-3748) HTTP API does not gracefully parse invalid resource identifiers

2015-10-15 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3748:
-
Description: 
If you pass a nonsense string for "master" into a framework using the HTTP API, 
the framework segfaults.

For example, using the example frameworks:

{code}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

Using the HTTP API:
{code}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}

  was:
If you pass a nonsense string for "master" into a framework using the HTTP API, 
the framework segfaults.

For example, using the example frameworks:

{code}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

{code}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}


> HTTP API does not gracefully parse invalid resource identifiers
> ---
>
> Key: MESOS-3748
> URL: https://issues.apache.org/jira/browse/MESOS-3748
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API
>Affects Versions: 0.25.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> If you pass a nonsense string for "master" into a framework using the HTTP 
> API, the framework segfaults.
> For example, using the example frameworks:
> {code}
> build/src/test-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in:
> {code}
> Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to 
> parse 'asdf://127.0.0.1:5050'
> {code}
> Using the HTTP API:
> {code}
> export DEFAULT_PRINCIPAL=root
> build/src/event-call-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in
> {code}
> I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
> Segmentation fault: 11
> {code}





[jira] [Created] (MESOS-3748) HTTP API does not gracefully parse invalid resource identifiers

2015-10-15 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3748:


 Summary: HTTP API does not gracefully parse invalid resource 
identifiers
 Key: MESOS-3748
 URL: https://issues.apache.org/jira/browse/MESOS-3748
 Project: Mesos
  Issue Type: Bug
  Components: framework, HTTP API
Affects Versions: 0.25.0
Reporter: Joseph Wu
Assignee: Joseph Wu


If you pass a nonsense string for "master" into a framework using the HTTP API, 
the framework segfaults.

For example, using the example frameworks:

{code}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

{code}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}
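
The graceful behaviour expected here can be sketched as follows. Everything in this snippet is hypothetical (Detector, DetectorResult, createDetector, initialize, and the toy parsing rule), and it is not necessarily what the library or any eventual patch does. It only illustrates checking the result of detector creation before using it, so a bad "master" string produces an error message like the scheduler driver's instead of a null dereference.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical stand-in for a master detector.
struct Detector
{
  std::string master;
};

// Hypothetical result type: either a detector or an error message.
struct DetectorResult
{
  std::unique_ptr<Detector> detector; // Null on failure.
  std::string error;                  // Non-empty on failure.
};

// Toy parser (an assumption for this sketch): accepts "host:port" and
// "zk://..." strings and rejects any other URL scheme.
DetectorResult createDetector(const std::string& master)
{
  const std::size_t scheme = master.find("://");
  if (scheme != std::string::npos && master.compare(0, 5, "zk://") != 0) {
    return {nullptr, "Failed to parse '" + master + "'"};
  }
  return {std::unique_ptr<Detector>(new Detector{master}), ""};
}

// The caller checks for failure before touching the pointer.
bool initialize(const std::string& master, std::string* error)
{
  DetectorResult result = createDetector(master);
  if (!result.detector) {
    *error = "Failed to create a master detector for '" + master +
             "': " + result.error;
    return false;
  }
  return true;
}
```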





[jira] [Updated] (MESOS-3748) HTTP scheduler library does not gracefully parse invalid resource identifiers

2015-10-15 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3748:
-
Description: 
If you pass a nonsense string for "master" into a framework using the HTTP 
scheduler library, the framework segfaults.

For example, using the example frameworks:

{code:title=Scheduler Driver}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

{code:title=HTTP Scheduler Library}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}

{code:title=Stack Trace}
* thread #2: tid = 0x28b6bb, 0x000100ad03ca 
libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0)
 + 42 at scheduler.cpp:213, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000100ad03ca 
libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0)
 + 42 at scheduler.cpp:213
frame #1: 0x000100ad05f2 libmesos-0.26.0.dylib`virtual thunk to 
mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0) + 34 at 
scheduler.cpp:210
frame #2: 0x0001022b60f3 libmesos-0.26.0.dylib`::resume() + 931 at 
process.cpp:2449
frame #3: 0x0001022c131c libmesos-0.26.0.dylib`::operator()() + 268 at 
process.cpp:2174
frame #4: 0x0001022c0fa2 
libmesos-0.26.0.dylib`::__thread_proxy > > >() [inlined] 
__invoke<(lambda at ../../../3rdparty/libprocess/src/process.cpp:2158:35) &, 
const std::__1::atomic &> + 27 at __functional_base:415
frame #5: 0x0001022c0f87 
libmesos-0.26.0.dylib`::__thread_proxy > > >() [inlined] 
__apply_functor<(lambda at 
../../../3rdparty/libprocess/src/process.cpp:2158:35), 
std::__1::tuple >, 
0, std::__1::tuple<> > + 55 at functional:2060
frame #6: 0x0001022c0f50 
libmesos-0.26.0.dylib`::__thread_proxy > > >() [inlined] 
operator()<> + 41 at functional:2123
frame #7: 0x0001022c0f27 
libmesos-0.26.0.dylib`::__thread_proxy > > >() [inlined] 
__invoke >> + 14 at 
__functional_base:415
frame #8: 0x0001022c0f19 
libmesos-0.26.0.dylib`::__thread_proxy > > >() [inlined] 
__thread_execute >> + 25 at thread:337
frame #9: 0x0001022c0f00 
libmesos-0.26.0.dylib`::__thread_proxy > > >() + 368 at 
thread:347
frame #10: 0x7fff964c705a libsystem_pthread.dylib`_pthread_body + 131
frame #11: 0x7fff964c6fd7 libsystem_pthread.dylib`_pthread_start + 176
frame #12: 0x7fff964c43ed libsystem_pthread.dylib`thread_start + 13
{code}

  was:
If you pass a nonsense string for "master" into a framework using the HTTP 
scheduler library, the framework segfaults.

For example, using the example frameworks:

{code}
build/src/test-framework --master="asdf://127.0.0.1:5050"
{code}
Results in:
{code}
Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to parse 
'asdf://127.0.0.1:5050'
{code}

Using the HTTP API:
{code}
export DEFAULT_PRINCIPAL=root
build/src/event-call-framework --master="asdf://127.0.0.1:5050"
{code}
Results in
{code}
I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
Segmentation fault: 11
{code}


> HTTP scheduler library does not gracefully parse invalid resource identifiers
> -
>
> Key: MESOS-3748
> URL: https://issues.apache.org/jira/browse/MESOS-3748
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API
>Affects Versions: 0.25.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> If you 

[jira] [Commented] (MESOS-3748) HTTP scheduler library does not gracefully parse invalid resource identifiers

2015-10-15 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959889#comment-14959889
 ] 

Joseph Wu commented on MESOS-3748:
--

Turns out to be a rather trivial issue: https://reviews.apache.org/r/39365/

> HTTP scheduler library does not gracefully parse invalid resource identifiers
> -
>
> Key: MESOS-3748
> URL: https://issues.apache.org/jira/browse/MESOS-3748
> Project: Mesos
>  Issue Type: Bug
>  Components: framework, HTTP API
>Affects Versions: 0.25.0
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> If you pass a nonsense string for "master" into a framework using the HTTP 
> scheduler library, the framework segfaults.
> For example, using the example frameworks:
> {code:title=Scheduler Driver}
> build/src/test-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in:
> {code}
> Failed to create a master detector for 'asdf://127.0.0.1:5050': Failed to 
> parse 'asdf://127.0.0.1:5050'
> {code}
> {code:title=HTTP Scheduler Library}
> export DEFAULT_PRINCIPAL=root
> build/src/event-call-framework --master="asdf://127.0.0.1:5050"
> {code}
> Results in
> {code}
> I1015 16:18:45.432075 2062201600 scheduler.cpp:157] Version: 0.26.0
> Segmentation fault: 11
> {code}
> {code:title=Stack Trace}
> * thread #2: tid = 0x28b6bb, 0x000100ad03ca 
> libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0)
>  + 42 at scheduler.cpp:213, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
>   * frame #0: 0x000100ad03ca 
> libmesos-0.26.0.dylib`mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0)
>  + 42 at scheduler.cpp:213
> frame #1: 0x000100ad05f2 libmesos-0.26.0.dylib`virtual thunk to 
> mesos::v1::scheduler::MesosProcess::initialize(this=0x0001076031a0) + 34 
> at scheduler.cpp:210
> frame #2: 0x0001022b60f3 libmesos-0.26.0.dylib`::resume() + 931 at 
> process.cpp:2449
> frame #3: 0x0001022c131c libmesos-0.26.0.dylib`::operator()() + 268 
> at process.cpp:2174
> frame #4: 0x0001022c0fa2 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() [inlined] 
> __invoke<(lambda at ../../../3rdparty/libprocess/src/process.cpp:2158:35) &, 
> const std::__1::atomic &> + 27 at __functional_base:415
> frame #5: 0x0001022c0f87 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() [inlined] 
> __apply_functor<(lambda at 
> ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::tuple >, 
> 0, std::__1::tuple<> > + 55 at functional:2060
> frame #6: 0x0001022c0f50 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() [inlined] 
> operator()<> + 41 at functional:2123
> frame #7: 0x0001022c0f27 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() [inlined] 
> __invoke ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > >> + 14 at 
> __functional_base:415
> frame #8: 0x0001022c0f19 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() [inlined] 
> __thread_execute ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > >> + 25 at 
> thread:337
> frame #9: 0x0001022c0f00 
> libmesos-0.26.0.dylib`::__thread_proxy  at ../../../3rdparty/libprocess/src/process.cpp:2158:35), 
> std::__1::reference_wrapper > > > >() + 368 at 
> thread:347
> frame #10: 0x7fff964c705a libsystem_pthread.dylib`_pthread_body + 131
> frame #11: 0x7fff964c6fd7 libsystem_pthread.dylib`_pthread_start + 176
> frame #12: 0x7fff964c43ed libsystem_pthread.dylib`thread_start + 13
> {code}





[jira] [Updated] (MESOS-1607) Introduce optimistic offers.

2015-10-14 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-1607:
-
Description: 
*Background*
The current implementation of resource offers only enables a single framework 
scheduler to make scheduling decisions for some available resources at a time. 
In some circumstances, this is good, i.e., when we don't want other framework 
schedulers to have access to some resources. However, in other circumstances, 
there are advantages to letting multiple framework schedulers attempt to make 
scheduling decisions for the _same_ allocation of resources in parallel.

If you think about this from a "concurrency control" perspective, the current 
implementation of resource offers is _pessimistic_: the resources contained 
within an offer are _locked_ until the framework scheduler that they were 
offered to launches tasks with them or declines them. In addition to making 
pessimistic offers we'd like to give out _optimistic_ offers, where the same 
resources are offered to multiple framework schedulers at the same time, and 
framework schedulers "compete" for those resources on a first-come, 
first-served basis (i.e., the first to launch a task "wins"). We've always 
reserved the right to rescind resource offers using the 'rescind' primitive in 
the API, and a framework scheduler should be prepared to launch a task and have 
that task go lost because another framework already started to use those 
resources.

*Feature*
We plan to take a step towards optimistic offers, by introducing primitives 
that allow resources to be offered to multiple frameworks at once.  At first, 
we will use these primitives to optimistically allocate resources that are 
reserved for a particular framework/role but have not been allocated by that 
framework/role.  

The work with optimistic offers will closely resemble the existing 
oversubscription feature.  Optimistically offered resources are likely to be 
considered "revocable resources" (the concept that using resources not reserved 
for you means you might get those resources revoked).  In effect, we may 
create something like a "spot" market for unused resources, driving up 
utilization by letting frameworks that are willing to use revocable resources 
run tasks.

*Future Work*
This ticket tracks the introduction of some aspects of optimistic offers.  

Taken to the limit, one could imagine always making optimistic resource offers. 
This bears a striking resemblance to the Google Omega model (an isomorphism 
even). However, being able to configure what resources should be allocated 
optimistically and what resources should be allocated pessimistically gives 
even more control to a datacenter/cluster operator that might want to, for 
example, never let multiple frameworks (roles) compete for some set of 
resources.
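
The first-come, first-served competition described in the background can be illustrated with a small compare-and-swap sketch. OptimisticOffer, tryLaunch and the integer framework ids are hypothetical names chosen for this example. Nothing here reflects how the allocator or master would actually implement the feature.

```cpp
#include <atomic>

// Sentinel meaning "no framework has used these resources yet".
constexpr int kUnclaimed = -1;

// Hypothetical model of one set of resources offered to several
// frameworks at once. The first framework to launch wins; later attempts
// fail, which corresponds to their tasks going lost.
class OptimisticOffer
{
public:
  // Returns true only for the first framework to claim the resources.
  bool tryLaunch(int frameworkId)
  {
    int expected = kUnclaimed;
    return owner.compare_exchange_strong(expected, frameworkId);
  }

  // Id of the winning framework, or kUnclaimed if nobody launched yet.
  int winner() const { return owner.load(); }

private:
  std::atomic<int> owner{kUnclaimed};
};
```

A pessimistic offer, by contrast, would hand the resources to exactly one framework and never expose them to a second tryLaunch caller until they are declined or used.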

  was:
The current implementation of resource offers only enable a single framework 
scheduler to make scheduling decisions for some available resources at a time. 
In some circumstances, this is good, i.e., when we don't want other framework 
schedulers to have access to some resources. However, in other circumstances, 
there are advantages to letting multiple framework schedulers attempt to make 
scheduling decisions for the _same_ allocation of resources in parallel.

If you think about this from a "concurrency control" perspective, the current 
implementation of resource offers is _pessimistic_: the resources contained 
within an offer are _locked_ until the framework scheduler that they were 
offered to launches tasks with them or declines them. In addition to making 
pessimistic offers we'd like to give out _optimistic_ offers, where the same 
resources are offered to multiple framework schedulers at the same time, and 
framework schedulers "compete" for those resources on a first-come-first-served 
basis (i.e., the first to launch a task "wins"). We've always reserved the 
right to rescind resource offers using the 'rescind' primitive in the API, and 
a framework scheduler should be prepared to launch tasks and have those tasks 
go lost because another framework already started to use those resources.

Introducing optimistic offers will enable more sophisticated allocation 
algorithms. For example, we can optimistically allocate resources that are 
reserved for a particular framework (role) but are not being used. In 
conjunction with revocable resources (the concept that using resources not 
reserved for you means you might get those resources revoked) we can easily 
create a "spot" market for unused resources, driving up utilization by letting 
frameworks that are willing to use revocable resources run tasks.

In the limit, one could imagine always making optimistic resource offers. This 
bears a striking resemblance to the Google Omega model (an isomorphism even). 
However, being able to configure what resources should be allocated 

[jira] [Commented] (MESOS-3183) Documentation images do not load

2015-10-09 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14950686#comment-14950686
 ] 

Joseph Wu commented on MESOS-3183:
--

Update: [~davelester] has submitted the {{rake.patch}}; see [the svn 
commit|http://svn.apache.org/viewvc?view=revision=1707725].

Images still don't show up, since all links formatted like 
{{images/foo.png}} point to 
{{/documentation/latest/some-document/images/foo.png}} instead of 
{{/documentation/latest/images/foo.png}}.
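
The resolution behavior can be reproduced with Python's standard {{urllib.parse.urljoin}}. This is only an illustration: the page URL is an assumed example, and the two rewritten link forms are candidates for discussion, not a decided fix.

```python
from urllib.parse import urljoin, urlsplit

# Assumed example page; the published doc pages are served with a trailing slash.
page = "http://mesos.apache.org/documentation/latest/some-document/"

# The link as it is currently written in the Markdown source.
current = urlsplit(urljoin(page, "images/foo.png")).path

# Two candidate rewrites of the link (illustrative only).
parent_relative = urlsplit(urljoin(page, "../images/foo.png")).path
absolute = urlsplit(urljoin(page, "/documentation/latest/images/foo.png")).path

print(current)          # resolves under some-document/
print(parent_relative)  # resolves under latest/
print(absolute)         # resolves under latest/
```

Both rewrites reach the intended path here. Note that the absolute form names a specific version directory, which may be part of why versioned documentation complicates the change.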

We're still discussing how to make the URL change compatible with versioned 
documentation.

> Documentation images do not load
> 
>
> Key: MESOS-3183
> URL: https://issues.apache.org/jira/browse/MESOS-3183
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.0
>Reporter: James Mulcahy
>Assignee: Joseph Wu
>Priority: Minor
>  Labels: mesosphere
> Attachments: rake.patch
>
>
> Any images which are referenced from the generated docs ({{docs/*.md}}) do 
> not show up on the website.  For example:
> * [External 
> Containerizer|http://mesos.apache.org/documentation/latest/external-containerizer/]
> * [Fetcher Cache 
> Internals|http://mesos.apache.org/documentation/latest/fetcher-cache-internals/]
> * [Maintenance|http://mesos.apache.org/documentation/latest/maintenance/] 
> * 
> [Oversubscription|http://mesos.apache.org/documentation/latest/oversubscription/]





[jira] [Updated] (MESOS-3409) Refactor the plain JSON parsing in the docker containerizer

2015-09-30 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3409:
-
Assignee: Gilbert Song  (was: Joseph Wu)

> Refactor the plain JSON parsing in the docker containerizer
> ---
>
> Key: MESOS-3409
> URL: https://issues.apache.org/jira/browse/MESOS-3409
> Project: Mesos
>  Issue Type: Improvement
>  Components: docker
>Reporter: Joseph Wu
>Assignee: Gilbert Song
>Priority: Minor
>
> Two functions in the Docker-related code take a string and parse it to JSON:
> * {{Docker::Container::create}} in {{src/docker/docker.cpp}}
> * {{Token::create}} in 
> {{src/slave/containerizer/provisioners/docker/token_manager.cpp}}
> This JSON is then validated (lots of if-elses) and used via the 
> {{JSON::Value}} accessors.  We could instead use a protobuf and the related 
> Stout JSON->Protobuf conversion function.





[jira] [Updated] (MESOS-3183) Documentation images do not load

2015-09-28 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3183:
-
Sprint: Mesosphere Sprint 20

> Documentation images do not load
> 
>
> Key: MESOS-3183
> URL: https://issues.apache.org/jira/browse/MESOS-3183
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.0
>Reporter: James Mulcahy
>Assignee: Joseph Wu
>Priority: Minor
>  Labels: mesosphere
> Attachments: rake.patch
>
>
> Any images which are referenced from the generated docs ({{docs/*.md}}) do 
> not show up on the website.  For example:
> * [External 
> Containerizer|http://mesos.apache.org/documentation/latest/external-containerizer/]
> * [Fetcher Cache 
> Internals|http://mesos.apache.org/documentation/latest/fetcher-cache-internals/]
> * [Maintenance|http://mesos.apache.org/documentation/latest/maintenance/] 
> * 
> [Oversubscription|http://mesos.apache.org/documentation/latest/oversubscription/]





[jira] [Updated] (MESOS-1615) Create design document for Optimistic Offers

2015-09-28 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-1615:
-
Sprint: Mesosphere Sprint 20

> Create design document for Optimistic Offers
> 
>
> Key: MESOS-1615
> URL: https://issues.apache.org/jira/browse/MESOS-1615
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Dominic Hamon
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> As a first step toward Optimistic Offers, take the description from the epic 
> and build an implementation design doc that can be shared for comments.
> Note: the links to the working group notes and design doc are located in the 
> [JIRA Epic|MESOS-1607].





[jira] [Updated] (MESOS-1615) Create design document for Optimistic Offers

2015-09-28 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-1615:
-
Story Points: 8  (was: 5)

> Create design document for Optimistic Offers
> 
>
> Key: MESOS-1615
> URL: https://issues.apache.org/jira/browse/MESOS-1615
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Dominic Hamon
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> As a first step toward Optimistic Offers, take the description from the epic 
> and build an implementation design doc that can be shared for comments.
> Note: the links to the working group notes and design doc are located in the 
> [JIRA Epic|MESOS-1607].





[jira] [Updated] (MESOS-1615) Create design document for Optimistic Offers

2015-09-28 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-1615:
-
Description: 
As a first step toward Optimistic Offers, take the description from the epic 
and build an implementation design doc that can be shared for comments.

Note: the links to the working group notes and design doc are located in the 
[JIRA Epic|MESOS-1607].

  was:As a first step toward Optimistic Offers, take the description from the 
epic and build an implementation design doc that can be shared for comments.


> Create design document for Optimistic Offers
> 
>
> Key: MESOS-1615
> URL: https://issues.apache.org/jira/browse/MESOS-1615
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Dominic Hamon
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> As a first step toward Optimistic Offers, take the description from the epic 
> and build an implementation design doc that can be shared for comments.
> Note: the links to the working group notes and design doc are located in the 
> [JIRA Epic|MESOS-1607].





[jira] [Updated] (MESOS-1615) Create design document for Optimistic Offers

2015-09-28 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-1615:
-
Labels: mesosphere  (was: )

> Create design document for Optimistic Offers
> 
>
> Key: MESOS-1615
> URL: https://issues.apache.org/jira/browse/MESOS-1615
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Dominic Hamon
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> As a first step toward Optimistic Offers, take the description from the epic 
> and build an implementation design doc that can be shared for comments.





[jira] [Updated] (MESOS-1615) Create design document for Optimistic Offers

2015-09-28 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-1615:
-
Story Points: 5

> Create design document for Optimistic Offers
> 
>
> Key: MESOS-1615
> URL: https://issues.apache.org/jira/browse/MESOS-1615
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Dominic Hamon
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> As a first step toward Optimistic Offers, take the description from the epic 
> and build an implementation design doc that can be shared for comments.





[jira] [Assigned] (MESOS-1615) Create design document for Optimistic Offers

2015-09-28 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu reassigned MESOS-1615:


Assignee: Joseph Wu

> Create design document for Optimistic Offers
> 
>
> Key: MESOS-1615
> URL: https://issues.apache.org/jira/browse/MESOS-1615
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Dominic Hamon
>Assignee: Joseph Wu
>
> As a first step toward Optimistic Offers, take the description from the epic 
> and build an implementation design doc that can be shared for comments.





[jira] [Updated] (MESOS-1607) Introduce optimistic offers.

2015-09-28 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-1607:
-
Labels: mesosphere  (was: )

> Introduce optimistic offers.
> 
>
> Key: MESOS-1607
> URL: https://issues.apache.org/jira/browse/MESOS-1607
> Project: Mesos
>  Issue Type: Epic
>  Components: allocation, framework, master
>Reporter: Benjamin Hindman
>Assignee: Artem Harutyunyan
>  Labels: mesosphere
> Attachments: optimisitic-offers.pdf
>
>
> The current implementation of resource offers only enables a single framework 
> scheduler to make scheduling decisions for some available resources at a 
> time. In some circumstances, this is good, i.e., when we don't want other 
> framework schedulers to have access to some resources. However, in other 
> circumstances, there are advantages to letting multiple framework schedulers 
> attempt to make scheduling decisions for the _same_ allocation of resources 
> in parallel.
> If you think about this from a "concurrency control" perspective, the current 
> implementation of resource offers is _pessimistic_: the resources contained 
> within an offer are _locked_ until the framework scheduler that they were 
> offered to launches tasks with them or declines them. In addition to making 
> pessimistic offers we'd like to give out _optimistic_ offers, where the same 
> resources are offered to multiple framework schedulers at the same time, and 
> framework schedulers "compete" for those resources on a 
> first-come-first-served basis (i.e., the first to launch a task "wins"). We've 
> always reserved the right to rescind resource offers using the 'rescind' 
> primitive in the API, and a framework scheduler should be prepared to launch 
> tasks and have those tasks go lost because another framework already started 
> to use those resources.
> Introducing optimistic offers will enable more sophisticated allocation 
> algorithms. For example, we can optimistically allocate resources that are 
> reserved for a particular framework (role) but are not being used. In 
> conjunction with revocable resources (the concept that using resources not 
> reserved for you means you might get those resources revoked) we can easily 
> create a "spot" market for unused resources, driving up utilization by 
> letting frameworks that are willing to use revocable resources run tasks.
> In the limit, one could imagine always making optimistic resource offers. 
> This bears a striking resemblance to the Google Omega model (an isomorphism 
> even). However, being able to configure what resources should be allocated 
> optimistically and what resources should be allocated pessimistically gives 
> even more control to a datacenter/cluster operator that might want to, for 
> example, never let multiple frameworks (roles) compete for some set of 
> resources.





[jira] [Commented] (MESOS-1615) Create design document for Optimistic Offers

2015-09-25 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14908870#comment-14908870
 ] 

Joseph Wu commented on MESOS-1615:
--

[~bmahler], is it OK if we (Mesosphere/IBM working group on optimistic offers) 
take over this ticket?

> Create design document for Optimistic Offers
> 
>
> Key: MESOS-1615
> URL: https://issues.apache.org/jira/browse/MESOS-1615
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Dominic Hamon
>Assignee: Benjamin Mahler
>
> As a first step toward Optimistic Offers, take the description from the epic 
> and build an implementation design doc that can be shared for comments.





[jira] [Created] (MESOS-3492) Expose maintenance user doc via the documentation home page

2015-09-22 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3492:


 Summary: Expose maintenance user doc via the documentation home 
page
 Key: MESOS-3492
 URL: https://issues.apache.org/jira/browse/MESOS-3492
 Project: Mesos
  Issue Type: Task
  Components: documentation
Reporter: Joseph Wu
Assignee: Joseph Wu
Priority: Trivial


The committed docs can be found here:
http://mesos.apache.org/documentation/latest/maintenance/

We need to add a link to it in {{docs/home.md}}.
Also, the doc needs some minor formatting tweaks.





[jira] [Updated] (MESOS-3458) Segfault when accepting or declining inverse offers

2015-09-22 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3458:
-
Sprint: Mesosphere Sprint 19

> Segfault when accepting or declining inverse offers
> ---
>
> Key: MESOS-3458
> URL: https://issues.apache.org/jira/browse/MESOS-3458
> Project: Mesos
>  Issue Type: Bug
>  Components: master
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>Priority: Blocker
>  Labels: mesosphere
>
> Discovered while writing a test for filters (with regard to inverse offers).
> Fix here: https://reviews.apache.org/r/38470/





[jira] [Updated] (MESOS-3459) Change /machine/up and /machine/down endpoints to take an array

2015-09-22 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3459:
-
Sprint: Mesosphere Sprint 19

> Change /machine/up and /machine/down endpoints to take an array
> ---
>
> Key: MESOS-3459
> URL: https://issues.apache.org/jira/browse/MESOS-3459
> Project: Mesos
>  Issue Type: Task
>  Components: master
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> With [MESOS-3312] committed, the {{/machine/up}} and {{/machine/down}} 
> endpoints should also take an array as input.
> It is important to change this before maintenance primitives are released:
> https://reviews.apache.org/r/38011/
> Also, a minor change to the error message from these endpoints:
> https://reviews.apache.org/r/37969/





[jira] [Assigned] (MESOS-3490) Mesos UI fails to represent JSON entities

2015-09-22 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu reassigned MESOS-3490:


Assignee: Joseph Wu

> Mesos UI fails to represent JSON entities
> -
>
> Key: MESOS-3490
> URL: https://issues.apache.org/jira/browse/MESOS-3490
> Project: Mesos
>  Issue Type: Bug
>Reporter: Isabel Jimenez
>Assignee: Joseph Wu
>
> The Mesos UI is broken; it seems to fail to represent JSON from /state.
> This may have been introduced with https://reviews.apache.org/r/38028/.





[jira] [Comment Edited] (MESOS-3490) Mesos UI fails to represent JSON entities

2015-09-22 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902965#comment-14902965
 ] 

Joseph Wu edited comment on MESOS-3490 at 9/22/15 4:55 PM:
---

The error lies with how we printed JSON numbers with a trailing period {{.}}, 
which is invalid JSON.

The simple fix is to print numbers with a trailing {{.0}} instead.
Review: https://reviews.apache.org/r/38632/
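
As a quick check of the claim, the sketch below feeds both forms to Python's standard {{json}} parser. The {{cpus}} key is a made-up example, not taken from the actual {{/state}} output.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if the standard-library parser accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# A number with a bare trailing period has no fraction digits.
trailing_period = is_valid_json('{"cpus": 1.}')

# Adding the zero makes it a well-formed JSON number.
trailing_zero = is_valid_json('{"cpus": 1.0}')

print(trailing_period, trailing_zero)
```

The JSON grammar requires at least one digit after the decimal point, so a strict parser rejects {{1.}} and accepts {{1.0}}.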


was (Author: kaysoky):
The error lies with how we printed JSON numbers with a trailing period '.', 
which is invalid JSON.

Review: https://reviews.apache.org/r/38632/

> Mesos UI fails to represent JSON entities
> -
>
> Key: MESOS-3490
> URL: https://issues.apache.org/jira/browse/MESOS-3490
> Project: Mesos
>  Issue Type: Bug
>Reporter: Isabel Jimenez
>Assignee: Joseph Wu
>
> The Mesos UI is broken; it seems to fail to represent JSON from /state.
> This may have been introduced with https://reviews.apache.org/r/38028/.





[jira] [Commented] (MESOS-3490) Mesos UI fails to represent JSON entities

2015-09-22 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14902965#comment-14902965
 ] 

Joseph Wu commented on MESOS-3490:
--

The error lies with how we printed JSON numbers with a trailing period '.', 
which is invalid JSON.

Review: https://reviews.apache.org/r/38632/

> Mesos UI fails to represent JSON entities
> -
>
> Key: MESOS-3490
> URL: https://issues.apache.org/jira/browse/MESOS-3490
> Project: Mesos
>  Issue Type: Bug
>Reporter: Isabel Jimenez
>Assignee: Joseph Wu
>
> The Mesos UI is broken; it seems to fail to represent JSON from /state.
> This may have been introduced with https://reviews.apache.org/r/38028/.





[jira] [Commented] (MESOS-3489) Add support for exposing Accept/Decline responses for inverse offers

2015-09-22 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14903618#comment-14903618
 ] 

Joseph Wu commented on MESOS-3489:
--

We should be able to get this done tonight.

> Add support for exposing Accept/Decline responses for inverse offers
> 
>
> Key: MESOS-3489
> URL: https://issues.apache.org/jira/browse/MESOS-3489
> Project: Mesos
>  Issue Type: Bug
>Reporter: Artem Harutyunyan
>Assignee: Joseph Wu
>  Labels: mesosphere
>
> The current implementation of maintenance primitives does not support exposing 
> the Accept/Decline responses of frameworks to cluster operators. 
> This functionality is necessary to give operators visibility into 
> whether a given framework is ready to comply with the posted maintenance 
> schedule.




