[jira] [Created] (MESOS-3795) process::io::write takes parameter as void* which could be const

2015-10-23 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-3795:
---

 Summary: process::io::write takes parameter as void* which could 
be const
 Key: MESOS-3795
 URL: https://issues.apache.org/jira/browse/MESOS-3795
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Benjamin Bannier


In libprocess we have

{code}
Future write(int fd, void* data, size_t size);
{code}

which expects a non-{{const}} {{void*}} for its {{data}} parameter. Under the 
covers {{data}} appears to be handled as {{const}} (as one would expect 
from the signature of its inspiration, {{::write}}).

This function is not used too often, but since it expects a non-{{const}} value 
for {{data}}, implicit conversions to {{void*}} from pointers to {{const}} data 
are rejected; instead callers seem to cast manually to {{void*}} -- often with 
C-style casts.

We should sync this method's signature with that of {{::write}}.

In addition to following the expected semantics of {{::write}}, having this 
work without casts for any pointer value {{data}} would make it easier to 
use it with string literals or raw data pointers from STL containers 
(e.g. {{Container::data}}). It would probably also indirectly remove the 
temptation to use C-style casts.
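
As a rough illustration of the difference (a sketch only: {{writeNonConst}}, {{writeConst}} and {{demo}} are hypothetical stand-ins that merely report the byte count, they are not the libprocess functions), a {{const void*}} parameter accepts string literals and the {{data()}} of {{const}} containers directly, while the non-{{const}} variant forces a cast:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in mirroring the current signature (non-const data).
// It only reports the byte count; no I/O is performed.
std::size_t writeNonConst(int /*fd*/, void* data, std::size_t size)
{
  assert(data != nullptr || size == 0);
  return size;
}

// Hypothetical stand-in mirroring ::write, which takes `const void*`.
std::size_t writeConst(int /*fd*/, const void* data, std::size_t size)
{
  assert(data != nullptr || size == 0);
  return size;
}

// Returns the total number of bytes "written" by the calls below.
std::size_t demo()
{
  const std::string message = "hello";
  const std::vector<char> bytes = {'a', 'b', 'c'};

  std::size_t total = 0;

  // With `const void*` any object pointer converts implicitly.
  total += writeConst(1, "literal", 7);
  total += writeConst(1, message.data(), message.size());
  total += writeConst(1, bytes.data(), bytes.size());

  // With `void*`, pointers to const data need an explicit cast.
  total += writeNonConst(
      1, const_cast<char*>(message.data()), message.size());

  return total;
}
```

The cast in the last call is exactly the kind of noise that a {{const}} parameter would make unnecessary.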



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3581) License headers show up all over doxygen documentation.

2015-10-23 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970921#comment-14970921
 ] 

Benjamin Bannier commented on MESOS-3581:
-

After soliciting feedback on the [mailing 
list|http://www.mail-archive.com/dev@mesos.apache.org/msg33488.html] there was 
some consensus that updating the source files is preferable to a workaround 
using e.g. {{INPUT_FILTER}}.

Will propose a patch implementing changed license headers next. 

> License headers show up all over doxygen documentation.
> ---
>
> Key: MESOS-3581
> URL: https://issues.apache.org/jira/browse/MESOS-3581
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>
> Currently license headers are commented in something resembling Javadoc style,
> {code}
> /**
> * Licensed ...
> {code}
> Since we use Javadoc-style comment blocks for doxygen documentation all 
> license headers appear in the generated documentation, potentially and likely 
> hiding the actual documentation.
> Using {{/*}} to start the comment blocks would be enough to hide them from 
> doxygen, but would likely also result in a largish (though mostly 
> uninteresting) patch.
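
To illustrate the difference described above (a sketch; the function {{length}} is made up for this example, and the license text is abbreviated just as in the description): doxygen treats a block opened with {{/**}} as documentation, while a block opened with {{/*}} is an ordinary comment and is skipped.

```cpp
/*
 * Licensed ...
 *
 * Opened with a single asterisk, so doxygen skips this block.
 */

#include <cassert>
#include <cstddef>

/**
 * Returns the length of a NUL-terminated string.
 *
 * Opened with two asterisks, so doxygen attaches this block to `length`.
 */
std::size_t length(const char* s)
{
  assert(s != nullptr);

  std::size_t n = 0;
  while (s[n] != '\0') {
    ++n;
  }
  return n;
}
```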





[jira] [Commented] (MESOS-3581) License headers show up all over doxygen documentation.

2015-10-23 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14971038#comment-14971038
 ] 

Benjamin Bannier commented on MESOS-3581:
-

RRs:

- https://reviews.apache.org/r/39590/
- https://reviews.apache.org/r/39591/
- https://reviews.apache.org/r/39592/

> License headers show up all over doxygen documentation.
> ---
>
> Key: MESOS-3581
> URL: https://issues.apache.org/jira/browse/MESOS-3581
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>
> Currently license headers are commented in something resembling Javadoc style,
> {code}
> /**
> * Licensed ...
> {code}
> Since we use Javadoc-style comment blocks for doxygen documentation all 
> license headers appear in the generated documentation, potentially and likely 
> hiding the actual documentation.
> Using {{/*}} to start the comment blocks would be enough to hide them from 
> doxygen, but would likely also result in a largish (though mostly 
> uninteresting) patch.





[jira] [Commented] (MESOS-2275) Document header include rules in style guide

2015-10-22 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968658#comment-14968658
 ] 

Benjamin Bannier commented on MESOS-2275:
-

I think we would probably also want an example that makes it clearer whether, 
within each component, we use a pure lexicographic sort or instead enforce some 
residual level of logical ordering. E.g. {{clang-format}} (from trunk) prefers a 
lexicographic sort

{code}
#include 
#include 
{code}

while one could also imagine the opposite ordering which emphasizes {{foo.hpp}} 
as some sort of "heading header" (currently not supported by {{clang-format}}).

The Google style guide asks for "alphabetical ordering" which isn't helpful 
here.

> Document header include rules in style guide
> 
>
> Key: MESOS-2275
> URL: https://issues.apache.org/jira/browse/MESOS-2275
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Niklas Quarfot Nielsen
>Assignee: Jan Schlicht
>Priority: Trivial
>  Labels: beginner, docathon, mesosphere
>
> We have several ways of sorting, grouping and ordering headers includes in 
> Mesos. We should agree on a rule set and do a style scan.





[jira] [Commented] (MESOS-3709) Modulize the containerizer interface.

2015-10-22 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969179#comment-14969179
 ] 

Benjamin Bannier commented on MESOS-3709:
-

Another approach would be to internalize the various {{*ID}} and {{SlaveState}} 
parameters and e.g. supply them on construction of the base class and so keep 
all this inside internal code.

> Modulize the containerizer interface.
> -
>
> Key: MESOS-3709
> URL: https://issues.apache.org/jira/browse/MESOS-3709
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Benjamin Bannier
>
> So that people can implement their own containerizer as a module. That's more 
> efficient than having an external containerizer and shell out. The module 
> system also provides versioning support, this is definitely better than 
> unversioned external containerizer.





[jira] [Updated] (MESOS-3709) Modulize the containerizer interface.

2015-10-22 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3709:

Assignee: (was: Benjamin Bannier)

> Modulize the containerizer interface.
> -
>
> Key: MESOS-3709
> URL: https://issues.apache.org/jira/browse/MESOS-3709
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>
> So that people can implement their own containerizer as a module. That's more 
> efficient than having an external containerizer and shell out. The module 
> system also provides versioning support, this is definitely better than 
> unversioned external containerizer.





[jira] [Created] (MESOS-3856) Add mtime-related fetcher tests

2015-11-09 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-3856:
---

 Summary: Add mtime-related fetcher tests
 Key: MESOS-3856
 URL: https://issues.apache.org/jira/browse/MESOS-3856
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier








[jira] [Updated] (MESOS-3581) License headers show up all over doxygen documentation.

2015-11-10 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3581:

Sprint: Mesosphere Sprint 22

> License headers show up all over doxygen documentation.
> ---
>
> Key: MESOS-3581
> URL: https://issues.apache.org/jira/browse/MESOS-3581
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>  Labels: mesosphere
>
> Currently license headers are commented in something resembling Javadoc style,
> {code}
> /**
> * Licensed ...
> {code}
> Since we use Javadoc-style comment blocks for doxygen documentation all 
> license headers appear in the generated documentation, potentially and likely 
> hiding the actual documentation.
> Using {{/*}} to start the comment blocks would be enough to hide them from 
> doxygen, but would likely also result in a largish (though mostly 
> uninteresting) patch.





[jira] [Updated] (MESOS-3551) Replace use of strerror with thread-safe alternatives strerror_r / strerror_l.

2015-11-10 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3551:

Sprint: Mesosphere Sprint 22

> Replace use of strerror with thread-safe alternatives strerror_r / strerror_l.
> --
>
> Key: MESOS-3551
> URL: https://issues.apache.org/jira/browse/MESOS-3551
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, stout
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>  Labels: mesosphere, newbie, tech-debt
>
> {{strerror()}} is not required to be thread safe by POSIX and is listed as 
> unsafe on Linux:
> http://pubs.opengroup.org/onlinepubs/9699919799/
> http://man7.org/linux/man-pages/man3/strerror.3.html
> I don't believe we've seen any issues reported due to this. We should replace 
> occurrences of strerror accordingly, possibly offering a wrapper in stout to 
> simplify callsites.





[jira] [Updated] (MESOS-3839) Update documentation for FetcherCache mtime-related changes

2015-11-10 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3839:

Story Points: 2

> Update documentation for FetcherCache mtime-related changes
> ---
>
> Key: MESOS-3839
> URL: https://issues.apache.org/jira/browse/MESOS-3839
> Project: Mesos
>  Issue Type: Documentation
>  Components: fetcher, slave
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-3839) Update documentation for FetcherCache mtime-related changes

2015-11-10 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3839:

Sprint: Mesosphere Sprint 23

> Update documentation for FetcherCache mtime-related changes
> ---
>
> Key: MESOS-3839
> URL: https://issues.apache.org/jira/browse/MESOS-3839
> Project: Mesos
>  Issue Type: Documentation
>  Components: fetcher, slave
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-3856) Add mtime-related fetcher tests

2015-11-10 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3856:

Story Points: 2

> Add mtime-related fetcher tests
> ---
>
> Key: MESOS-3856
> URL: https://issues.apache.org/jira/browse/MESOS-3856
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-3839) Update documentation for FetcherCache mtime-related changes

2015-11-10 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3839:

Story Points: 1  (was: 2)

> Update documentation for FetcherCache mtime-related changes
> ---
>
> Key: MESOS-3839
> URL: https://issues.apache.org/jira/browse/MESOS-3839
> Project: Mesos
>  Issue Type: Documentation
>  Components: fetcher, slave
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-3856) Add mtime-related fetcher tests

2015-11-10 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3856:

Sprint: Mesosphere Sprint 22

> Add mtime-related fetcher tests
> ---
>
> Key: MESOS-3856
> URL: https://issues.apache.org/jira/browse/MESOS-3856
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>






[jira] [Created] (MESOS-3839) Update documentation for FetcherCache mtime-related changes

2015-11-06 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-3839:
---

 Summary: Update documentation for FetcherCache mtime-related 
changes
 Key: MESOS-3839
 URL: https://issues.apache.org/jira/browse/MESOS-3839
 Project: Mesos
  Issue Type: Documentation
  Components: fetcher, slave
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier








[jira] [Commented] (MESOS-3754) Update the style guide with a rule regarding the use of default case in switch statements

2015-10-19 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963000#comment-14963000
 ] 

Benjamin Bannier commented on MESOS-3754:
-

For this to really pay off one probably would also want to request that 
{{enum}} values shall preferably (always?) be branched on with {{switch}} 
instead of e.g. plain {{if}}.

Also, when switching over plain integer, non-{{enum}} types one should probably 
*always* add a {{default}} case to allow reasoning about the code locally.
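
A small sketch of both cases (the names {{Mode}}, {{suffix}} and {{describe}} are hypothetical and only loosely modelled on the {{Volume}} example in the description): without a {{default}}, GCC and Clang report a missing enumerator via {{-Wswitch}} (part of {{-Wall}}), while a {{switch}} over a plain integer keeps its {{default}}.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

enum class Mode { RW, RO };

// No `default`: if an enumerator is added to `Mode` and not handled here,
// the compiler can point at this switch.
std::string suffix(Mode mode)
{
  switch (mode) {
    case Mode::RW: return ":rw";
    case Mode::RO: return ":ro";
  }

  // Only reachable if `mode` holds a value that is not a named enumerator.
  // The assert gives a diagnostic in debug builds; abort also covers NDEBUG.
  assert(false && "invalid Mode value");
  std::abort();
}

// Plain integer: the compiler cannot check exhaustiveness, so a `default`
// case keeps the behavior obvious from the local code alone.
std::string describe(int code)
{
  switch (code) {
    case 0: return "ok";
    case 1: return "error";
    default: return "unknown";
  }
}
```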

> Update the style guide with a rule regarding the use of default case in 
> switch statements
> -
>
> Key: MESOS-3754
> URL: https://issues.apache.org/jira/browse/MESOS-3754
> Project: Mesos
>  Issue Type: Bug
>  Components: documentation
>Reporter: Michael Park
>
> This is the continuation of the initial discussions started on MESOS-2664.
> The motivation is to rely on the compiler for compile-time errors for cases 
> missing in the {{switch}} statement, rather than aborting in the {{default}} 
> case at runtime.
> The pattern we want to avoid is a {{switch}} statement that fully enumerates 
> the cases of an {{enum}} *and* having a {{default}} case that aborts at 
> runtime. The preferred approach is to omit the {{default}} case.
> This pattern can be seen across the codebase, one example is:
> {code}
> switch (volume.mode()) {
>   case Volume::RW: volumeConfig += ":rw"; break;
>   case Volume::RO: volumeConfig += ":ro"; break;
>   default:
> LOG(FATAL) << "Unknown Volume mode: " << volume.mode();
> break;
> }
> {code}
> The proposal is not to disallow uses of {{default}} cases, but to only use 
> them when it actually carries some meaningful fallback behavior.
> This use of {{default}} leads to the following advantages:
> 1. If we miss any of the cases, you get notified via a compiler-error.
> 2. If a new value is added to an {{enum}}, the compiler reports all of the 
> places that needs to be updated to handle the new value.
> Ideally, the compiler would also prevent us from covering all enumerations 
> *and* providing a {{default}}. Since this is not an available option, we aim 
> to capture this as a style guideline.





[jira] [Commented] (MESOS-3709) Modulize the containerizer interface.

2015-10-19 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14963303#comment-14963303
 ] 

Benjamin Bannier commented on MESOS-3709:
-

There seem to be a straightforward and a more involved part in moving 
{{Containerizer}} to the public interface:

# streamline {{Containerizer}} interface: demote {{Containerizer::create}} and 
{{Containerizer::resources}} from member functions to free functions, and
# decide on a publishable signature for {{Containerizer::recover}}.


Currently we have {{Containerizer::recover(const Option&)}} 
where {{SlaveState}} is not part of the public interface. The various concrete 
containerizers use {{state}} to obtain mostly {{FrameworkStates}}, or 
{{ExecutorStates}} from which they get {{ContainerIDs}}. Often the {{RunState}} 
belonging to a {{ContainerID}} is looked up in an {{ExecutorState}}.
The {{DockerContainerizer}} additionally uses the {{StateIDs}} (i.e. 
{{SlaveState::id}}) which it stores as {{Container::slaveIDs}}.

It seems we wouldn't want to publish all the dependencies of {{SlaveState}} and 
maintain migration code for their internal serialization via protobuf. Instead 
we should probably decide on a less open set using public types a caller needs 
to provide when invoking {{Containerizer::recover}}; we could then refactor 
existing containerizers to that interface.  

> Modulize the containerizer interface.
> -
>
> Key: MESOS-3709
> URL: https://issues.apache.org/jira/browse/MESOS-3709
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Benjamin Bannier
>
> So that people can implement their own containerizer as a module. That's more 
> efficient than having an external containerizer and shell out. The module 
> system also provides versioning support, this is definitely better than 
> unversioned external containerizer.





[jira] [Created] (MESOS-3741) stout containers inherit from STL containers

2015-10-15 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-3741:
---

 Summary: stout containers inherit from STL containers
 Key: MESOS-3741
 URL: https://issues.apache.org/jira/browse/MESOS-3741
 Project: Mesos
  Issue Type: Bug
  Components: stout
Reporter: Benjamin Bannier


stout exposes a number of containers ({{hashmap}}, {{hashset}}, {{list}}, 
{{multihashmap}}, and {{set}}) which {{publicly}} inherit from their STL 
counterparts since MESOS-3217. This code being in stout means these custom 
containers live in the global namespace.

Classes inherited publicly from STL containers are not generally safe to use as 
the STL containers lack {{virtual}} destructors, so that deleting through a 
ptr-to-base will not invoke the derived dtr (formally undefined behavior) and 
may leak memory. It appears this is 
being made worse by e.g. putting the stout containers (which are often named 
like their STL counterparts) in the global namespace which makes it easy to 
confuse the actual type being used (at least in messy user code containing 
{{using namespace std;}} which is not allowed for good reasons like this in 
mesos code).

It would seem better to (1) decide what minimal set of containers still needs 
to be provided now that C++11 can be used, (2) fix the inheritance for the 
stout containers (e.g. inherit {{privately}}), or at least (3) use a dedicated 
namespace for these custom containers. 
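
A minimal sketch of what options (2) and (3) could look like together (the namespace {{sketch}} and the members {{contains}} and {{get}} are assumptions made for illustration, this is not the stout implementation): the container inherits privately, re-exports only selected members, and lives in its own namespace.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <unordered_map>

namespace sketch {

template <typename K, typename V>
class hashmap : private std::unordered_map<K, V>
{
  using Base = std::unordered_map<K, V>;

public:
  // Re-export only the parts of the STL interface we want to expose.
  using Base::begin;
  using Base::end;
  using Base::size;
  using Base::operator[];

  bool contains(const K& key) const { return Base::count(key) > 0; }

  // Returns the value stored for `key`, which must be present.
  const V& get(const K& key) const
  {
    auto it = Base::find(key);
    assert(it != Base::end());
    return it->second;
  }
};

// With a private base, outside code cannot obtain a pointer-to-base, so
// deleting through such a pointer is ruled out at compile time.
static_assert(
    !std::is_convertible<
        hashmap<int, int>*, std::unordered_map<int, int>*>::value,
    "a private base must not be reachable from outside the class");

} // namespace sketch
```

Since the type is spelled {{sketch::hashmap}}, it also cannot be confused with a similarly named STL type pulled in by {{using namespace std;}}.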





[jira] [Updated] (MESOS-3741) stout containers inherit from STL containers

2015-10-15 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3741:

Description: 
stout exposes a number of containers ({{hashmap}}, {{hashset}}, {{list}}, 
{{multihashmap}}, and {{set}}) which {{publicly}} inherit from their STL 
counterparts since MESOS-3217. This code being in stout means these custom 
containers live in the global namespace.

Classes inherited publicly from STL containers are not generally safe to use as 
the STL containers lack {{virtual}} destructors, so that deleting through a 
ptr-to-base will not invoke the derived dtr (formally undefined behavior) and 
may leak memory. It appears this is 
being made worse by e.g. putting the stout containers (which are often named 
like their STL counterparts) in the global namespace which makes it easy to 
confuse the actual type being used (at least in messy user code containing 
{{using namespace std;}} which is not allowed for good reasons like this in 
mesos code).

It would seem better to (1) decide what minimal set of containers still needs 
to be provided now that C++11 can be used, (2) fix the inheritance for the 
stout containers (e.g. inherit {{privately}} or just compose), or at least (3) 
use a dedicated namespace for these custom containers. 

  was:
stout exposes a number of containers ({{hashmap}}, {{hashset}}, {{list}}, 
{{multihashmap}}, and {{set}}) which {{publicly}} inherit from their STL 
counterparts since MESOS-3217. This code being in stout means these custom 
containers live in the global namespace.

Classes inherited publicly from STL containers are not generally safe to use as 
the STL containers lack {{virtual}} destructors, so that deleting through a 
ptr-to-base will not invoke the derived dtr (formally undefined behavior) and 
may leak memory. It appears this is 
being made worse by e.g. putting the stout containers (which are often named 
like their STL counterparts) in the global namespace which makes it easy to 
confuse the actual type being used (at least in messy user code containing 
{{using namespace std;}} which is not allowed for good reasons like this in 
mesos code).

It would seem better to (1) decide what minimal set of containers still needs 
to be provided now that C++11 can be used, (2) fix the inheritance for the 
stout containers (e.g. inherit {{privately}}), or at least (3) use a dedicated 
namespace for these custom containers. 


> stout containers inherit from STL containers
> 
>
> Key: MESOS-3741
> URL: https://issues.apache.org/jira/browse/MESOS-3741
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Benjamin Bannier
>
> stout exposes a number of containers ({{hashmap}}, {{hashset}}, {{list}}, 
> {{multihashmap}}, and {{set}}) which {{publicly}} inherit from their STL 
> counterparts since MESOS-3217. This code being in stout means these custom 
> containers live in the global namespace.
> Classes inherited publicly from STL containers are not generally safe to use 
> as the STL containers lack {{virtual}} destructors, so that deleting through 
> a ptr-to-base will not invoke the derived dtr (formally undefined behavior) 
> and may leak memory. It appears this 
> is being made worse by e.g. putting the stout containers (which are often 
> named like their STL counterparts) in the global namespace which makes it 
> easy to confuse the actual type being used (at least in messy user code 
> containing {{using namespace std;}} which is not allowed for good reasons 
> like this in mesos code).
> It would seem better to (1) decide what minimal set of containers still needs 
> to be provided now that C++11 can be used, (2) fix the inheritance for the 
> stout containers (e.g. inherit {{privately}} or just compose), or at least 
> (3) use a dedicated namespace for these custom containers. 





[jira] [Created] (MESOS-3607) mesos::internal::tests::execute contains non-async-safe code when it shouldn't

2015-10-08 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-3607:
---

 Summary: mesos::internal::tests::execute contains non-async-safe 
code when it shouldn't
 Key: MESOS-3607
 URL: https://issues.apache.org/jira/browse/MESOS-3607
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Benjamin Bannier
Priority: Minor


The function {{mesos::internal::tests::execute}} is used to fork test 
scripts via {{TEST_SCRIPT}} and contains non-async-safe (i.e. not 
async-signal-safe) code in the {{fork}}/{{exec}} bracket.

In fact most of the functions used there are not async-safe; we have at least 
the following:

* {{freopen}},
* {{os::setenv}} which calls {{::setenv}} under the covers,
* {{malloc}} for storage of temporary {{std::strings}}, also via 
{{CHECK_SOME}}, or
* potentially {{malloc}} for internal protobuf allocations triggered by 
{{\*.add_*}} functions.
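
For comparison, a sketch of the usual pattern (the helper {{runScript}} is hypothetical and is not the code in {{mesos::internal::tests::execute}}): everything that may allocate is prepared before {{fork}}, and the child only calls {{execve}} and {{_exit}}, both of which POSIX lists as async-signal-safe.

```cpp
#include <cassert>
#include <cerrno>
#include <string>
#include <vector>

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `path` with the given "NAME=value" environment entries and returns
// its exit status, or -1 if forking or waiting failed.
int runScript(
    const std::string& path,
    const std::vector<std::string>& environment)
{
  assert(!path.empty());

  // Build argv and envp *before* forking, since this allocates.
  std::vector<char*> envp;
  for (const std::string& entry : environment) {
    envp.push_back(const_cast<char*>(entry.c_str()));
  }
  envp.push_back(nullptr);

  char* const argv[] = {const_cast<char*>(path.c_str()), nullptr};
  const char* file = path.c_str();

  const pid_t pid = ::fork();
  if (pid == -1) {
    return -1;
  }

  if (pid == 0) {
    // Child: no allocation, no stdio, no setenv.
    ::execve(file, argv, envp.data());
    ::_exit(127); // Only reached if execve failed.
  }

  int status = 0;
  while (::waitpid(pid, &status, 0) == -1) {
    if (errno != EINTR) {
      return -1;
    }
  }

  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Redirecting output would similarly use {{open}} and {{dup2}} in the child instead of {{freopen}}.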





[jira] [Commented] (MESOS-3551) Replace use of strerror with thread-safe alternatives strerror_r / strerror_l.

2015-10-07 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14946733#comment-14946733
 ] 

Benjamin Bannier commented on MESOS-3551:
-

OK, the 0th implementation brought some open ends to the surface.

h5. strerror_r vs. strerror_l

To not break OS X compatibility {{strerror_l}} seems out of reach and we need 
to resort to {{strerror_r}}.

h5. Glibc provides a non-compliant and potentially broken {{strerror_r}}

Since {{strerror}} uses are currently all across stout (a few), libprocess (a 
handful), and mesos (plenty), a natural place to implement a wrapper should 
probably be in stout.

Since stout is header-only, a reusable wrapper would probably have to use, 
under the covers, whichever implementation is available (or: _Should this be a 
reason to (a) implement the wrapper higher up, e.g. in libprocess, or (b) make 
stout include compiled components, or (c) no, leave it in stout?_).


Assuming we decide to implement this wrapper in a header, we would also need to 
decide on how to deal with a bug in glibc-2.15:

bq. https://sourceware.org/bugzilla/show_bug.cgi?id=12204

Here glibc's {{strerror_r}} might set the global {{errno}} should it run into 
errors itself (e.g. because the passed errnum was invalid), which is not 
compliant and probably unexpected. Fixed versions were shipped e.g. starting 
with Debian8, CentOS7, Ubuntu14.04.

Since the mesos {{configure.ac}} already requires at least gcc-4.8.0 which is 
not satisfied by stock Debian7 (gcc-4.7.2), CentOS6 (gcc-4.4.7), or Ubuntu12.04 
(gcc-4.6.3) _we could provide an implementation for either (a) only 
glibc-2.16+, or (b) introduce a workaround if we are using an old version_. If 
we decide on (a) it appears that adding a configure check wouldn't be 
sufficient to prevent someone from including the header from stout so we would 
need to add checks and potentially {{#errors}} in the implementation itself.


h5. Localized error messages

We cannot know the maximal length of error messages since they might be 
localized. We could either

  (a) implement an algorithm growing the buffer used by {{strerror_r}} until 
the message fits in, or
  (b) use a fixed buffer size with an educated guess about the maximal error 
message length (say 2000 char like used in {{llvm::sys::StrError}}).

Given the complexity that workarounds for glibc non-conformance might 
introduce, I feel option (b) might be good enough for now.
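
A rough, header-only sketch of option (b) (all names in namespace {{sketch}} are hypothetical; the 2000 byte buffer is the guess mentioned above): two overloads absorb the difference between the XSI {{strerror_r}}, which returns an {{int}}, and the GNU one, which returns a {{char*}}, and {{errno}} is saved and restored around the call so that a misbehaving {{strerror_r}} cannot clobber it.

```cpp
#include <cassert>
#include <cerrno>
#include <string>

#include <string.h> // For strerror_r.

namespace sketch {
namespace internal {

// XSI variant: returns 0 on success and stores the message in `buffer`.
inline const char* message(int result, const char* buffer)
{
  return result == 0 ? buffer : "Unknown error";
}

// GNU variant: returns the message, which may or may not live in `buffer`.
inline const char* message(const char* result, const char* /*buffer*/)
{
  assert(result != nullptr);
  return result;
}

} // namespace internal

// Thread-safe replacement for strerror using a fixed-size buffer.
inline std::string errorString(int errnum)
{
  char buffer[2000] = {0};

  const int saved = errno;
  const std::string result =
    internal::message(::strerror_r(errnum, buffer, sizeof(buffer)), buffer);
  errno = saved;

  return result;
}

} // namespace sketch
```

Overload resolution picks the matching {{message}} at compile time, so no feature-test macros are needed in the header.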


Any input welcome.


> Replace use of strerror with thread-safe alternatives strerror_r / strerror_l.
> --
>
> Key: MESOS-3551
> URL: https://issues.apache.org/jira/browse/MESOS-3551
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, stout
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>  Labels: newbie, tech-debt
>
> {{strerror()}} is not required to be thread safe by POSIX and is listed as 
> unsafe on Linux:
> http://pubs.opengroup.org/onlinepubs/9699919799/
> http://man7.org/linux/man-pages/man3/strerror.3.html
> I don't believe we've seen any issues reported due to this. We should replace 
> occurrences of strerror accordingly, possibly offering a wrapper in stout to 
> simplify callsites.





[jira] [Assigned] (MESOS-3517) Building mesos from source fails when OS language is not English

2015-10-07 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-3517:
---

Assignee: Benjamin Bannier

> Building mesos from source fails when OS language is not English
> 
>
> Key: MESOS-3517
> URL: https://issues.apache.org/jira/browse/MESOS-3517
> Project: Mesos
>  Issue Type: Bug
>  Components: build
> Environment: Dutch locale on Ubuntu 15.04
>Reporter: Wessel Nieboer
>Assignee: Benjamin Bannier
>Priority: Minor
>
> Line 963 of mesos/3rdparty/libprocess/3rdparty/stout/tests/os_tests.cpp 
> contains the following:
>   EXPECT_TRUE(strings::contains(result.get(), "No such file or directory"));
> But this does not match when your locale is not English. When changing it to 
> what my terminal gives me: "Bestand of map bestaat niet" then it works just 
> fine. 
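
One possible way to make such a check locale-independent, sketched under the assumption that the process producing the output and the test run under the same locale (the helper {{mentionsMissingFile}} is hypothetical and not necessarily the fix that will be adopted): compare against the message the C library reports for {{ENOENT}} instead of a hard-coded English string.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// Returns true if `output` contains the current locale's message for ENOENT.
bool mentionsMissingFile(const std::string& output)
{
  // NOTE: std::strerror is not required to be thread-safe (see MESOS-3551).
  const char* expected = std::strerror(ENOENT);
  assert(expected != nullptr);

  return output.find(expected) != std::string::npos;
}
```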





[jira] [Updated] (MESOS-3517) Building mesos from source fails when OS language is not English

2015-10-07 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3517:

Shepherd: Till Toenshoff

> Building mesos from source fails when OS language is not English
> 
>
> Key: MESOS-3517
> URL: https://issues.apache.org/jira/browse/MESOS-3517
> Project: Mesos
>  Issue Type: Bug
>  Components: build
> Environment: Dutch locale on Ubuntu 15.04
>Reporter: Wessel Nieboer
>Assignee: Benjamin Bannier
>Priority: Minor
>
> Line 963 of mesos/3rdparty/libprocess/3rdparty/stout/tests/os_tests.cpp 
> contains the following:
>   EXPECT_TRUE(strings::contains(result.get(), "No such file or directory"));
> But this does not match when your locale is not English. When changing it to 
> what my terminal gives me: "Bestand of map bestaat niet" then it works just 
> fine. 





[jira] [Updated] (MESOS-3271) SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.

2015-10-13 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3271:

Description: 
Test failure on Ubuntu 14 configured with {{--disable-java --disable-python 
--enable-ssl --enable-libevent --enable-optimize --enable-network-isolation}}

Commit: {{9b78b301469667b5a44f0a351de5f3a71edae499}}

{code}
[ RUN  ] SlaveRecoveryTest/0.NonCheckpointingFramework
I0815 06:41:47.413602 17091 exec.cpp:133] Version: 0.24.0
I0815 06:41:47.416780 17111 exec.cpp:207] Executor registered on slave 
20150815-064146-544909504-51064-12195-S0
Registered executor on slave1-ubuntu12
Starting task 044bd49e-2f38-4671-802a-ac6524d61a85
Forked command at 17114
sh -c 'sleep 1000'
[err] event_active called on a non-initialized event 0x7f6b740232d0 (events: 
0x2, fd: 21, flags: 0x80)
*** Aborted at 1439646107 (unix time) try "date -d @1439646107" if you are 
using GNU date ***
PC: @ 0x7f6ba512d0d5 (unknown)
*** SIGABRT (@0x2fa3) received by PID 12195 (TID 0x7f6b9d613700) from PID 
12195; stack trace: ***
@ 0x7f6ba54c4cb0 (unknown)
@ 0x7f6ba512d0d5 (unknown)
@ 0x7f6ba513083b (unknown)
@ 0x7f6ba448e1ba (unknown)
@ 0x7f6ba448e52b (unknown)
@ 0x7f6ba447dcc9 (unknown)
@   0x4c4033 process::internal::run<>()
@ 0x7f6ba72642ab process::Future<>::discard()
@ 0x7f6ba72643be process::internal::discard<>()
@ 0x7f6ba7262298 
_ZNSt17_Function_handlerIFvvEZNK7process6FutureImE9onDiscardISt5_BindIFPFvNS1_10WeakFutureIsEEES7_RKS3_OT_EUlvE_E9_M_invokeERKSt9_Any_data
@   0x4c4033 process::internal::run<>()
@   0x6fa0cb process::Future<>::discard()
@ 0x7f6ba6fb5736 cgroups::event::Listener::finalize()
@ 0x7f6ba728fb11 process::ProcessManager::resume()
@ 0x7f6ba728fe0f process::internal::schedule()
@ 0x7f6ba5c9d490 (unknown)
@ 0x7f6ba54bce9a start_thread
@ 0x7f6ba51ea38d (unknown)
+ /bin/true
{code}

  was:
Test failure on Ubuntu 14 configured with --disable-java --disable-python 
--enable-ssl --enable-libevent --enable-optimize --enable-network-isolation

Commit: 9b78b301469667b5a44f0a351de5f3a71edae499

[ RUN  ] SlaveRecoveryTest/0.NonCheckpointingFramework
I0815 06:41:47.413602 17091 exec.cpp:133] Version: 0.24.0
I0815 06:41:47.416780 17111 exec.cpp:207] Executor registered on slave 
20150815-064146-544909504-51064-12195-S0
Registered executor on slave1-ubuntu12
Starting task 044bd49e-2f38-4671-802a-ac6524d61a85
Forked command at 17114
sh -c 'sleep 1000'
[err] event_active called on a non-initialized event 0x7f6b740232d0 (events: 
0x2, fd: 21, flags: 0x80)
*** Aborted at 1439646107 (unix time) try "date -d @1439646107" if you are 
using GNU date ***
PC: @ 0x7f6ba512d0d5 (unknown)
*** SIGABRT (@0x2fa3) received by PID 12195 (TID 0x7f6b9d613700) from PID 
12195; stack trace: ***
@ 0x7f6ba54c4cb0 (unknown)
@ 0x7f6ba512d0d5 (unknown)
@ 0x7f6ba513083b (unknown)
@ 0x7f6ba448e1ba (unknown)
@ 0x7f6ba448e52b (unknown)
@ 0x7f6ba447dcc9 (unknown)
@   0x4c4033 process::internal::run<>()
@ 0x7f6ba72642ab process::Future<>::discard()
@ 0x7f6ba72643be process::internal::discard<>()
@ 0x7f6ba7262298 
_ZNSt17_Function_handlerIFvvEZNK7process6FutureImE9onDiscardISt5_BindIFPFvNS1_10WeakFutureIsEEES7_RKS3_OT_EUlvE_E9_M_invokeERKSt9_Any_data
@   0x4c4033 process::internal::run<>()
@   0x6fa0cb process::Future<>::discard()
@ 0x7f6ba6fb5736 cgroups::event::Listener::finalize()
@ 0x7f6ba728fb11 process::ProcessManager::resume()
@ 0x7f6ba728fe0f process::internal::schedule()
@ 0x7f6ba5c9d490 (unknown)
@ 0x7f6ba54bce9a start_thread
@ 0x7f6ba51ea38d (unknown)
+ /bin/true


> SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.
> ---
>
> Key: MESOS-3271
> URL: https://issues.apache.org/jira/browse/MESOS-3271
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Paul Brett
> Attachments: build.txt
>
>
> Test failure on Ubuntu 14 configured with {{--disable-java --disable-python 
> --enable-ssl --enable-libevent --enable-optimize --enable-network-isolation}}
> Commit: {{9b78b301469667b5a44f0a351de5f3a71edae499}}
> {code}
> [ RUN  ] SlaveRecoveryTest/0.NonCheckpointingFramework
> I0815 06:41:47.413602 17091 exec.cpp:133] Version: 0.24.0
> I0815 06:41:47.416780 17111 exec.cpp:207] Executor registered on slave 
> 20150815-064146-544909504-51064-12195-S0
> Registered executor on slave1-ubuntu12
> Starting task 044bd49e-2f38-4671-802a-ac6524d61a85
> Forked command at 17114
> sh -c 'sleep 1000'
> [err] event_active called on a non-initialized event 0x7f6b740232d0 (events: 
> 

[jira] [Updated] (MESOS-3709) Modulize the containerizer interface.

2015-10-13 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3709:

Shepherd: Till Toenshoff

> Modulize the containerizer interface.
> -
>
> Key: MESOS-3709
> URL: https://issues.apache.org/jira/browse/MESOS-3709
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Benjamin Bannier
>
> So that people can implement their own containerizer as a module. That's more 
> efficient than having an external containerizer and shelling out. The module 
> system also provides versioning support, which is definitely better than an 
> unversioned external containerizer.





[jira] [Commented] (MESOS-3271) SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.

2015-10-13 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14954826#comment-14954826
 ] 

Benjamin Bannier commented on MESOS-3271:
-

I wasn't able to reproduce this at all in a Vagrant container (6 CPUs, 10 GB 
RAM) on an OS X host. Can you provide any guidance on how to increase the 
failure rate, [~pbrett]? What is the approximate failure rate you are seeing? 

> SlaveRecoveryTest/0.NonCheckpointingFramework is flaky.
> ---
>
> Key: MESOS-3271
> URL: https://issues.apache.org/jira/browse/MESOS-3271
> Project: Mesos
>  Issue Type: Bug
>  Components: slave
>Reporter: Paul Brett
> Attachments: build.txt
>
>
> Test failure on Ubuntu 14 configured with {{--disable-java --disable-python 
> --enable-ssl --enable-libevent --enable-optimize --enable-network-isolation}}
> Commit: {{9b78b301469667b5a44f0a351de5f3a71edae499}}
> {code}
> [ RUN  ] SlaveRecoveryTest/0.NonCheckpointingFramework
> I0815 06:41:47.413602 17091 exec.cpp:133] Version: 0.24.0
> I0815 06:41:47.416780 17111 exec.cpp:207] Executor registered on slave 
> 20150815-064146-544909504-51064-12195-S0
> Registered executor on slave1-ubuntu12
> Starting task 044bd49e-2f38-4671-802a-ac6524d61a85
> Forked command at 17114
> sh -c 'sleep 1000'
> [err] event_active called on a non-initialized event 0x7f6b740232d0 (events: 
> 0x2, fd: 21, flags: 0x80)
> *** Aborted at 1439646107 (unix time) try "date -d @1439646107" if you are 
> using GNU date ***
> PC: @ 0x7f6ba512d0d5 (unknown)
> *** SIGABRT (@0x2fa3) received by PID 12195 (TID 0x7f6b9d613700) from PID 
> 12195; stack trace: ***
> @ 0x7f6ba54c4cb0 (unknown)
> @ 0x7f6ba512d0d5 (unknown)
> @ 0x7f6ba513083b (unknown)
> @ 0x7f6ba448e1ba (unknown)
> @ 0x7f6ba448e52b (unknown)
> @ 0x7f6ba447dcc9 (unknown)
> @   0x4c4033 process::internal::run<>()
> @ 0x7f6ba72642ab process::Future<>::discard()
> @ 0x7f6ba72643be process::internal::discard<>()
> @ 0x7f6ba7262298 
> _ZNSt17_Function_handlerIFvvEZNK7process6FutureImE9onDiscardISt5_BindIFPFvNS1_10WeakFutureIsEEES7_RKS3_OT_EUlvE_E9_M_invokeERKSt9_Any_data
> @   0x4c4033 process::internal::run<>()
> @   0x6fa0cb process::Future<>::discard()
> @ 0x7f6ba6fb5736 cgroups::event::Listener::finalize()
> @ 0x7f6ba728fb11 process::ProcessManager::resume()
> @ 0x7f6ba728fe0f process::internal::schedule()
> @ 0x7f6ba5c9d490 (unknown)
> @ 0x7f6ba54bce9a start_thread
> @ 0x7f6ba51ea38d (unknown)
> + /bin/true
> {code}





[jira] [Commented] (MESOS-3581) License headers show up all over doxygen documentation.

2015-11-18 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011268#comment-15011268
 ] 

Benjamin Bannier commented on MESOS-3581:
-

After discussion with [~benjaminhindman] we decided to use C++-style comments 
{{// ..}} instead of C-style block comments {{/** .. */}} for the license 
headers; this reduces the number of comment styles in use mostly to C++-style 
comments and Doxygen comments.

I also added a linter to check license headers in C++, C, and protobuf files; 
the review is at https://reviews.apache.org/r/40445/.
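For illustration only, a file following this convention might look like the sketch below. The license text is elided exactly as in the issue description, and the function `answer` is made up for the example; only the Javadoc-style block is picked up by Doxygen, while the {{//}} header is ignored.

```cpp
// Licensed ...
//
// (remaining license text elided; plain C++-style comments are not
// treated as documentation by Doxygen)

/**
 * Returns a fixed value. Doxygen attaches this Javadoc-style block to
 * `answer` as its documentation.
 */
inline int answer()
{
  return 42;
}
```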

> License headers show up all over doxygen documentation.
> ---
>
> Key: MESOS-3581
> URL: https://issues.apache.org/jira/browse/MESOS-3581
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>  Labels: mesosphere
>
> Currently license headers are commented in something resembling Javadoc style,
> {code}
> /**
> * Licensed ...
> {code}
> Since we use Javadoc-style comment blocks for doxygen documentation all 
> license headers appear in the generated documentation, potentially and likely 
> hiding the actual documentation.
> Using {{/*}} to start the comment blocks would be enough to hide them from 
> doxygen, but would likely also result in a largish (though mostly 
> uninteresting) patch.





[jira] [Commented] (MESOS-3799) Compilation warning with Ubuntu wily: auto_ptr is deprecated

2015-11-14 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005490#comment-15005490
 ] 

Benjamin Bannier commented on MESOS-3799:
-

[~neilc], there is also:

* disable warnings from Boost headers, e.g. by replacing the responsible 
`-I` with `-isystem`.

> Compilation warning with Ubuntu wily: auto_ptr is deprecated
> 
>
> Key: MESOS-3799
> URL: https://issues.apache.org/jira/browse/MESOS-3799
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Priority: Minor
>  Labels: mesosphere
>
> Variants of this message are printed many times during compilation (Wily on 
> AMD64):
> {noformat}
>   CXX  libprocess_la-pid.lo
>   CXX  libprocess_la-poll_socket.lo
>   CXX  libprocess_la-profiler.lo
> In file included from 
> /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/hashmap.hpp:23:0,
>  from 
> /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/stringify.hpp:26,
>  from 
> /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/ip.hpp:59,
>  from 
> /mesos/3rdparty/libprocess/include/process/address.hpp:34,
>  from /mesos/3rdparty/libprocess/include/process/pid.hpp:26,
>  from /mesos/3rdparty/libprocess/src/pid.cpp:28:
> 3rdparty/boost-1.53.0/boost/get_pointer.hpp:27:40: warning: ‘template<class> 
> class std::auto_ptr’ is deprecated [-Wdeprecated-declarations]
>  template<class T> T * get_pointer(std::auto_ptr<T> const& p)
> ^
> In file included from /usr/include/c++/5/memory:81:0,
>  from 
> 3rdparty/boost-1.53.0/boost/functional/hash/extensions.hpp:32,
>  from 
> 3rdparty/boost-1.53.0/boost/functional/hash/hash.hpp:529,
>  from 3rdparty/boost-1.53.0/boost/functional/hash.hpp:6,
>  from /mesos/3rdparty/libprocess/include/process/pid.hpp:24,
>  from /mesos/3rdparty/libprocess/src/pid.cpp:28:
> /usr/include/c++/5/bits/unique_ptr.h:49:28: note: declared here
>    template<typename> class auto_ptr;
> ^
> {noformat}





[jira] [Comment Edited] (MESOS-3799) Compilation warning with Ubuntu wily: auto_ptr is deprecated

2015-11-14 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005490#comment-15005490
 ] 

Benjamin Bannier edited comment on MESOS-3799 at 11/14/15 5:16 PM:
---

[~neilc], there is also:

* disable warnings from Boost headers, e.g. by replacing the responsible 
{{-I}} with {{-isystem}}.


was (Author: bbannier):
[~neilc], there is also:

* disable warnings from Boost headers, e.g. by replacing the responsible 
`-I` with `-isystem`.

> Compilation warning with Ubuntu wily: auto_ptr is deprecated
> 
>
> Key: MESOS-3799
> URL: https://issues.apache.org/jira/browse/MESOS-3799
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Priority: Minor
>  Labels: mesosphere
>
> Variants of this message are printed many times during compilation (Wily on 
> AMD64):
> {noformat}
>   CXX  libprocess_la-pid.lo
>   CXX  libprocess_la-poll_socket.lo
>   CXX  libprocess_la-profiler.lo
> In file included from 
> /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/hashmap.hpp:23:0,
>  from 
> /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/stringify.hpp:26,
>  from 
> /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/ip.hpp:59,
>  from 
> /mesos/3rdparty/libprocess/include/process/address.hpp:34,
>  from /mesos/3rdparty/libprocess/include/process/pid.hpp:26,
>  from /mesos/3rdparty/libprocess/src/pid.cpp:28:
> 3rdparty/boost-1.53.0/boost/get_pointer.hpp:27:40: warning: ‘template<class> 
> class std::auto_ptr’ is deprecated [-Wdeprecated-declarations]
>  template<class T> T * get_pointer(std::auto_ptr<T> const& p)
> ^
> In file included from /usr/include/c++/5/memory:81:0,
>  from 
> 3rdparty/boost-1.53.0/boost/functional/hash/extensions.hpp:32,
>  from 
> 3rdparty/boost-1.53.0/boost/functional/hash/hash.hpp:529,
>  from 3rdparty/boost-1.53.0/boost/functional/hash.hpp:6,
>  from /mesos/3rdparty/libprocess/include/process/pid.hpp:24,
>  from /mesos/3rdparty/libprocess/src/pid.cpp:28:
> /usr/include/c++/5/bits/unique_ptr.h:49:28: note: declared here
>    template<typename> class auto_ptr;
> ^
> {noformat}





[jira] [Created] (MESOS-3581) Adjust license headers comment format for doxygen style

2015-10-05 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-3581:
---

 Summary: Adjust license headers comment format for doxygen style
 Key: MESOS-3581
 URL: https://issues.apache.org/jira/browse/MESOS-3581
 Project: Mesos
  Issue Type: Documentation
  Components: documentation
Affects Versions: 0.24.1
Reporter: Benjamin Bannier
Priority: Minor








[jira] [Updated] (MESOS-3581) Adjust license headers comment format for doxygen style

2015-10-05 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3581:

Description: 
Currently license headers are commented in something resembling Javadoc style,

{code}
/**
* Licensed ...
{code}

Since we use Javadoc-style comment blocks for doxygen documentation all license 
headers appear in the generated documentation, potentially and likely hiding 
the actual documentation.

Using {{/*}} to start the comment blocks would be enough to hide them from 
doxygen, but would likely also result in a largish (though mostly 
uninteresting) patch.

> Adjust license headers comment format for doxygen style
> ---
>
> Key: MESOS-3581
> URL: https://issues.apache.org/jira/browse/MESOS-3581
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Benjamin Bannier
>Priority: Minor
>
> Currently license headers are commented in something resembling Javadoc style,
> {code}
> /**
> * Licensed ...
> {code}
> Since we use Javadoc-style comment blocks for doxygen documentation all 
> license headers appear in the generated documentation, potentially and likely 
> hiding the actual documentation.
> Using {{/*}} to start the comment blocks would be enough to hide them from 
> doxygen, but would likely also result in a largish (though mostly 
> uninteresting) patch.





[jira] [Commented] (MESOS-3581) Adjust license headers comment format for doxygen style

2015-10-05 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943093#comment-14943093
 ] 

Benjamin Bannier commented on MESOS-3581:
-

I was just about to add that [~haosd...@gmail.com].

> Adjust license headers comment format for doxygen style
> ---
>
> Key: MESOS-3581
> URL: https://issues.apache.org/jira/browse/MESOS-3581
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Benjamin Bannier
>Priority: Minor
>
> Currently license headers are commented in something resembling Javadoc style,
> {code}
> /**
> * Licensed ...
> {code}
> Since we use Javadoc-style comment blocks for doxygen documentation all 
> license headers appear in the generated documentation, potentially and likely 
> hiding the actual documentation.
> Using {{/*}} to start the comment blocks would be enough to hide them from 
> doxygen, but would likely also result in a largish (though mostly 
> uninteresting) patch.





[jira] [Updated] (MESOS-3581) Adjust license headers comment format for doxygen style

2015-10-05 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3581:

Shepherd: Bernd Mathiske

> Adjust license headers comment format for doxygen style
> ---
>
> Key: MESOS-3581
> URL: https://issues.apache.org/jira/browse/MESOS-3581
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Benjamin Bannier
>Priority: Minor
>
> Currently license headers are commented in something resembling Javadoc style,
> {code}
> /**
> * Licensed ...
> {code}
> Since we use Javadoc-style comment blocks for doxygen documentation all 
> license headers appear in the generated documentation, potentially and likely 
> hiding the actual documentation.
> Using {{/*}} to start the comment blocks would be enough to hide them from 
> doxygen, but would likely also result in a largish (though mostly 
> uninteresting) patch.





[jira] [Assigned] (MESOS-3581) Adjust license headers comment format for doxygen style

2015-10-05 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-3581:
---

Assignee: Benjamin Bannier

> Adjust license headers comment format for doxygen style
> ---
>
> Key: MESOS-3581
> URL: https://issues.apache.org/jira/browse/MESOS-3581
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>
> Currently license headers are commented in something resembling Javadoc style,
> {code}
> /**
> * Licensed ...
> {code}
> Since we use Javadoc-style comment blocks for doxygen documentation all 
> license headers appear in the generated documentation, potentially and likely 
> hiding the actual documentation.
> Using {{/*}} to start the comment blocks would be enough to hide them from 
> doxygen, but would likely also result in a largish (though mostly 
> uninteresting) patch.





[jira] [Updated] (MESOS-3581) License headers show up all over doxygen documentation.

2015-10-05 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3581:

Summary: License headers show up all over doxygen documentation.  (was: 
Adjust license headers comment format for doxygen style)

> License headers show up all over doxygen documentation.
> ---
>
> Key: MESOS-3581
> URL: https://issues.apache.org/jira/browse/MESOS-3581
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>
> Currently license headers are commented in something resembling Javadoc style,
> {code}
> /**
> * Licensed ...
> {code}
> Since we use Javadoc-style comment blocks for doxygen documentation all 
> license headers appear in the generated documentation, potentially and likely 
> hiding the actual documentation.
> Using {{/*}} to start the comment blocks would be enough to hide them from 
> doxygen, but would likely also result in a largish (though mostly 
> uninteresting) patch.





[jira] [Assigned] (MESOS-3551) Replace use of strerror with thread-safe alternatives strerror_r / strerror_l.

2015-10-05 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-3551:
---

Assignee: Benjamin Bannier

> Replace use of strerror with thread-safe alternatives strerror_r / strerror_l.
> --
>
> Key: MESOS-3551
> URL: https://issues.apache.org/jira/browse/MESOS-3551
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, stout
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>  Labels: newbie, tech-debt
>
> {{strerror()}} is not required to be thread safe by POSIX and is listed as 
> unsafe on Linux:
> http://pubs.opengroup.org/onlinepubs/9699919799/
> http://man7.org/linux/man-pages/man3/strerror.3.html
> I don't believe we've seen any issues reported due to this. We should replace 
> occurrences of strerror accordingly, possibly offering a wrapper in stout to 
> simplify callsites.
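A wrapper of the kind suggested in the description could be sketched roughly as follows. This is an assumption about how one might do it, not the stout implementation: the namespace `sketch` and the names `result` and `errorString` are hypothetical. The two `result` overloads exist because `strerror_r` returns `int` in its XSI form and `char*` in its GNU form; overload resolution picks the matching one at compile time.

```cpp
#include <string.h>

#include <cerrno>
#include <string>

namespace sketch {

// XSI variant: `strerror_r` returns an int and, on success (0), has written
// the message into `buffer`.
inline const char* result(int rc, const char* buffer)
{
  return rc == 0 ? buffer : "Unknown error";
}

// GNU variant: `strerror_r` returns a pointer to the message, which may or
// may not point into `buffer`.
inline const char* result(const char* message, const char* /* buffer */)
{
  return message;
}

// Hypothetical thread-safe replacement for `strerror(errnum)`: the message
// is copied into a std::string before the local buffer goes out of scope.
inline std::string errorString(int errnum)
{
  char buffer[1024] = {0};
  return result(::strerror_r(errnum, buffer, sizeof(buffer)), buffer);
}

} // namespace sketch
```

Call sites would then use `sketch::errorString(errno)` where they previously called `strerror(errno)`.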





[jira] [Commented] (MESOS-3581) License headers show up all over doxygen documentation.

2015-10-05 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943234#comment-14943234
 ] 

Benjamin Bannier commented on MESOS-3581:
-

A possible solution would be to use an `INPUT_FILTER` in the `Doxyfile` to 
preprocess the opening part of the license header comments; this would have 
minimal impact on the source tree, but should probably be implemented in some 
way that minimizes external dependencies.

Another solution would be to simply fix the openers as shown in the 
description; the possibility of this creating merge problems is small as only a 
single line at the beginning of files needs to be touched. It would however 
somewhat pollute the history of 800+ files with non-functional changes.
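To make the first option concrete, the core of such a filter could be as small as the following sketch. It only shows the text transformation; the function name `demoteLeadingDocComment` is hypothetical, and the program around it (reading the file Doxygen passes to the filter and printing the result to standard output) is omitted.

```cpp
#include <string>

// Hypothetical filter step: if `source` starts with the Javadoc-style opener
// "/**", drop one asterisk so that Doxygen treats the block as an ordinary
// comment. An empty comment "/**/" is left alone, since shortening it would
// no longer be a valid comment.
std::string demoteLeadingDocComment(std::string source)
{
  if (source.compare(0, 3, "/**") == 0 &&
      source.compare(0, 4, "/**/") != 0) {
    source.erase(2, 1);
  }

  return source;
}
```

Only the first three characters of a file are ever inspected, so documentation blocks further down are passed through unchanged.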

> License headers show up all over doxygen documentation.
> ---
>
> Key: MESOS-3581
> URL: https://issues.apache.org/jira/browse/MESOS-3581
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>
> Currently license headers are commented in something resembling Javadoc style,
> {code}
> /**
> * Licensed ...
> {code}
> Since we use Javadoc-style comment blocks for doxygen documentation all 
> license headers appear in the generated documentation, potentially and likely 
> hiding the actual documentation.
> Using {{/*}} to start the comment blocks would be enough to hide them from 
> doxygen, but would likely also result in a largish (though mostly 
> uninteresting) patch.





[jira] [Comment Edited] (MESOS-3581) License headers show up all over doxygen documentation.

2015-10-05 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943234#comment-14943234
 ] 

Benjamin Bannier edited comment on MESOS-3581 at 10/5/15 11:09 AM:
---

A possible solution would be to use an {{INPUT_FILTER}} in the {{Doxyfile}} to 
preprocess the opening part of the license header comments; this would have 
minimal impact on the source tree, but should probably be implemented in some 
way that minimizes external dependencies.

Another solution would be to simply fix the openers as shown in the 
description; the possibility of this creating merge problems is small as only a 
single line at the beginning of files needs to be touched. It would however 
somewhat pollute the history of 800+ files with non-functional changes.


was (Author: bbannier):
A possible solution would be to use an `INPUT_FILTER` in the `Doxyfile` to 
preprocess the opening part of the license header comments; this would have 
minimal impact on the source tree, but should probably be implemented in some 
way that minimizes external dependencies.

Another solution would be to simply fix the openers as shown in the 
description; the possibility of this creating merge problems is small as only a 
single line at the beginning of files needs to be touched. It would however 
somewhat pollute the history of 800+ files with non-functional changes.

> License headers show up all over doxygen documentation.
> ---
>
> Key: MESOS-3581
> URL: https://issues.apache.org/jira/browse/MESOS-3581
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>
> Currently license headers are commented in something resembling Javadoc style,
> {code}
> /**
> * Licensed ...
> {code}
> Since we use Javadoc-style comment blocks for doxygen documentation all 
> license headers appear in the generated documentation, potentially and likely 
> hiding the actual documentation.
> Using {{/*}} to start the comment blocks would be enough to hide them from 
> doxygen, but would likely also result in a largish (though mostly 
> uninteresting) patch.





[jira] [Updated] (MESOS-3551) Replace use of strerror with thread-safe alternatives strerror_r / strerror_l.

2015-10-05 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3551:

Shepherd: Till Toenshoff

> Replace use of strerror with thread-safe alternatives strerror_r / strerror_l.
> --
>
> Key: MESOS-3551
> URL: https://issues.apache.org/jira/browse/MESOS-3551
> Project: Mesos
>  Issue Type: Bug
>  Components: libprocess, stout
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>  Labels: newbie, tech-debt
>
> {{strerror()}} is not required to be thread safe by POSIX and is listed as 
> unsafe on Linux:
> http://pubs.opengroup.org/onlinepubs/9699919799/
> http://man7.org/linux/man-pages/man3/strerror.3.html
> I don't believe we've seen any issues reported due to this. We should replace 
> occurrences of strerror accordingly, possibly offering a wrapper in stout to 
> simplify callsites.





[jira] [Updated] (MESOS-4040) Clang's address sanitizer reports heap-use-after-free in ExamplesTest.EventCallFramework

2015-12-02 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4040:

Attachment: asan.log

> Clang's address sanitizer reports heap-use-after-free in 
> ExamplesTest.EventCallFramework
> 
>
> Key: MESOS-4040
> URL: https://issues.apache.org/jira/browse/MESOS-4040
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
>Reporter: Benjamin Bannier
> Attachments: asan.log
>
>
> For a libevent- and ssl-enabled debug build under ubuntu14.04 with clang-3.6 
> and {{CXXFLAGS=-fsanitize=address}} address sanitizer reports a 
> use-after-free from {{ExamplesTest.EventCallFramework}} (log attached).
> If this is not a false positive it could lead to all kinds of issues.





[jira] [Created] (MESOS-4040) Clang's address sanitizer reports heap-use-after-free in ExamplesTest.EventCallFramework

2015-12-02 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4040:
---

 Summary: Clang's address sanitizer reports heap-use-after-free in 
ExamplesTest.EventCallFramework
 Key: MESOS-4040
 URL: https://issues.apache.org/jira/browse/MESOS-4040
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.26.0
Reporter: Benjamin Bannier
 Attachments: asan.log

For a libevent- and ssl-enabled debug build under ubuntu14.04 with clang-3.6 
and {{CXXFLAGS=-fsanitize=address}} address sanitizer reports a use-after-free 
from {{ExamplesTest.EventCallFramework}} (log attached).

If this is not a false positive it could lead to all kinds of issues.





[jira] [Updated] (MESOS-4055) SSL-related test fail reliably in optimized build

2015-12-03 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4055:

Component/s: libprocess

> SSL-related test fail reliably in optimized build
> -
>
> Key: MESOS-4055
> URL: https://issues.apache.org/jira/browse/MESOS-4055
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Affects Versions: 0.26.0
>Reporter: Benjamin Bannier
>
> Under ubuntu14.04 building {{5c0e4dc}} using {{gcc-4.8.4-2ubuntu1~14.04}} with
> {code}
> % ../configure --enable-ssl --enable-libevent --enable-optimized
> {code}
> most SSL-related tests fail reliably with SIGSEGV. The full list of failing 
> tests is:
> {code}
> SSL.Disabled
> SSLTest.BasicSameProcess
> SSLTest.SSLSocket
> SSLTest.NonSSLSocket
> SSLTest.NoVerifyBadCA
> SSLTest.RequireBadCA
> SSLTest.VerifyBadCA
> SSLTest.VerifyCertificate
> SSLTest.RequireCertificate
> SSLTest.ProtocolMismatch
> SSLTest.ValidDowngrade
> SSLtest.NoValidDowngrade
> SSLTest.NoValidDowngrade
> SSLTest.ValidDowngradeEachProtocol
> SSLTest.NoValidDowngradeEachProtocol
> SSLTest.PeerAddress
> SSLTest.HTTPSGet
> SSLTest.HTTPSPost
> {code}
> The tests fail with {{SIGSEGV}} or for similarly worrisome reasons, e.g.,
> {code}
> [ RUN  ] SSLTest.SSLSocket
> *** Aborted at 1449135851 (unix time) try "date -d @1449135851" if you are 
> using GNU date ***
> PC: @   0x4418f4 Try<>::~Try()
> *** SIGSEGV (@0x5acce6) received by PID 29976 (TID 0x7fe601eb5780) from PID 
> 5950694; stack trace: ***
> @ 0x7fe601a9a340 (unknown)
> @   0x4418f4 Try<>::~Try()
> @   0x5a843c SSLTest::setup_server()
> @   0x595162 SSLTest_SSLSocket_Test::TestBody()
> @   0x5f2428 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @   0x5ec880 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @   0x5cd0ff testing::Test::Run()
> @   0x5cd882 testing::TestInfo::Run()
> @   0x5cdec8 testing::TestCase::Run()
> @   0x5d4610 testing::internal::UnitTestImpl::RunAllTests()
> @   0x5f3203 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @   0x5ed5f4 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @   0x5d33ac testing::UnitTest::Run()
> @   0x40fd70 main
> @ 0x7fe600024ec5 (unknown)
> @   0x413eb1 (unknown)
> Segmentation fault
> {code}
> Even though we do not typically release optimized builds we should still look 
> into these as optimizations tend to expose fragile constructs.





[jira] [Created] (MESOS-4057) RegistryClientTest suite fails reliably in optimized build

2015-12-03 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4057:
---

 Summary: RegistryClientTest suite fails reliably in optimized build
 Key: MESOS-4057
 URL: https://issues.apache.org/jira/browse/MESOS-4057
 Project: Mesos
  Issue Type: Bug
  Components: test
Affects Versions: 0.26.0
Reporter: Benjamin Bannier


Under ubuntu14.04 building 5c0e4dc using gcc-4.8.4-2ubuntu1~14.04 with
{code}
% ../configure --enable-ssl --enable-libevent --enable-optimized
{code}

all six tests from the {{RegistryClientTest}} suite fail with SIGSEGV. The full 
list of failing tests is:
{code}
RegistryClientTest.SimpleGetToken
RegistryClientTest.BadTokenResponse
RegistryClientTest.SimpleGetManifest
RegistryClientTest.SimpleGetBlob
RegistryClientTest.BadRequest
RegistryClientTest.SimpleRegistryPuller
{code}

The failure messages are similar, e.g.,
{code}
[ RUN  ] RegistryClientTest.BadTokenResponse
*** Aborted at 1449146245 (unix time) try "date -d @1449146245" if you are 
using GNU date ***
PC: @ 0x7f1c5c5ba6ad (unknown)
*** SIGSEGV (@0xa24888) received by PID 21542 (TID 0x7f1c61f24800) from PID 
10635400; stack trace: ***
@ 0x7f1c5be35340 (unknown)
@ 0x7f1c5c5ba6ad (unknown)
@ 0x7f1c5c61932f (unknown)
@  0x14067aa Try<>::~Try()
@  0x1406ab0 SSLTest::setup_server()
@  0x140869b mesos::internal::tests::RegistryClientTest::getServer()
@  0x13f315a 
mesos::internal::tests::RegistryClientTest_BadTokenResponse_Test::TestBody()
@  0x14ec3b0 
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@  0x14e728a 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@  0x14c8993 testing::Test::Run()
@  0x14c9116 testing::TestInfo::Run()
@  0x14c975c testing::TestCase::Run()
@  0x14cfea4 testing::internal::UnitTestImpl::RunAllTests()
@  0x14ecfd5 
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@  0x14e7e00 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@  0x14cec40 testing::UnitTest::Run()
@   0xd045c4 RUN_ALL_TESTS()
@   0xd041b1 main
@ 0x7f1c5ba81ec5 (unknown)
@   0x930bb9 (unknown)
Segmentation fault
{code}

Even though we do not typically release optimized builds we should still look 
into these as optimizations tend to expose fragile constructs.





[jira] [Created] (MESOS-4055) SSL-related test fail reliably in optimized build

2015-12-03 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4055:
---

 Summary: SSL-related test fail reliably in optimized build
 Key: MESOS-4055
 URL: https://issues.apache.org/jira/browse/MESOS-4055
 Project: Mesos
  Issue Type: Task
  Components: test
Affects Versions: 0.26.0
Reporter: Benjamin Bannier


Under ubuntu14.04 building {{5c0e4dc}} using {{gcc-4.8.4-2ubuntu1~14.04}} with
{code}
% ../configure --enable-ssl --enable-libevent --enable-optimize
{code}

most SSL-related tests fail reliably with SIGSEGV. The full list of failing 
tests is
{code}
SSL.Disabled
SSLTest.BasicSameProcess
SSLTest.SSLSocket
SSLTest.NonSSLSocket
SSLTest.NoVerifyBadCA
SSLTest.RequireBadCA
SSLTest.VerifyBadCA
SSLTest.VerifyCertificate
SSLTest.RequireCertificate
SSLTest.ProtocolMismatch
SSLTest.ValidDowngrade
SSLtest.NoValidDowngrade
SSLTest.NoValidDowngrade
SSLTest.ValidDowngradeEachProtocol
SSLTest.NoValidDowngradeEachProtocol
SSLTest.PeerAddress
SSLTest.HTTPSGet
SSLTest.HTTPSPost
{code}

The tests fail with {{SIGSEGV}} or for similarly worrisome reasons, e.g.,
{code}
[ RUN  ] SSLTest.SSLSocket
*** Aborted at 1449135851 (unix time) try "date -d @1449135851" if you are 
using GNU date ***
PC: @   0x4418f4 Try<>::~Try()
*** SIGSEGV (@0x5acce6) received by PID 29976 (TID 0x7fe601eb5780) from PID 
5950694; stack trace: ***
@ 0x7fe601a9a340 (unknown)
@   0x4418f4 Try<>::~Try()
@   0x5a843c SSLTest::setup_server()
@   0x595162 SSLTest_SSLSocket_Test::TestBody()
@   0x5f2428 
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@   0x5ec880 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@   0x5cd0ff testing::Test::Run()
@   0x5cd882 testing::TestInfo::Run()
@   0x5cdec8 testing::TestCase::Run()
@   0x5d4610 testing::internal::UnitTestImpl::RunAllTests()
@   0x5f3203 
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@   0x5ed5f4 
testing::internal::HandleExceptionsInMethodIfSupported<>()
@   0x5d33ac testing::UnitTest::Run()
@   0x40fd70 main
@ 0x7fe600024ec5 (unknown)
@   0x413eb1 (unknown)
Segmentation fault
{code}

Even though we do not typically release optimized builds we should still look 
into these as optimizations tend to expose fragile constructs.






[jira] [Created] (MESOS-4054) Evaluate python-compatibility of python bindings and utilities

2015-12-03 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4054:
---

 Summary: Evaluate python-compatibility of python bindings and 
utilities
 Key: MESOS-4054
 URL: https://issues.apache.org/jira/browse/MESOS-4054
 Project: Mesos
  Issue Type: Task
Reporter: Benjamin Bannier
Priority: Minor


In many places our python tools do not enforce a particular python version but 
call the first {{python}} found via {{env}} (likely python2).

We likely rely on some python2 constructs which can be easily migrated to a 
python3 equivalent (e.g., {{print}} vs. {{print_function}}). Let's evaluate 
what else we use and how much effort it would be to support both python2 and 
python3. If we can determine the minimal python version that is currently 
required, we should record that as well and compare it with what is available.
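
As a starting point for such an evaluation, here is a minimal sketch that parses 
source text with the running interpreter and reports which inputs fail. It 
assumes it is run under python3; the helper name {{python3_syntax_errors}} and 
the sample inputs are made up for illustration and are not part of the Mesos 
tree. It only catches syntax-level problems such as the {{print}} statement, 
not changed library behavior.

```python
import ast


def python3_syntax_errors(sources):
    """Return {name: message} for every source text that fails to parse.

    `sources` maps a display name (e.g. a file path) to its source text.
    Under a python3 interpreter, python2-only syntax such as the print
    statement is reported as a SyntaxError.
    """
    failures = {}
    for name, text in sources.items():
        try:
            ast.parse(text, filename=name)
        except SyntaxError as error:
            failures[name] = error.msg
    return failures


# Hypothetical inputs: one python2-only snippet, one portable snippet.
samples = {
    "old_style.py": 'print "hello"\n',
    "portable.py": 'print("hello")\n',
}

failures = python3_syntax_errors(samples)
for name in sorted(failures):
    print("{0}: {1}".format(name, failures[name]))
```

Running this over the real scripts would need a loop that reads each file; that 
part is left out here on purpose.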





[jira] [Commented] (MESOS-4057) RegistryClientTest suite fails reliably in optimized build

2015-12-04 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15041394#comment-15041394
 ] 

Benjamin Bannier commented on MESOS-4057:
-

Triggered by the discussion in MESOS-4055 I tried to reproduce this, but I 
cannot reproduce it myself. I guess we can resolve this as {{CANNOT_REPRO}}.

I tried to compile & link libs and tests with different optimization settings, 
but that wasn't the cause.

> RegistryClientTest suite fails reliably in optimized build
> --
>
> Key: MESOS-4057
> URL: https://issues.apache.org/jira/browse/MESOS-4057
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
>Reporter: Benjamin Bannier
>
> Under ubuntu14.04 building 5c0e4dc using gcc-4.8.4-2ubuntu1~14.04 with
> {code}
> % ../configure --enable-ssl --enable-libevent --enable-optimize
> {code}
> all six tests from the {{RegistryClientTest}} suite fail with SIGSEGV. The 
> full list of failing tests is
> {code}
> RegistryClientTest.SimpleGetToken
> RegistryClientTest.BadTokenResponse
> RegistryClientTest.SimpleGetManifest
> RegistryClientTest.SimpleGetBlob
> RegistryClientTest.BadRequest
> RegistryClientTest.SimpleRegistryPuller
> {code}
> The failure messages are similar, e.g.,
> {code}
> [ RUN  ] RegistryClientTest.BadTokenResponse
> *** Aborted at 1449146245 (unix time) try "date -d @1449146245" if you are 
> using GNU date ***
> PC: @ 0x7f1c5c5ba6ad (unknown)
> *** SIGSEGV (@0xa24888) received by PID 21542 (TID 0x7f1c61f24800) from PID 
> 10635400; stack trace: ***
> @ 0x7f1c5be35340 (unknown)
> @ 0x7f1c5c5ba6ad (unknown)
> @ 0x7f1c5c61932f (unknown)
> @  0x14067aa Try<>::~Try()
> @  0x1406ab0 SSLTest::setup_server()
> @  0x140869b 
> mesos::internal::tests::RegistryClientTest::getServer()
> @  0x13f315a 
> mesos::internal::tests::RegistryClientTest_BadTokenResponse_Test::TestBody()
> @  0x14ec3b0 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x14e728a 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x14c8993 testing::Test::Run()
> @  0x14c9116 testing::TestInfo::Run()
> @  0x14c975c testing::TestCase::Run()
> @  0x14cfea4 testing::internal::UnitTestImpl::RunAllTests()
> @  0x14ecfd5 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @  0x14e7e00 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @  0x14cec40 testing::UnitTest::Run()
> @   0xd045c4 RUN_ALL_TESTS()
> @   0xd041b1 main
> @ 0x7f1c5ba81ec5 (unknown)
> @   0x930bb9 (unknown)
> Segmentation fault
> {code}
> Even though we do not typically release optimized builds we should still look 
> into these as optimizations tend to expose fragile constructs.





[jira] [Commented] (MESOS-4055) SSL-related test fail reliably in optimized build

2015-12-04 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15041392#comment-15041392
 ] 

Benjamin Bannier commented on MESOS-4055:
-

I cannot reproduce this myself. I guess we can resolve this as {{CANNOT_REPRO}}.

I tried to compile & link libs and tests with different optimization settings, 
but that wasn't the cause.

> SSL-related test fail reliably in optimized build
> -
>
> Key: MESOS-4055
> URL: https://issues.apache.org/jira/browse/MESOS-4055
> Project: Mesos
>  Issue Type: Task
>  Components: libprocess, test
>Affects Versions: 0.26.0
>Reporter: Benjamin Bannier
>Assignee: Joseph Wu
>
> Under ubuntu14.04 building {{5c0e4dc}} using {{gcc-4.8.4-2ubuntu1~14.04}} with
> {code}
> % ../configure --enable-ssl --enable-libevent --enable-optimize
> {code}
> most SSL-related tests fail reliably with SIGSEGV. The full list of failing 
> tests is
> {code}
> SSL.Disabled
> SSLTest.BasicSameProcess
> SSLTest.SSLSocket
> SSLTest.NonSSLSocket
> SSLTest.NoVerifyBadCA
> SSLTest.RequireBadCA
> SSLTest.VerifyBadCA
> SSLTest.VerifyCertificate
> SSLTest.RequireCertificate
> SSLTest.ProtocolMismatch
> SSLTest.ValidDowngrade
> SSLtest.NoValidDowngrade
> SSLTest.NoValidDowngrade
> SSLTest.ValidDowngradeEachProtocol
> SSLTest.NoValidDowngradeEachProtocol
> SSLTest.PeerAddress
> SSLTest.HTTPSGet
> SSLTest.HTTPSPost
> {code}
> The tests fail with {{SIGSEGV}} or for similarly worrisome reasons, e.g.,
> {code}
> [ RUN  ] SSLTest.SSLSocket
> *** Aborted at 1449135851 (unix time) try "date -d @1449135851" if you are 
> using GNU date ***
> PC: @   0x4418f4 Try<>::~Try()
> *** SIGSEGV (@0x5acce6) received by PID 29976 (TID 0x7fe601eb5780) from PID 
> 5950694; stack trace: ***
> @ 0x7fe601a9a340 (unknown)
> @   0x4418f4 Try<>::~Try()
> @   0x5a843c SSLTest::setup_server()
> @   0x595162 SSLTest_SSLSocket_Test::TestBody()
> @   0x5f2428 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @   0x5ec880 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @   0x5cd0ff testing::Test::Run()
> @   0x5cd882 testing::TestInfo::Run()
> @   0x5cdec8 testing::TestCase::Run()
> @   0x5d4610 testing::internal::UnitTestImpl::RunAllTests()
> @   0x5f3203 
> testing::internal::HandleSehExceptionsInMethodIfSupported<>()
> @   0x5ed5f4 
> testing::internal::HandleExceptionsInMethodIfSupported<>()
> @   0x5d33ac testing::UnitTest::Run()
> @   0x40fd70 main
> @ 0x7fe600024ec5 (unknown)
> @   0x413eb1 (unknown)
> Segmentation fault
> {code}
> Even though we do not typically release optimized builds we should still look 
> into these as optimizations tend to expose fragile constructs.





[jira] [Commented] (MESOS-4072) The lt-mesos-master will coredump in some situation.

2015-12-07 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044831#comment-15044831
 ] 

Benjamin Bannier commented on MESOS-4072:
-

Note that this is an intentional hard exit: you specified a {{work_dir}} which 
is not writable, so there is no way we can continue after emitting an error 
message (which we did).

However, we do not need to show a stack trace or dump core here (i.e. replace 
the use of {{CHECK}} with something like {{EXIT}}).
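
To illustrate the distinction, here is a language-neutral sketch in Python (not 
the Mesos C++ code). It treats an unusable {{work_dir}} as an expected operator 
error: it prints a message and returns a nonzero exit status instead of failing 
an assertion. The names {{work_dir_error}} and {{main}} are hypothetical.

```python
import os
import sys
import tempfile


def work_dir_error(work_dir):
    """Return an error message if work_dir cannot be used, else None."""
    if not os.path.isdir(work_dir):
        return "work_dir '{0}' is not a directory".format(work_dir)
    if not os.access(work_dir, os.W_OK):
        return "work_dir '{0}' is not writable".format(work_dir)
    return None


def main(work_dir):
    """Return the process exit status for the given work_dir."""
    error = work_dir_error(work_dir)
    if error is not None:
        # Expected misconfiguration: report it and exit with a nonzero
        # status. No traceback and no core dump are needed for this case.
        sys.stderr.write(error + "\n")
        return 1
    return 0


# Illustration only: a fresh temporary directory is usable, a missing one is not.
usable = tempfile.mkdtemp()
missing = os.path.join(usable, "does-not-exist")
statuses = (main(usable), main(missing))
```

A real entry point would pass the returned status to {{sys.exit}}; assertion-like 
checks would remain reserved for conditions that indicate a bug in the program 
itself.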

> The lt-mesos-master will coredump in some situation.
> 
>
> Key: MESOS-4072
> URL: https://issues.apache.org/jira/browse/MESOS-4072
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0
>Reporter: Nan Xiao
>
> I find that lt-mesos-master will dump core when the following conditions are met:
> (1) The user doesn't have write permission for the /var/lib/mesos directory:
> nan@ubuntu:~/mesos-0.25.0/build$ ls -lt /var/lib/
> total 176
> dr-xr-xr-x 2 root root 4096 Dec  7 03:08 mesos
> ..
> (2) /var/lib/mesos is an empty folder:
> nan@ubuntu:~/mesos-0.25.0/build$ ls -lt /var/lib/mesos/
> total 0
> Executing the following command will dump core:
> nan@ubuntu:~/mesos-0.25.0/build$ ./bin/mesos-master.sh --ip=16.187.250.141 
> --work_dir=/var/lib/mesos
> I1207 03:18:36.431015 22951 main.cpp:229] Build: 2015-12-07 00:11:18 by nan
> I1207 03:18:36.431154 22951 main.cpp:231] Version: 0.25.0
> I1207 03:18:36.431388 22951 main.cpp:252] Using 'HierarchicalDRF' allocator
> F1207 03:18:36.431807 22951 replica.cpp:724] CHECK_SOME(state): IO error: 
> /var/lib/mesos/replicated_log/LOCK: No such file or directory Failed to 
> recover the log
> *** Check failure stack trace: ***
> @ 0x7f076bc208ca  google::LogMessage::Fail()
> @ 0x7f076bc20816  google::LogMessage::SendToLog()
> @ 0x7f076bc20218  google::LogMessage::Flush()
> @ 0x7f076bc2312c  google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f076adf8f30  _CheckFatal::~_CheckFatal()
> @ 0x7f076baa4939  mesos::internal::log::ReplicaProcess::restore()
> @ 0x7f076baa0f8c  
> mesos::internal::log::ReplicaProcess::ReplicaProcess()
> @ 0x7f076baa4c95  mesos::internal::log::Replica::Replica()
> @ 0x7f076b9cf819  mesos::internal::log::LogProcess::LogProcess()
> @ 0x7f076b9d576c  mesos::internal::log::Log::Log()
> @   0x46d21f  main
> @ 0x7f0766f69ec5  (unknown)
> @   0x46b979  (unknown)
> Aborted (core dumped)
> Use gdb to analyze it:
> nan@ubuntu:~/mesos-0.25.0/build$ gdb 
> /home/nan/mesos-0.25.0/build/src/.libs/lt-mesos-master core
> GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
> Copyright (C) 2014 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> .
> Find the GDB manual and other documentation resources online at:
> .
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from 
> /home/nan/mesos-0.25.0/build/src/.libs/lt-mesos-master...done.
> [New LWP 22065]
> [New LWP 22087]
> [New LWP 22085]
> [New LWP 22089]
> [New LWP 22084]
> [New LWP 22086]
> [New LWP 22091]
> [New LWP 22088]
> [New LWP 22092]
> [New LWP 22090]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/home/nan/mesos-0.25.0/build/src/.libs/lt-mesos-master 
> --ip=127.0.0.1 --work_di'.
> Program terminated with signal SIGABRT, Aborted.
> #0  0x7fe917810cc9 in __GI_raise (sig=sig@entry=6) at 
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> 56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> Traceback (most recent call last):
>   File 
> "/usr/share/gdb/auto-load/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19-gdb.py",
>  line 63, in 
> from libstdcxx.v6.printers import register_libstdcxx_printers
> ImportError: No module named 'libstdcxx'
> (gdb) bt
> #0  0x7fe917810cc9 in __GI_raise (sig=sig@entry=6) at 
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x7fe9178140d8 in __GI_abort () at abort.c:89
> #2  0x7fe91c4b8c1b in DumpStackTraceAndExit () from 
> /home/nan/mesos-0.25.0/build/src/.libs/libmesos-0.25.0.so
> #3  0x7fe91c4b28ca in google::LogMessage::Fail () from 
> /home/nan/mesos-0.25.0/build/src/.libs/libmesos-0.25.0.so
> #4  0x7fe91c4b2816 in 

[jira] [Commented] (MESOS-3799) Compilation warning with Ubuntu wily: auto_ptr is deprecated

2015-12-03 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15037891#comment-15037891
 ] 

Benjamin Bannier commented on MESOS-3799:
-

The current reviews have garnered some ship-its. Is there any reason why this 
has not been merged yet?

> Compilation warning with Ubuntu wily: auto_ptr is deprecated
> 
>
> Key: MESOS-3799
> URL: https://issues.apache.org/jira/browse/MESOS-3799
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Priority: Minor
>  Labels: mesosphere
>
> Variants of this message are printed many times during compilation (Wily on 
> AMD64):
> {noformat}
>   CXX  libprocess_la-pid.lo
>   CXX  libprocess_la-poll_socket.lo
>   CXX  libprocess_la-profiler.lo
> In file included from 
> /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/hashmap.hpp:23:0,
>  from 
> /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/stringify.hpp:26,
>  from 
> /mesos/3rdparty/libprocess/3rdparty/stout/include/stout/ip.hpp:59,
>  from 
> /mesos/3rdparty/libprocess/include/process/address.hpp:34,
>  from /mesos/3rdparty/libprocess/include/process/pid.hpp:26,
>  from /mesos/3rdparty/libprocess/src/pid.cpp:28:
> 3rdparty/boost-1.53.0/boost/get_pointer.hpp:27:40: warning: ‘template<class> 
> class std::auto_ptr’ is deprecated [-Wdeprecated-declarations]
>  template<class T> T * get_pointer(std::auto_ptr<T> const& p)
> ^
> In file included from /usr/include/c++/5/memory:81:0,
>  from 
> 3rdparty/boost-1.53.0/boost/functional/hash/extensions.hpp:32,
>  from 
> 3rdparty/boost-1.53.0/boost/functional/hash/hash.hpp:529,
>  from 3rdparty/boost-1.53.0/boost/functional/hash.hpp:6,
>  from /mesos/3rdparty/libprocess/include/process/pid.hpp:24,
>  from /mesos/3rdparty/libprocess/src/pid.cpp:28:
> /usr/include/c++/5/bits/unique_ptr.h:49:28: note: declared here
>    template<typename> class auto_ptr;
> ^
> {noformat}





[jira] [Created] (MESOS-4042) Complete LevelDBStateTest suite fails in optimized build

2015-12-02 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4042:
---

 Summary: Complete LevelDBStateTest suite fails in optimized build
 Key: MESOS-4042
 URL: https://issues.apache.org/jira/browse/MESOS-4042
 Project: Mesos
  Issue Type: Bug
Reporter: Benjamin Bannier


Building and checking {{5c0e4dc974014b0afd1f2752ff60a61c651de478}} in a 
ubuntu14.04 virtualbox with {{--enable-optimized}} in a virtualbox shared 
folder fails with
{code}
[ RUN  ] LevelDBStateTest.FetchAndStoreAndFetch
../../src/tests/state_tests.cpp:90: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndFetch (15 ms)
[ RUN  ] LevelDBStateTest.FetchAndStoreAndStoreAndFetch
../../src/tests/state_tests.cpp:120: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndStoreAndFetch (13 ms)
[ RUN  ] LevelDBStateTest.FetchAndStoreAndStoreFailAndFetch
../../src/tests/state_tests.cpp:156: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndStoreFailAndFetch (10 ms)
[ RUN  ] LevelDBStateTest.FetchAndStoreAndExpungeAndFetch
../../src/tests/state_tests.cpp:198: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndExpungeAndFetch (10 ms)
[ RUN  ] LevelDBStateTest.FetchAndStoreAndExpungeAndExpunge
../../src/tests/state_tests.cpp:233: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndExpungeAndExpunge (10 ms)
[ RUN  ] LevelDBStateTest.FetchAndStoreAndExpungeAndStoreAndFetch
../../src/tests/state_tests.cpp:264: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndExpungeAndStoreAndFetch (12 ms)
[ RUN  ] LevelDBStateTest.Names
../../src/tests/state_tests.cpp:304: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.Names (10 ms)
{code}

At least for me, non-optimized builds seem unaffected.





[jira] [Updated] (MESOS-4042) Complete LevelDBStateTest suite fails in optimized build

2015-12-02 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4042:

Description: 
Building and checking {{5c0e4dc974014b0afd1f2752ff60a61c651de478}} in a 
ubuntu14.04 virtualbox with {{--enable-optimized}} in a virtualbox shared 
folder fails with
{code}
[ RUN  ] LevelDBStateTest.FetchAndStoreAndFetch
../../src/tests/state_tests.cpp:90: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndFetch (15 ms)
[ RUN  ] LevelDBStateTest.FetchAndStoreAndStoreAndFetch
../../src/tests/state_tests.cpp:120: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndStoreAndFetch (13 ms)
[ RUN  ] LevelDBStateTest.FetchAndStoreAndStoreFailAndFetch
../../src/tests/state_tests.cpp:156: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndStoreFailAndFetch (10 ms)
[ RUN  ] LevelDBStateTest.FetchAndStoreAndExpungeAndFetch
../../src/tests/state_tests.cpp:198: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndExpungeAndFetch (10 ms)
[ RUN  ] LevelDBStateTest.FetchAndStoreAndExpungeAndExpunge
../../src/tests/state_tests.cpp:233: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndExpungeAndExpunge (10 ms)
[ RUN  ] LevelDBStateTest.FetchAndStoreAndExpungeAndStoreAndFetch
../../src/tests/state_tests.cpp:264: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndExpungeAndStoreAndFetch (12 ms)
[ RUN  ] LevelDBStateTest.Names
../../src/tests/state_tests.cpp:304: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.Names (10 ms)
{code}

The identical error occurs for a non-optimized build.

  was:
Building and checking {{5c0e4dc974014b0afd1f2752ff60a61c651de478}} in a 
ubuntu14.04 virtualbox with {{--enable-optimized}} in a virtualbox shared 
folder fails with
{code}
[ RUN  ] LevelDBStateTest.FetchAndStoreAndFetch
../../src/tests/state_tests.cpp:90: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndFetch (15 ms)
[ RUN  ] LevelDBStateTest.FetchAndStoreAndStoreAndFetch
../../src/tests/state_tests.cpp:120: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndStoreAndFetch (13 ms)
[ RUN  ] LevelDBStateTest.FetchAndStoreAndStoreFailAndFetch
../../src/tests/state_tests.cpp:156: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndStoreFailAndFetch (10 ms)
[ RUN  ] LevelDBStateTest.FetchAndStoreAndExpungeAndFetch
../../src/tests/state_tests.cpp:198: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndExpungeAndFetch (10 ms)
[ RUN  ] LevelDBStateTest.FetchAndStoreAndExpungeAndExpunge
../../src/tests/state_tests.cpp:233: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndExpungeAndExpunge (10 ms)
[ RUN  ] LevelDBStateTest.FetchAndStoreAndExpungeAndStoreAndFetch
../../src/tests/state_tests.cpp:264: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.FetchAndStoreAndExpungeAndStoreAndFetch (12 ms)
[ RUN  ] LevelDBStateTest.Names
../../src/tests/state_tests.cpp:304: Failure
(future1).failure(): IO error: /vagrant/mesos/build/.state/MANIFEST-01: 
Invalid argument
[  FAILED  ] LevelDBStateTest.Names (10 ms)
{code}

At least for me not optimized builds seem unaffected.


> Complete LevelDBStateTest suite fails in optimized build
> 
>
> Key: MESOS-4042
> URL: https://issues.apache.org/jira/browse/MESOS-4042
> Project: Mesos
>  Issue Type: Bug
>Reporter: Benjamin Bannier
>
> Building and checking {{5c0e4dc974014b0afd1f2752ff60a61c651de478}} in a 
> ubuntu14.04 virtualbox with {{--enable-optimized}} in a virtualbox shared 
> folder fails with
> {code}
> [ RUN  ] LevelDBStateTest.FetchAndStoreAndFetch
> ../../src/tests/state_tests.cpp:90: Failure
> 

[jira] [Comment Edited] (MESOS-4106) The health checker may fail to inform the executor to kill an unhealthy task after max_consecutive_failures.

2015-12-10 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050358#comment-15050358
 ] 

Benjamin Bannier edited comment on MESOS-4106 at 12/10/15 8:50 AM:
---

Late to the party as this already went in.

Just sleeping here to have the message out is a very weak guarantee (it does 
not guarantee that the message was actually sent). What one should probably do 
instead to make this robust is block until a state change in {{executor}} 
happens (with a timeout), e.g., observe change of state of {{taskID}} via 
querying the {{executor}}.


was (Author: bbannier):
Late to the party as this already went in.

Just {{sleep}}ing here to have the message out is a very weak guarantee (it 
does not guarantee that the message was actually sent). What one should 
probably do instead to make this robust is block until a state change in 
{{executor}} happens (with a timeout), e.g., observe change of state of 
{{taskID}} via querying the {{executor}}.

> The health checker may fail to inform the executor to kill an unhealthy task 
> after max_consecutive_failures.
> 
>
> Key: MESOS-4106
> URL: https://issues.apache.org/jira/browse/MESOS-4106
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.20.0, 0.20.1, 0.21.1, 0.21.2, 0.22.1, 0.22.2, 0.23.0, 
> 0.23.1, 0.24.0, 0.24.1, 0.25.0
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>Priority: Blocker
> Fix For: 0.27.0
>
>
> This was reported by [~tan] experimenting with health checks. Many tasks were 
> launched with the following health check, taken from the container 
> stdout/stderr:
> {code}
> Launching health check process: /usr/local/libexec/mesos/mesos-health-check 
> --executor=(1)@127.0.0.1:39629 
> --health_check_json={"command":{"shell":true,"value":"false"},"consecutive_failures":1,"delay_seconds":0.0,"grace_period_seconds":1.0,"interval_seconds":1.0,"timeout_seconds":1.0}
>  --task_id=sleepy-2
> {code}
> This should have led to all tasks getting killed due to 
> {{\-\-consecutive_failures}} being set, however, only some tasks get killed, 
> while others remain running.
> It turns out that the health check binary does a {{send}} and promptly exits. 
> Unfortunately, this may lead to a message drop since libprocess may not have 
> sent this message over the socket by the time the process exits.
> We work around this in the command executor with a manual sleep, which has 
> been around since the svn days. See 
> [here|https://github.com/apache/mesos/blob/0.14.0/src/launcher/executor.cpp#L288-L290].





[jira] [Commented] (MESOS-4106) The health checker may fail to inform the executor to kill an unhealthy task after max_consecutive_failures.

2015-12-10 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050358#comment-15050358
 ] 

Benjamin Bannier commented on MESOS-4106:
-

Late to the party as this already went in.

Just {{sleep}}ing here to have the message out is a very weak guarantee (it 
does not guarantee that the message was actually sent). What one should 
probably do instead to make this robust is block until a state change in 
{{executor}} happens (with a timeout), e.g., observe change of state of 
{{taskID}} via querying the {{executor}}.
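
A rough sketch of the "block until a state change, with a timeout" idea, written 
in Python rather than against libprocess. The helper {{wait_for}} and the 
predicate are hypothetical; in Mesos the predicate would be a query to the 
{{executor}} about the state of {{taskID}}.

```python
import time


def wait_for(predicate, timeout_secs, interval_secs=0.05):
    """Poll `predicate` until it returns True or the timeout elapses.

    Returns True if the condition was observed and False on timeout, so
    the caller can tell "confirmed" apart from "gave up waiting".
    """
    deadline = time.monotonic() + timeout_secs
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_secs)


# Stand-in for querying the executor: the task state changes on the third poll.
observed_states = iter(["RUNNING", "RUNNING", "KILLED"])
confirmed = wait_for(
    lambda: next(observed_states) == "KILLED",
    timeout_secs=1.0,
    interval_secs=0.0,
)
```

Unlike a fixed sleep, the return value tells the caller whether the state change 
was actually observed, so a timeout can be logged or retried instead of being 
silently ignored.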

> The health checker may fail to inform the executor to kill an unhealthy task 
> after max_consecutive_failures.
> 
>
> Key: MESOS-4106
> URL: https://issues.apache.org/jira/browse/MESOS-4106
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.20.0, 0.20.1, 0.21.1, 0.21.2, 0.22.1, 0.22.2, 0.23.0, 
> 0.23.1, 0.24.0, 0.24.1, 0.25.0
>Reporter: Benjamin Mahler
>Assignee: Benjamin Mahler
>Priority: Blocker
> Fix For: 0.27.0
>
>
> This was reported by [~tan] experimenting with health checks. Many tasks were 
> launched with the following health check, taken from the container 
> stdout/stderr:
> {code}
> Launching health check process: /usr/local/libexec/mesos/mesos-health-check 
> --executor=(1)@127.0.0.1:39629 
> --health_check_json={"command":{"shell":true,"value":"false"},"consecutive_failures":1,"delay_seconds":0.0,"grace_period_seconds":1.0,"interval_seconds":1.0,"timeout_seconds":1.0}
>  --task_id=sleepy-2
> {code}
> This should have led to all tasks getting killed due to 
> {{\-\-consecutive_failures}} being set, however, only some tasks get killed, 
> while others remain running.
> It turns out that the health check binary does a {{send}} and promptly exits. 
> Unfortunately, this may lead to a message drop since libprocess may not have 
> sent this message over the socket by the time the process exits.
> We work around this in the command executor with a manual sleep, which has 
> been around since the svn days. See 
> [here|https://github.com/apache/mesos/blob/0.14.0/src/launcher/executor.cpp#L288-L290].





[jira] [Commented] (MESOS-2857) FetcherCacheTest.LocalCachedExtract is flaky.

2015-12-18 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063883#comment-15063883
 ] 

Benjamin Bannier commented on MESOS-2857:
-

[~vi...@twitter.com], we still cannot reproduce what you reported above, and 
since the code triggering this is in effect sequential, it still seems likely 
to be triggered by something independent of our code.

Could you please try to reproduce this on the affected machine with verbose 
libprocess logging, or otherwise close this ticket?

> FetcherCacheTest.LocalCachedExtract is flaky.
> -
>
> Key: MESOS-2857
> URL: https://issues.apache.org/jira/browse/MESOS-2857
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>  Labels: flaky-test, mesosphere
>
> From jenkins:
> {noformat}
> [ RUN  ] FetcherCacheTest.LocalCachedExtract
> Using temporary directory '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj'
> I0610 20:04:48.591573 24561 leveldb.cpp:176] Opened db in 3.512525ms
> I0610 20:04:48.592456 24561 leveldb.cpp:183] Compacted db in 828630ns
> I0610 20:04:48.592512 24561 leveldb.cpp:198] Created db iterator in 32992ns
> I0610 20:04:48.592531 24561 leveldb.cpp:204] Seeked to beginning of db in 
> 8967ns
> I0610 20:04:48.592545 24561 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 7762ns
> I0610 20:04:48.592604 24561 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0610 20:04:48.593438 24587 recover.cpp:449] Starting replica recovery
> I0610 20:04:48.593698 24587 recover.cpp:475] Replica is in EMPTY status
> I0610 20:04:48.595641 24580 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0610 20:04:48.596086 24590 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0610 20:04:48.596607 24590 recover.cpp:566] Updating replica status to 
> STARTING
> I0610 20:04:48.597507 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 717888ns
> I0610 20:04:48.597535 24590 replica.cpp:323] Persisted replica status to 
> STARTING
> I0610 20:04:48.597697 24590 recover.cpp:475] Replica is in STARTING status
> I0610 20:04:48.599165 24584 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0610 20:04:48.599434 24584 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0610 20:04:48.599915 24590 recover.cpp:566] Updating replica status to VOTING
> I0610 20:04:48.600545 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 432335ns
> I0610 20:04:48.600574 24590 replica.cpp:323] Persisted replica status to 
> VOTING
> I0610 20:04:48.600659 24590 recover.cpp:580] Successfully joined the Paxos 
> group
> I0610 20:04:48.600797 24590 recover.cpp:464] Recover process terminated
> I0610 20:04:48.602905 24594 master.cpp:363] Master 
> 20150610-200448-3875541420-32907-24561 (dbade881e927) started on 
> 172.17.0.231:32907
> I0610 20:04:48.602957 24594 master.cpp:365] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --credentials="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/master" 
> --zk_session_timeout="10secs"
> I0610 20:04:48.603374 24594 master.cpp:410] Master only allowing 
> authenticated frameworks to register
> I0610 20:04:48.603392 24594 master.cpp:415] Master only allowing 
> authenticated slaves to register
> I0610 20:04:48.603404 24594 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials'
> I0610 20:04:48.603751 24594 master.cpp:454] Using default 'crammd5' 
> authenticator
> I0610 20:04:48.604928 24594 master.cpp:491] Authorization enabled
> I0610 20:04:48.606034 24593 hierarchical.hpp:309] Initialized hierarchical 
> allocator process
> I0610 20:04:48.606106 24593 whitelist_watcher.cpp:79] No whitelist given
> I0610 20:04:48.607430 24594 master.cpp:1476] The newly elected leader is 
> master@172.17.0.231:32907 with id 20150610-200448-3875541420-32907-24561
> I0610 20:04:48.607466 24594 

[jira] [Commented] (MESOS-4129) Ignore some files from eclipse

2015-12-11 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052473#comment-15052473
 ] 

Benjamin Bannier commented on MESOS-4129:
-

Just wondering if we do plan to add e.g., backup files or project configs from 
vim, emacs and IDE/editor X as well? Right now {{/.gitignore-template}} only 
deals with files shared by all users (OK, assuming everybody is using autotools 
to build), and this addition would deviate from that.

I personally have been happy enough with {{.git/info/exclude}} for my personal 
blacklist.

> Ignore some files from eclipse
> --
>
> Key: MESOS-4129
> URL: https://issues.apache.org/jira/browse/MESOS-4129
> Project: Mesos
>  Issue Type: Bug
>Reporter: Guangya Liu
>Assignee: Guangya Liu
>
> When using eclipse to edit mesos code, the "git status"
> command always shows some eclipse system files. It is better to put those
> files in gitignore so that "git status" will not show them and
> the developer can simply use "git add ." to add all modified files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-2857) FetcherCacheTest.LocalCachedExtract is flaky.

2015-12-15 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-2857:

Comment: was deleted

(was: Comparing with the original log in this report, this appears to be a 
different issue.

From the log it appears as if everything happened as expected, only that the 
test ran into our default timeout when waiting for a status update; without 
verbose libprocess logs I am tempted to attribute this issue to very high 
system load.)

> FetcherCacheTest.LocalCachedExtract is flaky.
> -
>
> Key: MESOS-2857
> URL: https://issues.apache.org/jira/browse/MESOS-2857
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>  Labels: flaky-test, mesosphere
>
> From jenkins:
> {noformat}
> [ RUN  ] FetcherCacheTest.LocalCachedExtract
> Using temporary directory '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj'
> I0610 20:04:48.591573 24561 leveldb.cpp:176] Opened db in 3.512525ms
> I0610 20:04:48.592456 24561 leveldb.cpp:183] Compacted db in 828630ns
> I0610 20:04:48.592512 24561 leveldb.cpp:198] Created db iterator in 32992ns
> I0610 20:04:48.592531 24561 leveldb.cpp:204] Seeked to beginning of db in 
> 8967ns
> I0610 20:04:48.592545 24561 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 7762ns
> I0610 20:04:48.592604 24561 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0610 20:04:48.593438 24587 recover.cpp:449] Starting replica recovery
> I0610 20:04:48.593698 24587 recover.cpp:475] Replica is in EMPTY status
> I0610 20:04:48.595641 24580 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0610 20:04:48.596086 24590 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0610 20:04:48.596607 24590 recover.cpp:566] Updating replica status to 
> STARTING
> I0610 20:04:48.597507 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 717888ns
> I0610 20:04:48.597535 24590 replica.cpp:323] Persisted replica status to 
> STARTING
> I0610 20:04:48.597697 24590 recover.cpp:475] Replica is in STARTING status
> I0610 20:04:48.599165 24584 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0610 20:04:48.599434 24584 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0610 20:04:48.599915 24590 recover.cpp:566] Updating replica status to VOTING
> I0610 20:04:48.600545 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 432335ns
> I0610 20:04:48.600574 24590 replica.cpp:323] Persisted replica status to 
> VOTING
> I0610 20:04:48.600659 24590 recover.cpp:580] Successfully joined the Paxos 
> group
> I0610 20:04:48.600797 24590 recover.cpp:464] Recover process terminated
> I0610 20:04:48.602905 24594 master.cpp:363] Master 
> 20150610-200448-3875541420-32907-24561 (dbade881e927) started on 
> 172.17.0.231:32907
> I0610 20:04:48.602957 24594 master.cpp:365] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --credentials="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/master" 
> --zk_session_timeout="10secs"
> I0610 20:04:48.603374 24594 master.cpp:410] Master only allowing 
> authenticated frameworks to register
> I0610 20:04:48.603392 24594 master.cpp:415] Master only allowing 
> authenticated slaves to register
> I0610 20:04:48.603404 24594 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials'
> I0610 20:04:48.603751 24594 master.cpp:454] Using default 'crammd5' 
> authenticator
> I0610 20:04:48.604928 24594 master.cpp:491] Authorization enabled
> I0610 20:04:48.606034 24593 hierarchical.hpp:309] Initialized hierarchical 
> allocator process
> I0610 20:04:48.606106 24593 whitelist_watcher.cpp:79] No whitelist given
> I0610 20:04:48.607430 24594 master.cpp:1476] The newly elected leader is 
> master@172.17.0.231:32907 with id 20150610-200448-3875541420-32907-24561
> I0610 20:04:48.607466 24594 

[jira] [Updated] (MESOS-4151) GMock warning in SlaveTest.ContainerizerUsageFailure

2015-12-15 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4151:

Sprint: Mesosphere Sprint 24

> GMock warning in SlaveTest.ContainerizerUsageFailure
> 
>
> Key: MESOS-4151
> URL: https://issues.apache.org/jira/browse/MESOS-4151
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Assignee: Benjamin Bannier
>  Labels: mesosphere, tech-debt
> Attachments: gmock_warning_containerizer.txt
>
>
> {noformat}
> [ RUN  ] SlaveTest.ContainerizerUsageFailure
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
> Function call: shutdown(0x7f920271dfd0)
> Stack trace:
> [   OK ] SlaveTest.ContainerizerUsageFailure (94 ms)
> [--] 1 test from SlaveTest (95 ms total)
> {noformat}
> Occurs deterministically for me on OSX 10.10



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-4151) GMock warning in SlaveTest.ContainerizerUsageFailure

2015-12-15 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-4151:
---

Assignee: Benjamin Bannier

> GMock warning in SlaveTest.ContainerizerUsageFailure
> 
>
> Key: MESOS-4151
> URL: https://issues.apache.org/jira/browse/MESOS-4151
> Project: Mesos
>  Issue Type: Bug
>Reporter: Neil Conway
>Assignee: Benjamin Bannier
>  Labels: mesosphere, tech-debt
> Attachments: gmock_warning_containerizer.txt
>
>
> {noformat}
> [ RUN  ] SlaveTest.ContainerizerUsageFailure
> GMOCK WARNING:
> Uninteresting mock function call - returning directly.
> Function call: shutdown(0x7f920271dfd0)
> Stack trace:
> [   OK ] SlaveTest.ContainerizerUsageFailure (94 ms)
> [--] 1 test from SlaveTest (95 ms total)
> {noformat}
> Occurs deterministically for me on OSX 10.10



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4271) Consider replacing libtool with dolt to speed up build

2016-01-04 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4271:

Sprint: Mesosphere Sprint 26

> Consider replacing libtool with dolt to speed up build
> --
>
> Key: MESOS-4271
> URL: https://issues.apache.org/jira/browse/MESOS-4271
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>  Labels: build
>
> Mesos uses a pretty standard autotools setup for the build so that 
> {{libtool}} is used extensively to abstract away the aspects of library 
> creation (both compiling source files, and creating the libraries). For some 
> versions of {{libtool}} its invocation can add considerably to the overall 
> build time.
> Dolt provides a much more condensed implementation of {{libtool}}'s 
> functionality for modern platforms (<100 locs vs ~10 klocs), so that it can 
> run much faster. We should investigate whether activating dolt makes sense.
> I tested dolt under OS X 10.10.5. I first primed ccache and then rebuilt 
> mesos-related objects,
> {code}
> ./configure --disable-python --disable-java  # benchmark mostly C & C++ file compile and link
> make check GTEST_FILTER=''                   # prime ccache
> make mostlyclean                             # remove most mesos objects and libs
> make -jN check GTEST_FILTER=''               # rebuild
> {code}
> || || user [s] || real [s] || sys [s] ||
> | make -j10 (dolt)|  42.8±0.1 |  54.3±0.2 |  34.1±0.2 |
> | make -j10 (libtool) |  65.6±0.3 | 148.7±1.1 | 108.5±1.0 |
> | make -j1 (dolt) |  76.9±0.3 |  45.5±0.1 |  27.1±0.1 |
> | make -j1 (libtool)  | 168.2±2.3 |  97.5±1.5 |  75.8±1.3 |



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4277) Provide constexpr Duration::min() and max()

2016-01-04 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4277:

Sprint: Mesosphere Sprint 26

> Provide constexpr Duration::min() and max()
> ---
>
> Key: MESOS-4277
> URL: https://issues.apache.org/jira/browse/MESOS-4277
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout, technical debt
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>
> {{Duration}} could be implemented so that it can provide {{constexpr}} 
> {{min}} and {{max}} functions.
> This addresses an existing {{TODO}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4273) Replace variadic List constructor with one taking an initializer_list

2016-01-04 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4273:

Sprint: Mesosphere Sprint 26

> Replace variadic List constructor with one taking an initializer_list
> 
>
> Key: MESOS-4273
> URL: https://issues.apache.org/jira/browse/MESOS-4273
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>
> {{List}} provides a variadic constructor currently implemented with some 
> preprocessor magic. Given that we already require C++11 we can replace that 
> one with a much simpler one just taking a {{std::initializer_list}}. This 
> would change the invocations,
> {code}
> auto l1 = List(1, 2, 3);// now
> auto l2 = List({1, 2, 3});  // proposed
> {code}
> This addresses an existing {{TODO}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4275) Duration uses fixed-width types inconsistently

2016-01-04 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4275:

Sprint: Mesosphere Sprint 26

> Duration uses fixed-width types inconsistently
> --
>
> Key: MESOS-4275
> URL: https://issues.apache.org/jira/browse/MESOS-4275
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>
> The implementation of the {{Duration}} class correctly uses fixed-width types 
> (here {{int64_t}}) for portability internally, but uses {{long}} types in a 
> few places (in particular {{LLONG_MIN}} and {{LLONG_MAX}}). This is 
> inconsistent on 64-bit platforms, and probably incorrect on 32-bit platforms, 
> as there {{long}} is 32 bits wide.
> Additionally, the longer {{Duration}} types ({{Minutes}}, {{Hours}}, 
> {{Days}}, and {{Weeks}}) construct from {{int32_t}}, while shorter ones take 
> {{int64_t}}. Probably as a left-over this is matched with a redundant 
> {{Duration}} constructor taking an {{int32_t}} value where the other one 
> taking an {{int64_t}} value would be sufficient. It should be safe to just 
> construct from {{int64_t}} in all places.
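
The suggestion to construct from {{int64_t}} everywhere can be sketched as follows. This is a minimal illustration, not stout's actual code: the names {{NANOS_PER_SECOND}}, {{secondsToNanos}} and {{weeksToNanos}} are hypothetical, and the sketch assumes durations are stored as a nanosecond count in an {{int64_t}}.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helpers; stout's real unit classes are not reproduced here.
constexpr int64_t NANOS_PER_SECOND = 1000000000;

// Every unit takes int64_t, so no separate int32_t overload is needed.
constexpr int64_t secondsToNanos(int64_t seconds)
{
  return seconds * NANOS_PER_SECOND;
}

constexpr int64_t weeksToNanos(int64_t weeks)
{
  return secondsToNanos(weeks * 7 * 24 * 60 * 60);
}

// int64_t always has 63 value bits, independent of the width of `long`.
static_assert(std::numeric_limits<int64_t>::digits == 63,
              "int64_t has 63 value bits on every platform");
```

With a single {{int64_t}} parameter type, a second count above the {{int32_t}} range (e.g. 3000000000) is accepted without truncation.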



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4278) Constrain types used to instantiate Flags objects

2016-01-04 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4278:

Sprint: Mesosphere Sprint 26

> Constrain types used to instantiate Flags objects
> -
>
> Key: MESOS-4278
> URL: https://issues.apache.org/jira/browse/MESOS-4278
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout, technical debt
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>
> stout's {{Flags}} can be instantiated with a number of base flags provided by 
> the caller as template arguments; these are then inherited from by the 
> created {{Flags}} instance.
> To ensure the expected semantics we could constrain the template arguments to 
> ones derived from {{FlagsBase}}.
> This addresses an existing {{TODO}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4276) Remove duplicate Mesos constructor

2016-01-04 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4276:

Sprint: Mesosphere Sprint 26

> Remove duplicate Mesos constructor
> -
>
> Key: MESOS-4276
> URL: https://issues.apache.org/jira/browse/MESOS-4276
> Project: Mesos
>  Issue Type: Improvement
>  Components: technical debt
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>
> {{Mesos}} offers two almost-identical constructors
> {code}
> // TODO(vinod): Remove this in favor of the below constructor.
> Mesos(const std::string& master,
>   const std::function& connected,
>   const std::function& disconnected,
>   const std::function& received);
> Mesos(const std::string& master,
>   ContentType contentType,
>   const std::function& connected,
>   const std::function& disconnected,
>   const std::function& received);
> {code}
> Here invocations of the first constructor can be replaced trivially with
> invocations of the second one with {{contentType = ContentType::PROTOBUF}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4272) DurationTest.Arithmetic performs inexact float calculation in test

2016-01-04 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4272:

Sprint: Mesosphere Sprint 26

> DurationTest.Arithmetic performs inexact float calculation in test
> --
>
> Key: MESOS-4272
> URL: https://issues.apache.org/jira/browse/MESOS-4272
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>
> {{DurationTest.Arithmetic}} does a calculation with not exactly representable 
> floating point values and also performs an equality check,
> {code}
> EXPECT_EQ(Duration::create(3.3).get(), Seconds(10) * 0.33);
> {code}
> Here neither the value {{3.3}} nor {{0.33}} can be represented exactly as 
> a floating point number so the check might fail incorrectly (as it does e.g. 
> when compiling and executing the test under 32-bit on Debian8).
> Instead we should just use exactly representable values to make sure the test 
> will succeed as long as the implementation behaves as expected.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4276) Remove duplicate Mesos constructor

2016-01-04 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4276:
---

 Summary: Remove duplicate Mesos constructor
 Key: MESOS-4276
 URL: https://issues.apache.org/jira/browse/MESOS-4276
 Project: Mesos
  Issue Type: Improvement
  Components: technical debt
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier


{{Mesos}} offers two almost-identical constructors
{code}
  // TODO(vinod): Remove this in favor of the below constructor.
  Mesos(const std::string& master,
        const std::function& connected,
        const std::function& disconnected,
        const std::function& received);

  Mesos(const std::string& master,
        ContentType contentType,
        const std::function& connected,
        const std::function& disconnected,
        const std::function& received);
{code}

Here invocations of the first constructor can be replaced trivially with
invocations of the second one with {{contentType = ContentType::PROTOBUF}}.
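
One way to see that the two overloads are equivalent is to write the first as a delegating constructor. The sketch below uses a hypothetical {{Client}} class with a single callback; it is not the real {{Mesos}} class, and the callback type {{std::function<void()>}} is an assumption made only for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-ins; not the real Mesos scheduler library types.
enum class ContentType { PROTOBUF, JSON };

class Client
{
public:
  // Legacy overload: forwards to the full constructor with PROTOBUF.
  Client(const std::string& master,
         const std::function<void()>& connected)
    : Client(master, ContentType::PROTOBUF, connected) {}

  Client(const std::string& master,
         ContentType contentType,
         const std::function<void()>& connected)
    : master_(master), contentType_(contentType), connected_(connected) {}

  ContentType contentType() const { return contentType_; }

private:
  std::string master_;
  ContentType contentType_;
  std::function<void()> connected_;
};
```

Since the legacy overload only forwards, every call site using it could pass {{ContentType::PROTOBUF}} explicitly, after which the overload can be deleted.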




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4277) Provide constexpr Duration::min() and max()

2016-01-04 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4277:
---

 Summary: Provide constexpr Duration::min() and max()
 Key: MESOS-4277
 URL: https://issues.apache.org/jira/browse/MESOS-4277
 Project: Mesos
  Issue Type: Improvement
  Components: stout
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier
Priority: Minor


{{Duration}} could be implemented so that it can provide {{constexpr}} {{min}} 
and {{max}} functions.

This addresses an existing {{TODO}}.
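
A rough sketch of what {{constexpr}} bounds could look like, assuming a nanosecond count stored in an {{int64_t}}. {{SimpleDuration}} is a hypothetical simplification, not the stout class.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical simplified Duration; the real stout class has more members.
class SimpleDuration
{
public:
  explicit constexpr SimpleDuration(int64_t value) : nanos(value) {}

  static constexpr SimpleDuration min()
  {
    return SimpleDuration(std::numeric_limits<int64_t>::min());
  }

  static constexpr SimpleDuration max()
  {
    return SimpleDuration(std::numeric_limits<int64_t>::max());
  }

  constexpr int64_t ns() const { return nanos; }

private:
  int64_t nanos;
};

// The bounds are usable in constant expressions, e.g. compile-time checks.
static_assert(SimpleDuration::min().ns() < SimpleDuration::max().ns(),
              "min() must be smaller than max()");
```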



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4278) Constrain types used to instantiate Flags objects

2016-01-04 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4278:
---

 Summary: Constrain types used to instantiate Flags objects
 Key: MESOS-4278
 URL: https://issues.apache.org/jira/browse/MESOS-4278
 Project: Mesos
  Issue Type: Improvement
  Components: stout, technical debt
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier


stout's {{Flags}} can be instantiated with a number of base flags provided by 
the caller as template arguments; these are then inherited from by the created 
{{Flags}} instance.

To ensure the expected semantics we could constrain the template arguments to 
ones derived from {{FlagsBase}}.

This addresses an existing {{TODO}}.
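
Such a constraint could be expressed with {{static_assert}} and {{std::is_base_of}}. The sketch below is an assumption about the shape of the fix, limited to two template arguments; {{LoggingFlags}}, {{MasterFlags}} and their members are hypothetical, and the {{FlagsBase}} and {{Flags}} shown here are simplified stand-ins for the stout types.

```cpp
#include <cassert>
#include <type_traits>

// Simplified stand-ins for stout's FlagsBase and Flags.
struct FlagsBase
{
  virtual ~FlagsBase() {}
};

template <typename Flags1, typename Flags2>
class Flags : public virtual Flags1, public virtual Flags2
{
  static_assert(std::is_base_of<FlagsBase, Flags1>::value,
                "Flags1 must derive from FlagsBase");
  static_assert(std::is_base_of<FlagsBase, Flags2>::value,
                "Flags2 must derive from FlagsBase");
};

// Hypothetical flag groups used only for this example.
struct LoggingFlags : public virtual FlagsBase { bool quiet = false; };
struct MasterFlags : public virtual FlagsBase { int port = 5050; };

// Both arguments derive from FlagsBase, so instantiation succeeds.
typedef Flags<LoggingFlags, MasterFlags> AllFlags;
```

Instantiating {{Flags}} with a class that does not derive from {{FlagsBase}} would then stop compilation with the message given in the assertion.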



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4272) DurationTest.Arithmetic performs inexact float calculation in test

2016-01-04 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4272:
---

 Summary: DurationTest.Arithmetic performs inexact float 
calculation in test
 Key: MESOS-4272
 URL: https://issues.apache.org/jira/browse/MESOS-4272
 Project: Mesos
  Issue Type: Bug
  Components: test
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier
Priority: Minor


{{DurationTest.Arithmetic}} does a calculation with not exactly representable 
floating point values and also performs an equality check,
{code}
EXPECT_EQ(Duration::create(3.3).get(), Seconds(10) * 0.33);
{code}
Here neither the value {{3.3}} nor {{0.33}} can be represented exactly as a 
floating point number so the check might fail incorrectly (as it does e.g. when 
compiling and executing the test under 32-bit on Debian8).

Instead we should just use exactly representable values to make sure the test 
will succeed as long as the implementation behaves as expected.
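
As an illustration of "exactly representable values", the sketch below uses {{0.25}} and {{2.5}}; these particular numbers and the helper names are assumptions for the example, not the values used in the eventual patch.

```cpp
#include <cassert>

// Hypothetical helper mirroring the shape of the check: scale a number of
// seconds by a factor using double arithmetic.
inline double scaleSeconds(double seconds, double factor)
{
  return seconds * factor;
}

// 0.25 and 2.5 are sums of powers of two (1/4 and 2 + 1/2), so both are
// exactly representable in binary floating point and the product is exact.
inline bool exactCheck()
{
  return scaleSeconds(10, 0.25) == 2.5;
}
```

Because no rounding is involved, the equality holds regardless of the precision used for intermediate results.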



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4271) Consider replacing libtool with dolt to speed up build

2016-01-04 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4271:
---

 Summary: Consider replacing libtool with dolt to speed up build
 Key: MESOS-4271
 URL: https://issues.apache.org/jira/browse/MESOS-4271
 Project: Mesos
  Issue Type: Improvement
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier
Priority: Minor


Mesos uses a pretty standard autotools setup for the build so that {{libtool}} 
is used extensively to abstract away the aspects of library creation (both 
compiling source files, and creating the libraries). For some versions of 
{{libtool}} its invocation can add considerably to the overall build time.

Dolt provides a much more condensed implementation of {{libtool}}'s 
functionality for modern platforms (<100 locs vs ~10 klocs), so that it can run 
much faster. We should investigate whether activating dolt makes sense.

I tested dolt under OS X 10.10.5. I first primed ccache and then rebuilt 
mesos-related objects,
{code}
./configure --disable-python --disable-java  # benchmark mostly C & C++ file compile and link
make check GTEST_FILTER=''                   # prime ccache
make mostlyclean                             # remove most mesos objects and libs
make -jN check GTEST_FILTER=''               # rebuild
{code}

|| || user [s] || real [s] || sys [s] ||
| make -j10 (dolt)|  42.8±0.1 |  54.3±0.2 |  34.1±0.2 |
| make -j10 (libtool) |  65.6±0.3 | 148.7±1.1 | 108.5±1.0 |
| make -j1 (dolt) |  76.9±0.3 |  45.5±0.1 |  27.1±0.1 |
| make -j1 (libtool)  | 168.2±2.3 |  97.5±1.5 |  75.8±1.3 |




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4273) Replace variadic List constructor with one taking an initializer_list

2016-01-04 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4273:
---

 Summary: Replace variadic List constructor with one taking an 
initializer_list
 Key: MESOS-4273
 URL: https://issues.apache.org/jira/browse/MESOS-4273
 Project: Mesos
  Issue Type: Improvement
  Components: stout
Reporter: Benjamin Bannier
Assignee: Benjamin Bannier
Priority: Minor


{{List}} provides a variadic constructor currently implemented with some 
preprocessor magic. Given that we already require C++11 we can replace that one 
with a much simpler one just taking a {{std::initializer_list}}. This would 
change the invocations,
{code}
auto l1 = List(1, 2, 3);// now
auto l2 = List({1, 2, 3});  // proposed
{code}

This addresses an existing {{TODO}}.
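
A constructor of the proposed kind might look roughly like this. {{IntList}} is a hypothetical, non-template stand-in backed by a {{std::vector}}; stout's real {{List}} is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical minimal list type used only to show the constructor shape.
class IntList
{
public:
  // Replaces a preprocessor-generated family of variadic constructors.
  IntList(std::initializer_list<int> values) : items(values) {}

  std::size_t size() const { return items.size(); }
  int at(std::size_t i) const { return items.at(i); }

private:
  std::vector<int> items;
};
```

With this constructor both {{IntList({1, 2, 3})}} and the braced form {{IntList{1, 2, 3}}} are accepted, for any number of elements.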




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-4275) Duration uses fixed-width types inconsistently

2016-01-05 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4275:

Shepherd: Michael Park

> Duration uses fixed-width types inconsistently
> --
>
> Key: MESOS-4275
> URL: https://issues.apache.org/jira/browse/MESOS-4275
> Project: Mesos
>  Issue Type: Bug
>  Components: stout
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>
> The implementation of the {{Duration}} class correctly uses fixed-width types 
> (here {{int64_t}}) for portability internally, but uses {{long}} types in a 
> few places (in particular {{LLONG_MIN}} and {{LLONG_MAX}}). This is 
> inconsistent on 64-bit platforms, and probably incorrect on 32-bit platforms, 
> as there {{long}} is 32 bits wide.
> Additionally, the longer {{Duration}} types ({{Minutes}}, {{Hours}}, 
> {{Days}}, and {{Weeks}}) construct from {{int32_t}}, while shorter ones take 
> {{int64_t}}. Probably as a left-over this is matched with a redundant 
> {{Duration}} constructor taking an {{int32_t}} value where the other one 
> taking an {{int64_t}} value would be sufficient. It should be safe to just 
> construct from {{int64_t}} in all places.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3795) process::io::write takes parameter as void* which could be const

2015-11-25 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3795:

Attachment: (was: ubuntu14_clang-3.6_FAILED.log)

> process::io::write takes parameter as void* which could be const
> 
>
> Key: MESOS-3795
> URL: https://issues.apache.org/jira/browse/MESOS-3795
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Benjamin Bannier
>  Labels: mesosphere, tech-debt
>
> In libprocess we have
> {code}
> Future write(int fd, void* data, size_t size);
> {code}
> which expects a non-{{const}} {{void*}} for its {{data}} parameter. Under the 
> covers {{data}} appears to be handled as {{const}} (as one would expect 
> from the signature of its inspiration, {{::write}}).
> This function is not used too often, but since it expects a non-{{const}} 
> value for {{data}}, implicit conversions to {{void*}} from pointers to 
> {{const}} data are rejected; instead callers seem to cast manually to 
> {{void*}}, often with C-style casts.
> We should sync this method's signature with that of {{::write}}.
> In addition to following the expected semantics of {{::write}}, having this 
> work without casts with any pointer value {{data}} would make it easier to 
> interface this with character literals, or raw data ptrs from STL containers 
> (e.g. {{Container::data}}). It would probably also indirectly eliminate 
> temptation to use C-casts.
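
The effect of the proposed signature can be sketched as follows. This is a minimal illustration under stated assumptions: {{sketchWrite}} and {{sketchWriteExamples}} are hypothetical names, and the wrapper is synchronous, whereas the real {{process::io::write}} is asynchronous and returns a {{Future}}. Only the constness of {{data}} is shown.

```cpp
#include <unistd.h> // ::write and ssize_t (POSIX).

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical synchronous stand-in for the proposed signature. Only the
// `const void*` parameter mirrors the proposal; error handling is omitted.
ssize_t sketchWrite(int fd, const void* data, size_t size)
{
  assert(data != nullptr || size == 0);
  return ::write(fd, data, size);
}

// Writes a character array, a vector's buffer and a string's buffer to `fd`
// without any cast. With a `void* data` parameter none of these calls would
// compile, since a pointer to const does not convert implicitly to `void*`.
bool sketchWriteExamples(int fd)
{
  const char literal[] = "hello";
  const std::vector<char> bytes = {'a', 'b', 'c'};
  const std::string text = "xyz";

  return sketchWrite(fd, literal, sizeof(literal) - 1) == 5 &&
         sketchWrite(fd, bytes.data(), bytes.size()) == 3 &&
         sketchWrite(fd, text.data(), text.size()) == 3;
}
```

Callers that hold mutable buffers are unaffected, because a pointer to non-{{const}} data still converts implicitly to {{const void*}}.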





[jira] [Updated] (MESOS-3579) FetcherCacheTest.LocalUncachedExtract is flaky

2015-11-27 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3579:

Attachment: mesos-fetcher-test-archive.tgz

This test failure is due to a bug in GNU tar 1.28 and earlier, which was fixed 
in GNU tar commit {{1847ec67cec36a17354115374954fea211d1f0da}}. BSD tar does 
not seem to be affected.

The problem seems to have been that the algorithm used to recognize compressed 
tar archives had false negatives (i.e., a compressed file was not classified 
as compressed). Since gzip embeds archive creation timestamps in the header, 
this only failed sporadically.

A possible workaround could be to switch to BSD tar, or for admins to test 
that archives can be decompressed before starting executors (a suggested 
command, mirroring what we use: {{tar -C /some/destination/directory -xf 
/path/to/archive.tgz}}).



A tar user can always work around this by explicitly specifying the 
decompression algorithm, which disables the automatic classification. The 
matter is less simple for us since we explicitly rely on tar to pick the 
decompression algorithm automatically. With this approach there is, e.g., no 
correlation between the filename extension and the compression algorithm used, 
so a fix on our side that wanted to be more explicit would need to, e.g., 
examine the asset header and go from there.
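
Examining the header could look roughly like the sketch below. It is an assumption-laden illustration, not what the fetcher does: {{SketchCompression}}, {{sketchDetect}} and {{sketchTarFlag}} are hypothetical names. The sketch classifies a buffer by the well-known magic bytes of gzip, bzip2 and xz and maps the result to the GNU tar option that selects that algorithm explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical classification result; not a Mesos type.
enum class SketchCompression { UNKNOWN, GZIP, BZIP2, XZ };

// Classifies the start of a file by its magic bytes instead of trusting the
// filename extension.
SketchCompression sketchDetect(const unsigned char* data, size_t size)
{
  assert(data != nullptr || size == 0);

  static const unsigned char GZIP_MAGIC[] = {0x1f, 0x8b};
  static const unsigned char BZIP2_MAGIC[] = {'B', 'Z', 'h'};
  static const unsigned char XZ_MAGIC[] = {0xfd, '7', 'z', 'X', 'Z', 0x00};

  if (size >= sizeof(GZIP_MAGIC) &&
      std::memcmp(data, GZIP_MAGIC, sizeof(GZIP_MAGIC)) == 0) {
    return SketchCompression::GZIP;
  }

  if (size >= sizeof(BZIP2_MAGIC) &&
      std::memcmp(data, BZIP2_MAGIC, sizeof(BZIP2_MAGIC)) == 0) {
    return SketchCompression::BZIP2;
  }

  if (size >= sizeof(XZ_MAGIC) &&
      std::memcmp(data, XZ_MAGIC, sizeof(XZ_MAGIC)) == 0) {
    return SketchCompression::XZ;
  }

  return SketchCompression::UNKNOWN;
}

// Maps the detected format to the GNU tar option that selects it explicitly.
// An empty string means "pass no option and let tar decide".
const char* sketchTarFlag(SketchCompression compression)
{
  switch (compression) {
    case SketchCompression::GZIP:  return "-z";
    case SketchCompression::BZIP2: return "-j";
    case SketchCompression::XZ:    return "-J";
    case SketchCompression::UNKNOWN: break;
  }
  return "";
}
```

A caller would read the first few bytes of the downloaded asset, pass them to {{sketchDetect}}, and add the resulting option to the {{tar}} invocation, so that extraction no longer depends on tar's own classification.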



I attached a sample archive that an unfixed tar fails to recognize; a faulty 
version can be confirmed with
{code}
% tar -C . -xf mesos-fetcher-test-archive.tgz
{code}

> FetcherCacheTest.LocalUncachedExtract is flaky
> --
>
> Key: MESOS-3579
> URL: https://issues.apache.org/jira/browse/MESOS-3579
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Anand Mazumdar
>Assignee: Benjamin Bannier
>  Labels: flaky-test, mesosphere
> Attachments: mesos-fetcher-test-archive.tgz, 
> ubuntu14_clang-3.6_FAILED.log
>
>
> From ASF CI:
> https://builds.apache.org/job/Mesos/866/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/console
> {code}
> [ RUN  ] FetcherCacheTest.LocalUncachedExtract
> Using temporary directory '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA'
> I0925 19:15:39.541198 27410 leveldb.cpp:176] Opened db in 3.43934ms
> I0925 19:15:39.542362 27410 leveldb.cpp:183] Compacted db in 1.136184ms
> I0925 19:15:39.542428 27410 leveldb.cpp:198] Created db iterator in 35866ns
> I0925 19:15:39.542448 27410 leveldb.cpp:204] Seeked to beginning of db in 
> 8807ns
> I0925 19:15:39.542459 27410 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 6325ns
> I0925 19:15:39.542505 27410 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0925 19:15:39.543143 27438 recover.cpp:449] Starting replica recovery
> I0925 19:15:39.543393 27438 recover.cpp:475] Replica is in EMPTY status
> I0925 19:15:39.544373 27436 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0925 19:15:39.544791 27433 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0925 19:15:39.545284 27433 recover.cpp:566] Updating replica status to 
> STARTING
> I0925 19:15:39.546155 27436 master.cpp:376] Master 
> c8bf1c95-50f4-4832-a570-c560f0b466ae (f57fd4291168) started on 
> 172.17.1.195:41781
> I0925 19:15:39.546257 27433 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 747249ns
> I0925 19:15:39.546288 27433 replica.cpp:323] Persisted replica status to 
> STARTING
> I0925 19:15:39.546483 27434 recover.cpp:475] Replica is in STARTING status
> I0925 19:15:39.546187 27436 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/master" 
> --zk_session_timeout="10secs"
> I0925 19:15:39.546567 27436 master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0925 19:15:39.546617 27436 master.cpp:428] Master only allowing 
> 

[jira] [Updated] (MESOS-3579) FetcherCacheTest.LocalUncachedExtract is flaky

2015-11-27 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3579:

Shepherd: Till Toenshoff

> FetcherCacheTest.LocalUncachedExtract is flaky
> --
>
> Key: MESOS-3579
> URL: https://issues.apache.org/jira/browse/MESOS-3579
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Anand Mazumdar
>Assignee: Benjamin Bannier
>  Labels: flaky-test, mesosphere
> Attachments: mesos-fetcher-test-archive.tgz, 
> ubuntu14_clang-3.6_FAILED.log
>
>
> From ASF CI:
> https://builds.apache.org/job/Mesos/866/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/console
> {code}
> [ RUN  ] FetcherCacheTest.LocalUncachedExtract
> Using temporary directory '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA'
> I0925 19:15:39.541198 27410 leveldb.cpp:176] Opened db in 3.43934ms
> I0925 19:15:39.542362 27410 leveldb.cpp:183] Compacted db in 1.136184ms
> I0925 19:15:39.542428 27410 leveldb.cpp:198] Created db iterator in 35866ns
> I0925 19:15:39.542448 27410 leveldb.cpp:204] Seeked to beginning of db in 
> 8807ns
> I0925 19:15:39.542459 27410 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 6325ns
> I0925 19:15:39.542505 27410 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0925 19:15:39.543143 27438 recover.cpp:449] Starting replica recovery
> I0925 19:15:39.543393 27438 recover.cpp:475] Replica is in EMPTY status
> I0925 19:15:39.544373 27436 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0925 19:15:39.544791 27433 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0925 19:15:39.545284 27433 recover.cpp:566] Updating replica status to 
> STARTING
> I0925 19:15:39.546155 27436 master.cpp:376] Master 
> c8bf1c95-50f4-4832-a570-c560f0b466ae (f57fd4291168) started on 
> 172.17.1.195:41781
> I0925 19:15:39.546257 27433 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 747249ns
> I0925 19:15:39.546288 27433 replica.cpp:323] Persisted replica status to 
> STARTING
> I0925 19:15:39.546483 27434 recover.cpp:475] Replica is in STARTING status
> I0925 19:15:39.546187 27436 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/master" 
> --zk_session_timeout="10secs"
> I0925 19:15:39.546567 27436 master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0925 19:15:39.546617 27436 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0925 19:15:39.546632 27436 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials'
> I0925 19:15:39.546931 27436 master.cpp:467] Using default 'crammd5' 
> authenticator
> I0925 19:15:39.547044 27436 master.cpp:504] Authorization enabled
> I0925 19:15:39.547276 27441 whitelist_watcher.cpp:79] No whitelist given
> I0925 19:15:39.547320 27434 hierarchical.hpp:468] Initialized hierarchical 
> allocator process
> I0925 19:15:39.547471 27438 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0925 19:15:39.548318 27443 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0925 19:15:39.549067 27435 recover.cpp:566] Updating replica status to VOTING
> I0925 19:15:39.549115 27440 master.cpp:1603] The newly elected leader is 
> master@172.17.1.195:41781 with id c8bf1c95-50f4-4832-a570-c560f0b466ae
> I0925 19:15:39.549162 27440 master.cpp:1616] Elected as the leading master!
> I0925 19:15:39.549190 27440 master.cpp:1376] Recovering from registrar
> I0925 19:15:39.549342 27434 registrar.cpp:309] Recovering registrar
> I0925 19:15:39.549666 27430 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 418187ns
> I0925 19:15:39.549753 27430 replica.cpp:323] Persisted replica 

[jira] [Commented] (MESOS-4012) Update documentation to reflect the addition of installable tests.

2015-11-25 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15026880#comment-15026880
 ] 

Benjamin Bannier commented on MESOS-4012:
-

In addition to adding information on how a user can check conformance of a 
machine, this would also give us the opportunity to cleanly separate what is 
_needed to build Mesos_ from what is _needed to run it_.

> Update documentation to reflect the addition of installable tests.  
> 
>
> Key: MESOS-4012
> URL: https://issues.apache.org/jira/browse/MESOS-4012
> Project: Mesos
>  Issue Type: Documentation
>Reporter: Till Toenshoff
>
> We may want to add the steps administrators need to create and run the 
> test suite on anything other than the build machine. 
> One possible location could be {{docs/getting-started.md}}, for validating 
> the prerequisites as described in that document. 





[jira] [Updated] (MESOS-3795) process::io::write takes parameter as void* which could be const

2015-11-30 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3795:

Shepherd: Bernd Mathiske
  Sprint: Mesosphere Sprint 23

> process::io::write takes parameter as void* which could be const
> 
>
> Key: MESOS-3795
> URL: https://issues.apache.org/jira/browse/MESOS-3795
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere, tech-debt
>
> In libprocess we have
> {code}
> Future write(int fd, void* data, size_t size);
> {code}
> which expects a non-{{const}} {{void*}} for its {{data}} parameter. Under the 
> covers, {{data}} appears to be handled as {{const}} (as one would expect 
> from the signature of its inspiration, {{::write}}).
> This function is not used too often, but since it expects a non-{{const}} 
> value for {{data}}, automatic conversions to {{void*}} from pointers to 
> {{const}} data are not possible; instead, callers seem to cast manually to 
> {{void*}}, often with C-style casts.
> We should sync this method's signature with that of {{::write}}.
> In addition to following the expected semantics of {{::write}}, having this 
> work without casts with any pointer value {{data}} would make it easier to 
> interface this with character literals, or raw data pointers from STL 
> containers (e.g. {{Container::data}}). It would probably also indirectly 
> eliminate the temptation to use C-style casts.





[jira] [Commented] (MESOS-2857) FetcherCacheTest.LocalCachedExtract is flaky.

2015-12-01 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033398#comment-15033398
 ] 

Benjamin Bannier commented on MESOS-2857:
-

Comparing with the original log in this report, this appears to be a different 
issue.

From the log it appears as if everything happened as expected, only that the 
test ran into our default timeout when waiting for a status update; without 
verbose libprocess logs I am tempted to attribute this issue to very high 
system load.

> FetcherCacheTest.LocalCachedExtract is flaky.
> -
>
> Key: MESOS-2857
> URL: https://issues.apache.org/jira/browse/MESOS-2857
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>  Labels: flaky-test, mesosphere
>
> From jenkins:
> {noformat}
> [ RUN  ] FetcherCacheTest.LocalCachedExtract
> Using temporary directory '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj'
> I0610 20:04:48.591573 24561 leveldb.cpp:176] Opened db in 3.512525ms
> I0610 20:04:48.592456 24561 leveldb.cpp:183] Compacted db in 828630ns
> I0610 20:04:48.592512 24561 leveldb.cpp:198] Created db iterator in 32992ns
> I0610 20:04:48.592531 24561 leveldb.cpp:204] Seeked to beginning of db in 
> 8967ns
> I0610 20:04:48.592545 24561 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 7762ns
> I0610 20:04:48.592604 24561 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0610 20:04:48.593438 24587 recover.cpp:449] Starting replica recovery
> I0610 20:04:48.593698 24587 recover.cpp:475] Replica is in EMPTY status
> I0610 20:04:48.595641 24580 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0610 20:04:48.596086 24590 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0610 20:04:48.596607 24590 recover.cpp:566] Updating replica status to 
> STARTING
> I0610 20:04:48.597507 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 717888ns
> I0610 20:04:48.597535 24590 replica.cpp:323] Persisted replica status to 
> STARTING
> I0610 20:04:48.597697 24590 recover.cpp:475] Replica is in STARTING status
> I0610 20:04:48.599165 24584 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0610 20:04:48.599434 24584 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0610 20:04:48.599915 24590 recover.cpp:566] Updating replica status to VOTING
> I0610 20:04:48.600545 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 432335ns
> I0610 20:04:48.600574 24590 replica.cpp:323] Persisted replica status to 
> VOTING
> I0610 20:04:48.600659 24590 recover.cpp:580] Successfully joined the Paxos 
> group
> I0610 20:04:48.600797 24590 recover.cpp:464] Recover process terminated
> I0610 20:04:48.602905 24594 master.cpp:363] Master 
> 20150610-200448-3875541420-32907-24561 (dbade881e927) started on 
> 172.17.0.231:32907
> I0610 20:04:48.602957 24594 master.cpp:365] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --credentials="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/master" 
> --zk_session_timeout="10secs"
> I0610 20:04:48.603374 24594 master.cpp:410] Master only allowing 
> authenticated frameworks to register
> I0610 20:04:48.603392 24594 master.cpp:415] Master only allowing 
> authenticated slaves to register
> I0610 20:04:48.603404 24594 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials'
> I0610 20:04:48.603751 24594 master.cpp:454] Using default 'crammd5' 
> authenticator
> I0610 20:04:48.604928 24594 master.cpp:491] Authorization enabled
> I0610 20:04:48.606034 24593 hierarchical.hpp:309] Initialized hierarchical 
> allocator process
> I0610 20:04:48.606106 24593 whitelist_watcher.cpp:79] No whitelist given
> I0610 20:04:48.607430 24594 master.cpp:1476] The newly elected leader is 
> master@172.17.0.231:32907 with id 20150610-200448-3875541420-32907-24561
> I0610 20:04:48.607466 24594 

[jira] [Assigned] (MESOS-4030) DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky

2015-12-01 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-4030:
---

Assignee: Benjamin Bannier

> DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky
> ---
>
> Key: MESOS-4030
> URL: https://issues.apache.org/jira/browse/MESOS-4030
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
> Environment: [Ubuntu 
> 14|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], 
> 0.26.0 RC (wip) enable-ssl & enable-libevent, root test-run 
>Reporter: Till Toenshoff
>Assignee: Benjamin Bannier
>  Labels: flaky, flaky-test
>
> {noformat}
> [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping
> I1201 02:18:00.325283 18931 leveldb.cpp:176] Opened db in 3.877576ms
> I1201 02:18:00.326195 18931 leveldb.cpp:183] Compacted db in 831923ns
> I1201 02:18:00.326288 18931 leveldb.cpp:198] Created db iterator in 21460ns
> I1201 02:18:00.326305 18931 leveldb.cpp:204] Seeked to beginning of db in 
> 1431ns
> I1201 02:18:00.326316 18931 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 178ns
> I1201 02:18:00.326354 18931 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1201 02:18:00.327128 18952 recover.cpp:449] Starting replica recovery
> I1201 02:18:00.327481 18948 recover.cpp:475] Replica is in EMPTY status
> I1201 02:18:00.328354 18945 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (88123)@127.0.1.1:45788
> I1201 02:18:00.328660 18950 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1201 02:18:00.329139 18951 recover.cpp:566] Updating replica status to 
> STARTING
> I1201 02:18:00.330413 18949 master.cpp:367] Master 
> 9577131b-f0b1-47bd-8f88-f5edbf2f026d (ubuntu14) started on 127.0.1.1:45788
> I1201 02:18:00.330474 18949 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/dHFLJX/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/dHFLJX/master" 
> --zk_session_timeout="10secs"
> I1201 02:18:00.330662 18949 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1201 02:18:00.330670 18949 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1201 02:18:00.330682 18949 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/dHFLJX/credentials'
> I1201 02:18:00.330950 18945 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.585892ms
> I1201 02:18:00.331248 18945 replica.cpp:323] Persisted replica status to 
> STARTING
> I1201 02:18:00.330968 18949 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1201 02:18:00.331681 18949 master.cpp:495] Authorization enabled
> I1201 02:18:00.331717 18945 recover.cpp:475] Replica is in STARTING status
> I1201 02:18:00.332875 18947 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (88124)@127.0.1.1:45788
> I1201 02:18:00.44 18947 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1201 02:18:00.333760 18950 recover.cpp:566] Updating replica status to VOTING
> I1201 02:18:00.333875 18945 master.cpp:1606] The newly elected leader is 
> master@127.0.1.1:45788 with id 9577131b-f0b1-47bd-8f88-f5edbf2f026d
> I1201 02:18:00.334624 18951 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 307292ns
> I1201 02:18:00.334652 18951 replica.cpp:323] Persisted replica status to 
> VOTING
> I1201 02:18:00.334656 18945 master.cpp:1619] Elected as the leading master!
> I1201 02:18:00.334758 18951 recover.cpp:580] Successfully joined the Paxos 
> group
> I1201 02:18:00.334933 18945 master.cpp:1379] Recovering from registrar
> I1201 02:18:00.335108 18951 recover.cpp:464] Recover process terminated
> I1201 02:18:00.335183 18951 registrar.cpp:309] Recovering registrar
> I1201 02:18:00.335577 18950 log.cpp:661] Attempting to start the writer
> I1201 02:18:00.336777 18952 replica.cpp:496] Replica received implicit 
> promise 

[jira] [Commented] (MESOS-2512) FetcherTest.ExtractNotExecutable is flaky

2015-11-30 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15031536#comment-15031536
 ] 

Benjamin Bannier commented on MESOS-2512:
-

This appears to have the same cause as MESOS-3579: {{mesos-fetcher}} can 
sometimes fail to unpack a compressed tar archive because {{tar}} fails to 
recognize it as actually compressed. We implemented a workaround for this test 
there as well.
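
As a rough illustration of what "recognizing an archive as compressed" involves, the sketch below checks a buffer for the two-byte gzip magic number (0x1f 0x8b). The helper name {{looksGzipped}} is hypothetical and is not part of Mesos or of the workaround mentioned above; it only shows the kind of check a tool has to get right before it can pick a decompressor.

```cpp
#include <cstddef>

// Hypothetical helper (not part of Mesos): returns true if the buffer
// starts with the gzip magic number 0x1f 0x8b.
bool looksGzipped(const unsigned char* data, std::size_t size)
{
  // A buffer shorter than two bytes cannot carry the magic number.
  return size >= 2 && data[0] == 0x1f && data[1] == 0x8b;
}
```

A caller could read the first bytes of the fetched file and pass them to this function as a quick diagnostic before invoking {{tar}}.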

> FetcherTest.ExtractNotExecutable is flaky
> -
>
> Key: MESOS-2512
> URL: https://issues.apache.org/jira/browse/MESOS-2512
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.23.0
>Reporter: Vinod Kone
>Assignee: Bernd Mathiske
>  Labels: mesosphere
>
> Observed in our internal CI.
> {code}
> [ RUN  ] FetcherTest.ExtractNotExecutable
> Using temporary directory '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn'
> tar: Removing leading `/' from member names
> I0316 18:55:48.509306 14678 fetcher.cpp:155] Starting to fetch URIs for 
> container: de1e5165-82b4-434b-9149-8667cf652c64, directory: 
> /tmp/FetcherTest_ExtractNotExecutable_R5R7Cn
> I0316 18:55:48.509845 14678 fetcher.cpp:238] Fetching URIs using command 
> '/var/jenkins/workspace/mesos-fedora-20-gcc/src/mesos-fetcher'
> I0316 18:55:48.568611 15028 logging.cpp:177] Logging to STDERR
> I0316 18:55:48.574928 15028 fetcher.cpp:214] Fetching URI '/tmp/DIjmjV.tar.gz'
> I0316 18:55:48.575166 15028 fetcher.cpp:194] Copying resource from 
> '/tmp/DIjmjV.tar.gz' to '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn'
> tar: This does not look like a tar archive
> tar: Exiting with failure status due to previous errors
> Failed to extract 
> /tmp/FetcherTest_ExtractNotExecutable_R5R7Cn/DIjmjV.tar.gz:Failed to extract: 
> command tar -C '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn' -xf 
> '/tmp/FetcherTest_ExtractNotExecutable_R5R7Cn/DIjmjV.tar.gz' exited with 
> status: 512
> tests/fetcher_tests.cpp:686: Failure
> (fetch).failure(): Failed to fetch URIs for container 
> 'de1e5165-82b4-434b-9149-8667cf652c64'with exit status: 256
> [  FAILED  ] FetcherTest.ExtractNotExecutable (208 ms)
> {code}





[jira] [Updated] (MESOS-2857) FetcherCacheTest.LocalCachedExtract is flaky.

2015-11-30 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-2857:

  Sprint: Mesosphere Sprint 23
Story Points: 1  (was: 0)

> FetcherCacheTest.LocalCachedExtract is flaky.
> -
>
> Key: MESOS-2857
> URL: https://issues.apache.org/jira/browse/MESOS-2857
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>  Labels: flaky-test, mesosphere
>
> From jenkins:
> {noformat}
> [ RUN  ] FetcherCacheTest.LocalCachedExtract
> Using temporary directory '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj'
> I0610 20:04:48.591573 24561 leveldb.cpp:176] Opened db in 3.512525ms
> I0610 20:04:48.592456 24561 leveldb.cpp:183] Compacted db in 828630ns
> I0610 20:04:48.592512 24561 leveldb.cpp:198] Created db iterator in 32992ns
> I0610 20:04:48.592531 24561 leveldb.cpp:204] Seeked to beginning of db in 
> 8967ns
> I0610 20:04:48.592545 24561 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 7762ns
> I0610 20:04:48.592604 24561 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0610 20:04:48.593438 24587 recover.cpp:449] Starting replica recovery
> I0610 20:04:48.593698 24587 recover.cpp:475] Replica is in EMPTY status
> I0610 20:04:48.595641 24580 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0610 20:04:48.596086 24590 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0610 20:04:48.596607 24590 recover.cpp:566] Updating replica status to 
> STARTING
> I0610 20:04:48.597507 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 717888ns
> I0610 20:04:48.597535 24590 replica.cpp:323] Persisted replica status to 
> STARTING
> I0610 20:04:48.597697 24590 recover.cpp:475] Replica is in STARTING status
> I0610 20:04:48.599165 24584 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0610 20:04:48.599434 24584 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0610 20:04:48.599915 24590 recover.cpp:566] Updating replica status to VOTING
> I0610 20:04:48.600545 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 432335ns
> I0610 20:04:48.600574 24590 replica.cpp:323] Persisted replica status to 
> VOTING
> I0610 20:04:48.600659 24590 recover.cpp:580] Successfully joined the Paxos 
> group
> I0610 20:04:48.600797 24590 recover.cpp:464] Recover process terminated
> I0610 20:04:48.602905 24594 master.cpp:363] Master 
> 20150610-200448-3875541420-32907-24561 (dbade881e927) started on 
> 172.17.0.231:32907
> I0610 20:04:48.602957 24594 master.cpp:365] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --credentials="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/master" 
> --zk_session_timeout="10secs"
> I0610 20:04:48.603374 24594 master.cpp:410] Master only allowing 
> authenticated frameworks to register
> I0610 20:04:48.603392 24594 master.cpp:415] Master only allowing 
> authenticated slaves to register
> I0610 20:04:48.603404 24594 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials'
> I0610 20:04:48.603751 24594 master.cpp:454] Using default 'crammd5' 
> authenticator
> I0610 20:04:48.604928 24594 master.cpp:491] Authorization enabled
> I0610 20:04:48.606034 24593 hierarchical.hpp:309] Initialized hierarchical 
> allocator process
> I0610 20:04:48.606106 24593 whitelist_watcher.cpp:79] No whitelist given
> I0610 20:04:48.607430 24594 master.cpp:1476] The newly elected leader is 
> master@172.17.0.231:32907 with id 20150610-200448-3875541420-32907-24561
> I0610 20:04:48.607466 24594 master.cpp:1489] Elected as the leading master!
> I0610 20:04:48.607481 24594 master.cpp:1259] Recovering from registrar
> I0610 20:04:48.607712 24594 registrar.cpp:313] Recovering registrar
> I0610 20:04:48.608543 24588 log.cpp:661] Attempting to start the writer
> I0610 20:04:48.610231 24588 replica.cpp:477] 

[jira] [Updated] (MESOS-3579) FetcherCacheTest.LocalUncachedExtract is flaky

2015-11-30 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3579:

Story Points: 2

> FetcherCacheTest.LocalUncachedExtract is flaky
> --
>
> Key: MESOS-3579
> URL: https://issues.apache.org/jira/browse/MESOS-3579
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Anand Mazumdar
>Assignee: Benjamin Bannier
>  Labels: flaky-test, mesosphere
> Attachments: mesos-fetcher-test-archive.tgz, 
> ubuntu14_clang-3.6_FAILED.log
>
>
> From ASF CI:
> https://builds.apache.org/job/Mesos/866/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/console
> {code}
> [ RUN  ] FetcherCacheTest.LocalUncachedExtract
> Using temporary directory '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA'
> I0925 19:15:39.541198 27410 leveldb.cpp:176] Opened db in 3.43934ms
> I0925 19:15:39.542362 27410 leveldb.cpp:183] Compacted db in 1.136184ms
> I0925 19:15:39.542428 27410 leveldb.cpp:198] Created db iterator in 35866ns
> I0925 19:15:39.542448 27410 leveldb.cpp:204] Seeked to beginning of db in 
> 8807ns
> I0925 19:15:39.542459 27410 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 6325ns
> I0925 19:15:39.542505 27410 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0925 19:15:39.543143 27438 recover.cpp:449] Starting replica recovery
> I0925 19:15:39.543393 27438 recover.cpp:475] Replica is in EMPTY status
> I0925 19:15:39.544373 27436 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0925 19:15:39.544791 27433 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0925 19:15:39.545284 27433 recover.cpp:566] Updating replica status to 
> STARTING
> I0925 19:15:39.546155 27436 master.cpp:376] Master 
> c8bf1c95-50f4-4832-a570-c560f0b466ae (f57fd4291168) started on 
> 172.17.1.195:41781
> I0925 19:15:39.546257 27433 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 747249ns
> I0925 19:15:39.546288 27433 replica.cpp:323] Persisted replica status to 
> STARTING
> I0925 19:15:39.546483 27434 recover.cpp:475] Replica is in STARTING status
> I0925 19:15:39.546187 27436 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/master" 
> --zk_session_timeout="10secs"
> I0925 19:15:39.546567 27436 master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0925 19:15:39.546617 27436 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0925 19:15:39.546632 27436 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials'
> I0925 19:15:39.546931 27436 master.cpp:467] Using default 'crammd5' 
> authenticator
> I0925 19:15:39.547044 27436 master.cpp:504] Authorization enabled
> I0925 19:15:39.547276 27441 whitelist_watcher.cpp:79] No whitelist given
> I0925 19:15:39.547320 27434 hierarchical.hpp:468] Initialized hierarchical 
> allocator process
> I0925 19:15:39.547471 27438 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0925 19:15:39.548318 27443 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0925 19:15:39.549067 27435 recover.cpp:566] Updating replica status to VOTING
> I0925 19:15:39.549115 27440 master.cpp:1603] The newly elected leader is 
> master@172.17.1.195:41781 with id c8bf1c95-50f4-4832-a570-c560f0b466ae
> I0925 19:15:39.549162 27440 master.cpp:1616] Elected as the leading master!
> I0925 19:15:39.549190 27440 master.cpp:1376] Recovering from registrar
> I0925 19:15:39.549342 27434 registrar.cpp:309] Recovering registrar
> I0925 19:15:39.549666 27430 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 418187ns
> I0925 19:15:39.549753 27430 replica.cpp:323] Persisted replica status to 

[jira] [Updated] (MESOS-3581) License headers show up all over doxygen documentation.

2015-12-01 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3581:

Shepherd: Michael Park  (was: Bernd Mathiske)

> License headers show up all over doxygen documentation.
> ---
>
> Key: MESOS-3581
> URL: https://issues.apache.org/jira/browse/MESOS-3581
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Affects Versions: 0.24.1
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>  Labels: mesosphere
>
> Currently license headers are commented in something resembling Javadoc style,
> {code}
> /**
> * Licensed ...
> {code}
> Since we use Javadoc-style comment blocks for doxygen documentation all 
> license headers appear in the generated documentation, potentially and likely 
> hiding the actual documentation.
> Using {{/*}} to start the comment blocks would be enough to hide them from 
> doxygen, but would likely also result in a largish (though mostly 
> uninteresting) patch.
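
A minimal sketch of the difference described above, assuming doxygen's default handling of comment blocks: a block opened with {{/*}} is an ordinary comment, while a block opened with {{/**}} is picked up as documentation. The function {{add}} is a hypothetical example and not taken from the Mesos sources.

```cpp
/*
 * Licensed ...
 *
 * Ordinary C-style comment: doxygen does not treat this block as
 * documentation, so it stays out of the generated pages.
 */

/**
 * Returns the sum of two integers.
 *
 * Javadoc-style block: doxygen attaches it to the declaration below.
 */
int add(int a, int b)
{
  return a + b;
}
```

With this layout only the second block would show up in the generated documentation for {{add}}.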





[jira] [Commented] (MESOS-4030) DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky

2015-12-01 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15033658#comment-15033658
 ] 

Benjamin Bannier commented on MESOS-4030:
-

This appears to be a race in the test code: we cannot parse the containerizer's 
stdout or continue with the cleanup before the containerizer has finished 
running. We could e.g. capture calls to {{Docker::_run}} to get 
notified once we are ready to proceed.
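
The general shape of such a fix can be sketched with the standard library alone: the consumer blocks on a completion signal before it touches the producer's output. This is not the Mesos test code and does not use libprocess; {{runAndCollect}} and the worker thread are hypothetical stand-ins for the containerizer run.

```cpp
#include <future>
#include <string>
#include <thread>

// Hypothetical stand-in for a containerizer run: a worker produces output
// and signals completion; the caller waits before reading the output.
std::string runAndCollect()
{
  std::string output;
  std::promise<void> finished;
  std::future<void> done = finished.get_future();

  std::thread worker([&output, &finished]() {
    output = "hello from the container";
    finished.set_value(); // Signal only after the output is complete.
  });

  // Without this wait, reading `output` would race with the worker.
  done.wait();
  worker.join();

  return output;
}
```

The important design point is that the notification is sent after the last write to the shared state, so everything the consumer reads is complete.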

> DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping is flaky
> ---
>
> Key: MESOS-4030
> URL: https://issues.apache.org/jira/browse/MESOS-4030
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Affects Versions: 0.26.0
> Environment: [Ubuntu 
> 14|https://github.com/tillt/mesos-vagrant-ci/blob/master/ubuntu14/setup.sh], 
> 0.26.0 RC (wip) enable-ssl & enable-libevent, root test-run 
>Reporter: Till Toenshoff
>Assignee: Benjamin Bannier
>  Labels: flaky, flaky-test
>
> {noformat}
> [ RUN  ] DockerContainerizerTest.ROOT_DOCKER_NC_PortMapping
> I1201 02:18:00.325283 18931 leveldb.cpp:176] Opened db in 3.877576ms
> I1201 02:18:00.326195 18931 leveldb.cpp:183] Compacted db in 831923ns
> I1201 02:18:00.326288 18931 leveldb.cpp:198] Created db iterator in 21460ns
> I1201 02:18:00.326305 18931 leveldb.cpp:204] Seeked to beginning of db in 
> 1431ns
> I1201 02:18:00.326316 18931 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 178ns
> I1201 02:18:00.326354 18931 replica.cpp:780] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1201 02:18:00.327128 18952 recover.cpp:449] Starting replica recovery
> I1201 02:18:00.327481 18948 recover.cpp:475] Replica is in EMPTY status
> I1201 02:18:00.328354 18945 replica.cpp:676] Replica in EMPTY status received 
> a broadcasted recover request from (88123)@127.0.1.1:45788
> I1201 02:18:00.328660 18950 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1201 02:18:00.329139 18951 recover.cpp:566] Updating replica status to 
> STARTING
> I1201 02:18:00.330413 18949 master.cpp:367] Master 
> 9577131b-f0b1-47bd-8f88-f5edbf2f026d (ubuntu14) started on 127.0.1.1:45788
> I1201 02:18:00.330474 18949 master.cpp:369] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" --credentials="/tmp/dHFLJX/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/usr/local/share/mesos/webui" --work_dir="/tmp/dHFLJX/master" 
> --zk_session_timeout="10secs"
> I1201 02:18:00.330662 18949 master.cpp:414] Master only allowing 
> authenticated frameworks to register
> I1201 02:18:00.330670 18949 master.cpp:419] Master only allowing 
> authenticated slaves to register
> I1201 02:18:00.330682 18949 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/dHFLJX/credentials'
> I1201 02:18:00.330950 18945 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 1.585892ms
> I1201 02:18:00.331248 18945 replica.cpp:323] Persisted replica status to 
> STARTING
> I1201 02:18:00.330968 18949 master.cpp:458] Using default 'crammd5' 
> authenticator
> I1201 02:18:00.331681 18949 master.cpp:495] Authorization enabled
> I1201 02:18:00.331717 18945 recover.cpp:475] Replica is in STARTING status
> I1201 02:18:00.332875 18947 replica.cpp:676] Replica in STARTING status 
> received a broadcasted recover request from (88124)@127.0.1.1:45788
> I1201 02:18:00.44 18947 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1201 02:18:00.333760 18950 recover.cpp:566] Updating replica status to VOTING
> I1201 02:18:00.333875 18945 master.cpp:1606] The newly elected leader is 
> master@127.0.1.1:45788 with id 9577131b-f0b1-47bd-8f88-f5edbf2f026d
> I1201 02:18:00.334624 18951 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 307292ns
> I1201 02:18:00.334652 18951 replica.cpp:323] Persisted replica status to 
> VOTING
> I1201 02:18:00.334656 18945 master.cpp:1619] Elected as the leading master!
> I1201 02:18:00.334758 18951 recover.cpp:580] Successfully joined the Paxos 
> group
> I1201 02:18:00.334933 18945 master.cpp:1379] Recovering from registrar
> I1201 02:18:00.335108 18951 

[jira] [Updated] (MESOS-3273) EventCall Test Framework is flaky

2015-12-01 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3273:

Attachment: asan.log

Clang address sanitizer reports use-after-free errors for this test which 
appear to come from the libevent bindings; I attached a log. It might be a 
good idea to address that issue first.

> EventCall Test Framework is flaky
> -
>
> Key: MESOS-3273
> URL: https://issues.apache.org/jira/browse/MESOS-3273
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.0
> Environment: 
> https://builds.apache.org/job/Mesos/705/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/consoleFull
>Reporter: Vinod Kone
>  Labels: flaky-test, tech-debt, twitter
> Attachments: asan.log
>
>
> Observed this on ASF CI. h/t [~haosd...@gmail.com]
> Looks like the HTTP scheduler never sent a SUBSCRIBE request to the master.
> {code}
> [ RUN  ] ExamplesTest.EventCallFramework
> Using temporary directory '/tmp/ExamplesTest_EventCallFramework_k4vXkx'
> I0813 19:55:15.643579 26085 exec.cpp:443] Ignoring exited event because the 
> driver is aborted!
> Shutting down
> Sending SIGTERM to process tree at pid 26061
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26062
> Shutting down
> Killing the following process trees:
> [ 
> ]
> Sending SIGTERM to process tree at pid 26063
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26098
> Killing the following process trees:
> [ 
> ]
> Shutting down
> Sending SIGTERM to process tree at pid 26099
> Killing the following process trees:
> [ 
> ]
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0813 19:55:17.161726 26100 process.cpp:1012] libprocess is initialized on 
> 172.17.2.10:60249 for 16 cpus
> I0813 19:55:17.161888 26100 logging.cpp:177] Logging to STDERR
> I0813 19:55:17.163625 26100 scheduler.cpp:157] Version: 0.24.0
> I0813 19:55:17.175302 26100 leveldb.cpp:176] Opened db in 3.167446ms
> I0813 19:55:17.176393 26100 leveldb.cpp:183] Compacted db in 1.047996ms
> I0813 19:55:17.176496 26100 leveldb.cpp:198] Created db iterator in 77155ns
> I0813 19:55:17.176518 26100 leveldb.cpp:204] Seeked to beginning of db in 
> 8429ns
> I0813 19:55:17.176527 26100 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 4219ns
> I0813 19:55:17.176708 26100 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0813 19:55:17.178951 26136 recover.cpp:449] Starting replica recovery
> I0813 19:55:17.179934 26136 recover.cpp:475] Replica is in EMPTY status
> I0813 19:55:17.181970 26126 master.cpp:378] Master 
> 20150813-195517-167907756-60249-26100 (297daca2d01a) started on 
> 172.17.2.10:60249
> I0813 19:55:17.182317 26126 master.cpp:380] Flags at startup: 
> --acls="permissive: false
> register_frameworks {
>   principals {
> type: SOME
> values: "test-principal"
>   }
>   roles {
> type: SOME
> values: "*"
>   }
> }
> run_tasks {
>   principals {
> type: SOME
> values: "test-principal"
>   }
>   users {
> type: SOME
> values: "mesos"
>   }
> }
> " --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="false" 
> --authenticators="crammd5" 
> --credentials="/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --max_slave_ping_timeouts="5" --quiet="false" 
> --recovery_slave_removal_limit="100%" --registry="replicated_log" 
> --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" 
> --registry_strict="false" --root_submissions="true" 
> --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.24.0/src/webui" --work_dir="/tmp/mesos-II8Gua" 
> --zk_session_timeout="10secs"
> I0813 19:55:17.183475 26126 master.cpp:427] Master allowing unauthenticated 
> frameworks to register
> I0813 19:55:17.183536 26126 master.cpp:432] Master allowing unauthenticated 
> slaves to register
> I0813 19:55:17.183615 26126 credentials.hpp:37] Loading credentials for 
> authentication from '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials'
> W0813 19:55:17.183859 26126 credentials.hpp:52] Permissions on credentials 
> file '/tmp/ExamplesTest_EventCallFramework_k4vXkx/credentials' are too open. 
> It is recommended that your credentials file is NOT accessible by others.
> I0813 19:55:17.183969 26123 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0813 

[jira] [Created] (MESOS-4019) Consider enabling (non-fatal) compiler warnings about float equality comparisons

2015-11-26 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4019:
---

 Summary: Consider enabling (non-fatal) compiler warnings about 
float equality comparisons
 Key: MESOS-4019
 URL: https://issues.apache.org/jira/browse/MESOS-4019
 Project: Mesos
  Issue Type: Wish
Reporter: Benjamin Bannier


Comparing floating point numbers for equality does what is naively expected 
only in a very limited number of cases. More often than not one should probably 
instead either choose types allowing exact representation, or switch to a 
domain-specific equality measure.

We should consider enabling compiler warnings on such operations.
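
A small sketch of the problem, assuming IEEE 754 double precision: the sum of 0.1 and 0.2 is not exactly 0.3, so {{==}} yields false, while a comparison with a tolerance behaves as expected. The helper {{almostEqual}} and the tolerance used with it are illustrative assumptions, not a proposal for Mesos.

```cpp
#include <cmath>

// Adds 0.1 and 0.2 at run time; the result is close to, but not exactly,
// the double nearest to 0.3.
double sumOfTenthAndFifth()
{
  double a = 0.1;
  double b = 0.2;
  return a + b;
}

// Hypothetical helper: true if `a` and `b` differ by at most `tolerance`.
// A suitable tolerance depends on the domain and is an assumption here.
bool almostEqual(double a, double b, double tolerance)
{
  return std::fabs(a - b) <= tolerance;
}
```

Here {{sumOfTenthAndFifth() == 0.3}} is the kind of expression {{-Wfloat-equal}} flags, whereas {{almostEqual(sumOfTenthAndFifth(), 0.3, 1e-9)}} expresses the intended check explicitly.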





[jira] [Updated] (MESOS-4019) Consider enabling (non-fatal) compiler warnings about float equality comparisons

2015-11-26 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4019:

Attachment: build.log

Build log of {{5e25791}}, configured with
{code}
../configure --disable-java --disable-python --enable-silent-rules CXX=clang++ 
CC=clang CXXFLAGS='-Wfloat-equal -Wno-error=float-equal'
{code}

Clang is {{clang version 3.8.0 (http://llvm.org/git/clang 
18607432c62c72101d442b4e24508a250f1913b6) (http://llvm.org/git/llvm/ 
931217997cd3c9696284e73cace5b3a761147b36)}}.

The comparison sites inside Mesos code are (see the full log for the path):
{code}
/Users/bbannier/src/mesos/3rdparty/libprocess/3rdparty/stout/include/stout/gtest.hpp|80
 col 18| warning: comparing floating point with == or != is unsafe 
[-Wfloat-equal]
/Users/bbannier/src/mesos/3rdparty/libprocess/3rdparty/stout/include/stout/json.hpp|473
 col 31| warning: comparing floating point with == or != is unsafe 
[-Wfloat-equal]
/Users/bbannier/src/mesos/3rdparty/libprocess/3rdparty/stout/include/stout/json.hpp|475
 col 31| warning: comparing floating point with == or != is unsafe 
[-Wfloat-equal]
/Users/bbannier/src/mesos/3rdparty/libprocess/3rdparty/stout/include/stout/json.hpp|477
 col 31| warning: comparing floating point with == or != is unsafe 
[-Wfloat-equal]
/Users/bbannier/src/mesos/3rdparty/libprocess/3rdparty/stout/include/stout/json.hpp|483
 col 40| warning: comparing floating point with == or != is unsafe 
[-Wfloat-equal]
/Users/bbannier/src/mesos/3rdparty/libprocess/3rdparty/stout/include/stout/json.hpp|495
 col 42| warning: comparing floating point with == or != is unsafe 
[-Wfloat-equal]
/Users/bbannier/src/mesos/3rdparty/libprocess/3rdparty/stout/include/stout/option.hpp|120
 col 39| warning: comparing floating point with == or != is unsafe 
[-Wfloat-equal]
/Users/bbannier/src/mesos/3rdparty/libprocess/include/process/gtest.hpp|191 col 
18| warning: comparing floating point with == or != is unsafe [-Wfloat-equal]
/Users/bbannier/src/mesos/src/common/resources.cpp|619 col 38| warning: 
comparing floating point with == or != is unsafe [-Wfloat-equal]
/Users/bbannier/src/mesos/src/common/type_utils.cpp|349 col 22| warning: 
comparing floating point with == or != is unsafe [-Wfloat-equal]
/Users/bbannier/src/mesos/src/common/values.cpp|54 col 23| warning: comparing 
floating point with == or != is unsafe [-Wfloat-equal]
/Users/bbannier/src/mesos/src/master/allocator/sorter/drf/sorter.cpp|32 col 21| 
warning: comparing floating point with == or != is unsafe [-Wfloat-equal]
/Users/bbannier/src/mesos/src/tests/mesos.hpp|1284 col 30| warning: comparing 
floating point with == or != is unsafe [-Wfloat-equal]
/Users/bbannier/src/mesos/src/tests/mesos.hpp|1284 col 50| warning: comparing 
floating point with == or != is unsafe [-Wfloat-equal]
/Users/bbannier/src/mesos/src/v1/mesos.cpp|345 col 22| warning: comparing 
floating point with == or != is unsafe [-Wfloat-equal]
/Users/bbannier/src/mesos/src/v1/resources.cpp|620 col 38| warning: comparing 
floating point with == or != is unsafe [-Wfloat-equal]
/Users/bbannier/src/mesos/src/v1/values.cpp|55 col 23| warning: comparing 
floating point with == or != is unsafe [-Wfloat-equal]
{code}

> Consider enabling (non-fatal) compiler warnings about float equality 
> comparisons
> 
>
> Key: MESOS-4019
> URL: https://issues.apache.org/jira/browse/MESOS-4019
> Project: Mesos
>  Issue Type: Wish
>Reporter: Benjamin Bannier
> Attachments: build.log
>
>
> Comparing floating point numbers for equality does what is naively expected 
> only in a very limited number of cases. More often than not one should 
> probably instead either choose types allowing exact representation, or switch 
> to a domain-specific equality measure.
> We should consider enabling compiler warnings on such operations.





[jira] [Assigned] (MESOS-3795) process::io::write takes parameter as void* which could be const

2015-11-26 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-3795:
---

Assignee: Benjamin Bannier

> process::io::write takes parameter as void* which could be const
> 
>
> Key: MESOS-3795
> URL: https://issues.apache.org/jira/browse/MESOS-3795
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>  Labels: mesosphere, tech-debt
>
> In libprocess we have
> {code}
> Future write(int fd, void* data, size_t size);
> {code}
> which expects a non-{{const}} {{void*}} for its {{data}} parameter. Under the 
> covers {{data}} appears to be handled as {{const}} (like one would expect 
> from the signature of its inspiration {{::write}}).
> This function is not used too often, but since it expects a non-{{const}} 
> value for {{data}} automatic conversions to {{void*}} from other pointer 
> types are disabled; instead callers seem to cast manually to {{void*}} -- often 
> with C-style casts.
> We should sync this method's signature with that of {{::write}}.
> In addition to following the expected semantics of {{::write}}, having this 
> work without casts with any pointer value {{data}} would make it easier to 
> interface this with character literals, or raw data ptrs from STL containers 
> (e.g. {{Container::data}}). It would probably also indirectly eliminate 
> temptation to use C-casts.
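
To illustrate the point about casts: pointers to {{const}} data, such as string literals or {{data()}} on a {{const}} container, convert implicitly to {{const void*}} but not to {{void*}}. The sketch below uses a hypothetical function {{writeTo}} that mirrors the parameter shape of {{::write}}; it is not the libprocess function and copies into a vector instead of writing to a file descriptor.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in with a `const void*` data parameter, like ::write.
// It appends the bytes to `sink` and returns the number of bytes copied.
std::size_t writeTo(std::vector<char>& sink, const void* data, std::size_t size)
{
  const char* bytes = static_cast<const char*>(data);
  sink.insert(sink.end(), bytes, bytes + size);
  return size;
}
```

With this signature, calls such as {{writeTo(sink, "abc", 3)}} or {{writeTo(sink, s.data(), s.size())}} for a {{const std::string s}} compile without any cast; if the parameter were {{void*}}, both would need a cast that removes {{const}}.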





[jira] [Assigned] (MESOS-3579) FetcherCacheTest.LocalUncachedExtract is flaky

2015-11-23 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-3579:
---

Assignee: Benjamin Bannier  (was: Bernd Mathiske)

> FetcherCacheTest.LocalUncachedExtract is flaky
> --
>
> Key: MESOS-3579
> URL: https://issues.apache.org/jira/browse/MESOS-3579
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Anand Mazumdar
>Assignee: Benjamin Bannier
>  Labels: flaky-test, mesosphere
>
> From ASF CI:
> https://builds.apache.org/job/Mesos/866/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/console
> {code}
> [ RUN  ] FetcherCacheTest.LocalUncachedExtract
> Using temporary directory '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA'
> I0925 19:15:39.541198 27410 leveldb.cpp:176] Opened db in 3.43934ms
> I0925 19:15:39.542362 27410 leveldb.cpp:183] Compacted db in 1.136184ms
> I0925 19:15:39.542428 27410 leveldb.cpp:198] Created db iterator in 35866ns
> I0925 19:15:39.542448 27410 leveldb.cpp:204] Seeked to beginning of db in 
> 8807ns
> I0925 19:15:39.542459 27410 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 6325ns
> I0925 19:15:39.542505 27410 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0925 19:15:39.543143 27438 recover.cpp:449] Starting replica recovery
> I0925 19:15:39.543393 27438 recover.cpp:475] Replica is in EMPTY status
> I0925 19:15:39.544373 27436 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0925 19:15:39.544791 27433 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0925 19:15:39.545284 27433 recover.cpp:566] Updating replica status to 
> STARTING
> I0925 19:15:39.546155 27436 master.cpp:376] Master 
> c8bf1c95-50f4-4832-a570-c560f0b466ae (f57fd4291168) started on 
> 172.17.1.195:41781
> I0925 19:15:39.546257 27433 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 747249ns
> I0925 19:15:39.546288 27433 replica.cpp:323] Persisted replica status to 
> STARTING
> I0925 19:15:39.546483 27434 recover.cpp:475] Replica is in STARTING status
> I0925 19:15:39.546187 27436 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --authorizers="local" 
> --credentials="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/master" 
> --zk_session_timeout="10secs"
> I0925 19:15:39.546567 27436 master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0925 19:15:39.546617 27436 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0925 19:15:39.546632 27436 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/FetcherCacheTest_LocalUncachedExtract_jHBfeA/credentials'
> I0925 19:15:39.546931 27436 master.cpp:467] Using default 'crammd5' 
> authenticator
> I0925 19:15:39.547044 27436 master.cpp:504] Authorization enabled
> I0925 19:15:39.547276 27441 whitelist_watcher.cpp:79] No whitelist given
> I0925 19:15:39.547320 27434 hierarchical.hpp:468] Initialized hierarchical 
> allocator process
> I0925 19:15:39.547471 27438 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0925 19:15:39.548318 27443 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0925 19:15:39.549067 27435 recover.cpp:566] Updating replica status to VOTING
> I0925 19:15:39.549115 27440 master.cpp:1603] The newly elected leader is 
> master@172.17.1.195:41781 with id c8bf1c95-50f4-4832-a570-c560f0b466ae
> I0925 19:15:39.549162 27440 master.cpp:1616] Elected as the leading master!
> I0925 19:15:39.549190 27440 master.cpp:1376] Recovering from registrar
> I0925 19:15:39.549342 27434 registrar.cpp:309] Recovering registrar
> I0925 19:15:39.549666 27430 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 418187ns
> I0925 19:15:39.549753 27430 replica.cpp:323] Persisted replica status to 
> VOTING
> I0925 19:15:39.550089 27442 

[jira] [Assigned] (MESOS-2857) FetcherCacheTest.LocalCachedExtract is flaky.

2015-11-23 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-2857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier reassigned MESOS-2857:
---

Assignee: Benjamin Bannier  (was: Bernd Mathiske)

> FetcherCacheTest.LocalCachedExtract is flaky.
> -
>
> Key: MESOS-2857
> URL: https://issues.apache.org/jira/browse/MESOS-2857
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, test
>Reporter: Benjamin Mahler
>Assignee: Benjamin Bannier
>  Labels: flaky-test, mesosphere
>
> From jenkins:
> {noformat}
> [ RUN  ] FetcherCacheTest.LocalCachedExtract
> Using temporary directory '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj'
> I0610 20:04:48.591573 24561 leveldb.cpp:176] Opened db in 3.512525ms
> I0610 20:04:48.592456 24561 leveldb.cpp:183] Compacted db in 828630ns
> I0610 20:04:48.592512 24561 leveldb.cpp:198] Created db iterator in 32992ns
> I0610 20:04:48.592531 24561 leveldb.cpp:204] Seeked to beginning of db in 
> 8967ns
> I0610 20:04:48.592545 24561 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 7762ns
> I0610 20:04:48.592604 24561 replica.cpp:744] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0610 20:04:48.593438 24587 recover.cpp:449] Starting replica recovery
> I0610 20:04:48.593698 24587 recover.cpp:475] Replica is in EMPTY status
> I0610 20:04:48.595641 24580 replica.cpp:641] Replica in EMPTY status received 
> a broadcasted recover request
> I0610 20:04:48.596086 24590 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I0610 20:04:48.596607 24590 recover.cpp:566] Updating replica status to 
> STARTING
> I0610 20:04:48.597507 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 717888ns
> I0610 20:04:48.597535 24590 replica.cpp:323] Persisted replica status to 
> STARTING
> I0610 20:04:48.597697 24590 recover.cpp:475] Replica is in STARTING status
> I0610 20:04:48.599165 24584 replica.cpp:641] Replica in STARTING status 
> received a broadcasted recover request
> I0610 20:04:48.599434 24584 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I0610 20:04:48.599915 24590 recover.cpp:566] Updating replica status to VOTING
> I0610 20:04:48.600545 24590 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 432335ns
> I0610 20:04:48.600574 24590 replica.cpp:323] Persisted replica status to 
> VOTING
> I0610 20:04:48.600659 24590 recover.cpp:580] Successfully joined the Paxos 
> group
> I0610 20:04:48.600797 24590 recover.cpp:464] Recover process terminated
> I0610 20:04:48.602905 24594 master.cpp:363] Master 
> 20150610-200448-3875541420-32907-24561 (dbade881e927) started on 
> 172.17.0.231:32907
> I0610 20:04:48.602957 24594 master.cpp:365] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_slaves="true" --authenticators="crammd5" 
> --credentials="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials" 
> --framework_sorter="drf" --help="false" --initialize_driver_logging="true" 
> --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_reregister_timeout="10mins" 
> --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.23.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/master" 
> --zk_session_timeout="10secs"
> I0610 20:04:48.603374 24594 master.cpp:410] Master only allowing 
> authenticated frameworks to register
> I0610 20:04:48.603392 24594 master.cpp:415] Master only allowing 
> authenticated slaves to register
> I0610 20:04:48.603404 24594 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/FetcherCacheTest_LocalCachedExtract_Cwdcdj/credentials'
> I0610 20:04:48.603751 24594 master.cpp:454] Using default 'crammd5' 
> authenticator
> I0610 20:04:48.604928 24594 master.cpp:491] Authorization enabled
> I0610 20:04:48.606034 24593 hierarchical.hpp:309] Initialized hierarchical 
> allocator process
> I0610 20:04:48.606106 24593 whitelist_watcher.cpp:79] No whitelist given
> I0610 20:04:48.607430 24594 master.cpp:1476] The newly elected leader is 
> master@172.17.0.231:32907 with id 20150610-200448-3875541420-32907-24561
> I0610 20:04:48.607466 24594 master.cpp:1489] Elected as the leading master!
> I0610 20:04:48.607481 24594 master.cpp:1259] Recovering from registrar
> I0610 20:04:48.607712 24594 registrar.cpp:313] Recovering registrar
> I0610 20:04:48.608543 24588 log.cpp:661] Attempting to start the writer
> I0610 20:04:48.610231 24588 replica.cpp:477] Replica 

[jira] [Updated] (MESOS-3775) MasterAllocatorTest.SlaveLost is slow

2015-11-23 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3775:

Assignee: Jan Schlicht

> MasterAllocatorTest.SlaveLost is slow
> -
>
> Key: MESOS-3775
> URL: https://issues.apache.org/jira/browse/MESOS-3775
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Assignee: Jan Schlicht
>Priority: Minor
>  Labels: mesosphere, tech-debt
>
> The {{MasterAllocatorTest.SlaveLost}} takes more than {{5s}} to complete. A 
> brief look into the code hints that the stopped agent does not quit 
> immediately (and hence its resources are not released by the allocator) 
> because [it waits for the executor to 
> terminate|https://github.com/apache/mesos/blob/master/src/tests/master_allocator_tests.cpp#L717].
>  The {{5s}} timeout comes from the {{EXECUTOR_SHUTDOWN_GRACE_PERIOD}} agent constant.
> Possible solutions:
> * Do not wait until the stopped agent quits (can be flaky, needs deeper 
> analysis).
> * Decrease the agent's {{executor_shutdown_grace_period}} flag.
> * Terminate the executor faster (this may require some refactoring since the 
> executor driver is created in the {{TestContainerizer}} and we do not have 
> direct access to it).





[jira] [Updated] (MESOS-3579) FetcherCacheTest.LocalUncachedExtract is flaky

2015-11-23 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3579:

Sprint: Mesosphere Sprint 23


[jira] [Commented] (MESOS-3579) FetcherCacheTest.LocalUncachedExtract is flaky

2015-11-23 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022170#comment-15022170
 ] 

Benjamin Bannier commented on MESOS-3579:
-

Some extra code was added in mid-October to log additional information on 
the fetcher (which looks like the culprit here) in case of failure. We should add 
more information once it fails again.


[jira] [Commented] (MESOS-3974) CgroupsAnyHierarchyMemoryPressureTest tests fail on CentOS 6.7.

2015-11-24 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15023985#comment-15023985
 ] 

Benjamin Bannier commented on MESOS-3974:
-

Yes, this counter first appeared with Linux's {{v3.10}} tag, so it will not work 
with any earlier kernel (like the default one from CentOS 6.7).

This also means that {{cgroups::CgroupsMemIsolatorProcess}} will not be able to 
get pressure readings on older kernels, and will return {{Failures}} for these. 
Right now this isolator process is used only in the {{MesosContainerizer}} 
under {{__linux__}}, so I think selectively disabling this set of tests in the 
stock CentOS 6.7 CI makes sense. Any thoughts, [~tillt]?
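
A minimal sketch of how such a selective check could look, assuming the only criterion is the {{3.10}} threshold mentioned above. All function names here are hypothetical and not part of Mesos; the release string in {{sanityCheck}} is the one from this issue's environment.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>

#include <sys/utsname.h>

// Hypothetical helper: extracts (major, minor) from a release string such as
// "2.6.32-573.el6.x86_64". Returns (-1, -1) if the string cannot be parsed.
inline std::pair<int, int> parseKernelRelease(const std::string& release)
{
  int kernelMajor = -1;
  int kernelMinor = -1;
  if (std::sscanf(release.c_str(), "%d.%d", &kernelMajor, &kernelMinor) != 2) {
    return {-1, -1};
  }
  return {kernelMajor, kernelMinor};
}

// True if the release is at least 3.10 (assumed threshold, taken from the
// comment above). Pairs compare lexicographically: major first, then minor.
inline bool hasPressureCounter(const std::string& release)
{
  return parseKernelRelease(release) >= std::make_pair(3, 10);
}

// Applies the same check to the running kernel via uname(2).
inline bool runningKernelHasPressureCounter()
{
  struct utsname name;
  if (::uname(&name) != 0) {
    return false; // Be conservative if the release cannot be determined.
  }
  return hasPressureCounter(name.release);
}

// Usage example: the kernel reported in this issue is below the threshold.
inline void sanityCheck()
{
  assert(!hasPressureCounter("2.6.32-573.el6.x86_64"));
}
```

A test filter could consult {{runningKernelHasPressureCounter()}} to decide whether to skip the memory pressure tests; how that would hook into the existing test environment is left open here.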

> CgroupsAnyHierarchyMemoryPressureTest tests fail on CentOS 6.7.
> ---
>
> Key: MESOS-3974
> URL: https://issues.apache.org/jira/browse/MESOS-3974
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.26.0
> Environment: CentOS 6.7, kernel 2.6.32-573.el6.x86_64, gcc 4.8.2, 
> docker 1.7.1
>Reporter: Till Toenshoff
>Assignee: Benjamin Bannier
>  Labels: mesosphere
>
> {noformat}
> GLOG_v=2 sudo ./bin/mesos-tests.sh 
> --gtest_filter="CgroupsAnyHierarchyMemoryPressureTest.*" --verbose
> {noformat}
> {noformat}
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I1120 17:40:40.410383  2467 process.cpp:2426] Spawned process 
> __gc__@127.0.0.1:45300
> I1120 17:40:40.410909  2467 process.cpp:2426] Spawned process 
> help@127.0.0.1:45300
> I1120 17:40:40.410845  2483 process.cpp:2436] Resuming __gc__@127.0.0.1:45300 
> at 2015-11-20 17:40:40.410562048+00:00
> I1120 17:40:40.410970  2467 process.cpp:2426] Spawned process 
> logging@127.0.0.1:45300
> I1120 17:40:40.410995  2467 process.cpp:2426] Spawned process 
> profiler@127.0.0.1:45300
> I1120 17:40:40.411015  2482 process.cpp:2436] Resuming help@127.0.0.1:45300 
> at 2015-11-20 17:40:40.410989056+00:00
> I1120 17:40:40.411063  2467 process.cpp:2426] Spawned process 
> system@127.0.0.1:45300
> I1120 17:40:40.411160  2482 process.cpp:2436] Resuming 
> profiler@127.0.0.1:45300 at 2015-11-20 17:40:40.411155968+00:00
> I1120 17:40:40.411206  2467 process.cpp:2426] Spawned process 
> __limiter__(1)@127.0.0.1:45300
> I1120 17:40:40.411223  2467 process.cpp:2426] Spawned process 
> metrics@127.0.0.1:45300
> I1120 17:40:40.411268  2482 process.cpp:2436] Resuming system@127.0.0.1:45300 
> at 2015-11-20 17:40:40.411266048+00:00
> I1120 17:40:40.411378  2483 process.cpp:2436] Resuming 
> __limiter__(1)@127.0.0.1:45300 at 2015-11-20 17:40:40.411374080+00:00
> I1120 17:40:40.411388  2467 process.cpp:2426] Spawned process 
> __processes__@127.0.0.1:45300
> I1120 17:40:40.411399  2483 process.cpp:2436] Resuming 
> __processes__@127.0.0.1:45300 at 2015-11-20 17:40:40.411397888+00:00
> I1120 17:40:40.411402  2467 process.cpp:965] libprocess is initialized on 
> 127.0.0.1:45300 for 8 cpus
> I1120 17:40:40.411415  2488 process.cpp:2436] Resuming help@127.0.0.1:45300 
> at 2015-11-20 17:40:40.411397888+00:00
> I1120 17:40:40.411432  2467 logging.cpp:177] Logging to STDERR
> I1120 17:40:40.411384  2482 process.cpp:2436] Resuming 
> metrics@127.0.0.1:45300 at 2015-11-20 17:40:40.411379200+00:00
> I1120 17:40:40.411717  2482 process.cpp:2436] Resuming help@127.0.0.1:45300 
> at 2015-11-20 17:40:40.411710976+00:00
> I1120 17:40:40.411813  2487 process.cpp:2436] Resuming 
> logging@127.0.0.1:45300 at 2015-11-20 17:40:40.411789056+00:00
> I1120 17:40:40.411989  2487 process.cpp:2436] Resuming help@127.0.0.1:45300 
> at 2015-11-20 17:40:40.411983872+00:00
> Source directory: /home/vagrant/mesos
> Build directory: /home/vagrant/mesos/build
> -
> We cannot run any cgroups tests that require mounting
> hierarchies because you have the following hierarchies mounted:
> /cgroup/blkio, /cgroup/cpu, /cgroup/cpuacct, /cgroup/cpuset, /cgroup/devices, 
> /cgroup/freezer, /cgroup/memory, /cgroup/net_cls
> We'll disable the CgroupsNoHierarchyTest test fixture for now.
> -
> I1120 17:40:40.414676  2467 process.cpp:2426] Spawned process 
> reaper(1)@127.0.0.1:45300
> I1120 17:40:40.414728  2482 process.cpp:2436] Resuming 
> reaper(1)@127.0.0.1:45300 at 2015-11-20 17:40:40.414701824+00:00
> I1120 17:40:40.415870  2467 process.cpp:2426] Spawned process 
> __latch__(1)@127.0.0.1:45300
> I1120 17:40:40.415913  2483 process.cpp:2436] Resuming __gc__@127.0.0.1:45300 
> at 2015-11-20 17:40:40.415889920+00:00
> I1120 17:40:40.415966  2467 process.cpp:2426] Spawned process 
> __waiter__(1)@127.0.0.1:45300
> I1120 17:40:40.416054  2483 process.cpp:2436] Resuming 
> __latch__(1)@127.0.0.1:45300 at 2015-11-20 17:40:40.416045056+00:00
> I1120 17:40:40.416070  2467 

[jira] [Created] (MESOS-4001) Revisit usage of raw cout and cerr

2015-11-24 Thread Benjamin Bannier (JIRA)
Benjamin Bannier created MESOS-4001:
---

 Summary: Revisit usage of raw cout and cerr
 Key: MESOS-4001
 URL: https://issues.apache.org/jira/browse/MESOS-4001
 Project: Mesos
  Issue Type: Improvement
Reporter: Benjamin Bannier
Priority: Minor


In many instances in mesos and libprocess raw {{std::cout}} and {{std::cerr}} 
are used. This leads to an inconsistent logging style and, e.g., missing time 
stamps in log output, which can make debugging harder.

Only a few sites really need out-of-band logging, e.g., because they display 
help strings.

We should revisit these sites and, whenever possible, replace them with the 
appropriate logging macros ({{LOG(\[INFO, WARNING], ..)}}).

Note: ATM none of this applies to code in {{stout}} where we cannot assume that 
glog is available.





[jira] [Updated] (MESOS-4271) Consider replacing libtool with dolt to speed up build

2016-01-12 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4271:

Labels: build mesosphere  (was: build)

> Consider replacing libtool with dolt to speed up build
> --
>
> Key: MESOS-4271
> URL: https://issues.apache.org/jira/browse/MESOS-4271
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Benjamin Bannier
>Assignee: Benjamin Bannier
>Priority: Minor
>  Labels: build, mesosphere
>
> Mesos uses a pretty standard autotools setup for the build so that 
> {{libtool}} is used extensively to abstract away the aspects of library 
> creation (both compiling source files, and creating the libraries). For some 
> versions of {{libtool}} its invocation can add considerably to the overall 
> build time.
> Dolt provides a much more condensed implementation of {{libtool}}'s 
> functionality for modern platforms (<100 locs vs ~10 klocs), so that it can 
> run much faster. We should investigate whether activating dolt makes sense.
> I tested dolt under OS X 10.10.5. I first primed ccache and then rebuilt 
> mesos-related objects,
> {code}
> ./configure --disable-python --disable-java  # benchmark mostly C & C++ file compile and link
> make check GTEST_FILTER=''                   # prime ccache
> make mostlyclean                             # remove most mesos objects and libs
> make -jN check GTEST_FILTER=''               # rebuild
> {code}
> || || user [s] || real [s] || sys [s] ||
> | make -j10 (dolt)|  42.8±0.1 |  54.3±0.2 |  34.1±0.2 |
> | make -j10 (libtool) |  65.6±0.3 | 148.7±1.1 | 108.5±1.0 |
> | make -j1 (dolt) |  76.9±0.3 |  45.5±0.1 |  27.1±0.1 |
> | make -j1 (libtool)  | 168.2±2.3 |  97.5±1.5 |  75.8±1.3 |
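
The relative gain can be read off the table by dividing each libtool timing by the corresponding dolt timing. The sketch below does this per column, taking the column labels as given in the table; the {{Timing}} struct and the {{speedup}} functions are illustrative names only.

```cpp
#include <cassert>

// Mean timings in seconds, copied from one row of the table above.
struct Timing
{
  double user;
  double real;
  double sys;
};

// Ratio of libtool time to dolt time; values above 1 mean dolt was faster.
inline double speedup(double libtool, double dolt)
{
  assert(dolt > 0.0);
  return libtool / dolt;
}

// Per-column ratios for one pair of rows.
inline Timing speedup(const Timing& libtool, const Timing& dolt)
{
  return Timing{speedup(libtool.user, dolt.user),
                speedup(libtool.real, dolt.real),
                speedup(libtool.sys, dolt.sys)};
}

// Table rows; the measurement uncertainties are omitted for brevity.
const Timing doltJ10 = {42.8, 54.3, 34.1};
const Timing libtoolJ10 = {65.6, 148.7, 108.5};
const Timing doltJ1 = {76.9, 45.5, 27.1};
const Timing libtoolJ1 = {168.2, 97.5, 75.8};
```

For example, {{speedup(libtoolJ10, doltJ10)}} yields the three ratios for the {{-j10}} runs.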





[jira] [Updated] (MESOS-4329) SlaveTest.LaunchTaskInfoWithContainerInfo cannot be execute in isolation

2016-01-12 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-4329:

Labels: mesosphere tech-debt  (was: tech-debt)

> SlaveTest.LaunchTaskInfoWithContainerInfo cannot be execute in isolation
> 
>
> Key: MESOS-4329
> URL: https://issues.apache.org/jira/browse/MESOS-4329
> Project: Mesos
>  Issue Type: Bug
>  Components: test
>Reporter: Benjamin Bannier
>  Labels: mesosphere, tech-debt
>
> Executing {{SlaveTest.LaunchTaskInfoWithContainerInfo}} from 
> {{468b8ec}} under OS X 10.10.5 in isolation fails due to missing cleanup,
> {code}
> % ./bin/mesos-tests.sh 
> --gtest_filter=SlaveTest.LaunchTaskInfoWithContainerInfo
> Source directory: /ABC/DEF/src/mesos
> Build directory: /ABC/DEF/src/mesos/build
> -
> We cannot run any Docker tests because:
> Docker tests not supported on non-Linux systems
> -
> /usr/bin/nc
> /usr/bin/curl
> Note: Google Test filter = 
> 
