[jira] [Commented] (MESOS-4160) Log recover tests are slow

2016-02-10 Thread Shuai Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140992#comment-15140992
 ] 

Shuai Lin commented on MESOS-4160:
--

The slowness occurs because replica1 and replica2 are in {{EMPTY}} status and 
retry with a random backoff in {{[0.5 sec, 1 sec]}}. Currently the retry 
interval is hard-coded and not configurable. 

https://github.com/apache/mesos/blob/0.27.0/src/log/recover.cpp#L328-L339
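
A minimal sketch of what a configurable retry interval could look like, written in Python for brevity rather than the C++ of recover.cpp. The function name retry_delay and its min_secs/max_secs parameters are hypothetical and are not part of Mesos; the defaults mirror the {{[0.5 sec, 1 sec]}} range described above, and a test could pass much smaller bounds to avoid waiting up to a second per retry.

```python
import random


def retry_delay(min_secs=0.5, max_secs=1.0, rng=random):
    """Pick a random backoff, in seconds, from [min_secs, max_secs].

    Hypothetical helper: the bounds are parameters instead of constants,
    so tests can shrink them.
    """
    if min_secs < 0 or max_secs < min_secs:
        raise ValueError("need 0 <= min_secs <= max_secs")
    return rng.uniform(min_secs, max_secs)


# Production-like default versus a fast setting a test might choose.
default_delay = retry_delay()
fast_delay = retry_delay(min_secs=0.01, max_secs=0.02)
```

Passing the bounds in, rather than hard-coding them, is the only design point here; how such a setting would be plumbed through the log recovery code is left open.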


> Log recover tests are slow
> --
>
> Key: MESOS-4160
> URL: https://issues.apache.org/jira/browse/MESOS-4160
> Project: Mesos
>  Issue Type: Improvement
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Assignee: Shuai Lin
>Priority: Minor
>  Labels: mesosphere, newbie++, tech-debt
>
> On Mac OS 10.10.4, some tests take longer than {{1s}} to finish:
> {code}
> RecoverTest.AutoInitialization (1003 ms)
> RecoverTest.AutoInitializationRetry (1000 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4157) Speed up ZooKeeper-related tests

2016-02-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140773#comment-15140773
 ] 

haosdent commented on MESOS-4157:
-

Most of the slow ZooKeeper test cases are caused by 
[ZOOKEEPER-770|https://issues.apache.org/jira/browse/ZOOKEEPER-770].

> Speed up ZooKeeper-related tests
> 
>
> Key: MESOS-4157
> URL: https://issues.apache.org/jira/browse/MESOS-4157
> Project: Mesos
>  Issue Type: Epic
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Assignee: haosdent
>Priority: Minor
>  Labels: mesosphere, newbie++, tech-debt
>
> Execution times on Mac OS 10.10.4:
> {code}
> ZooKeeperTest.Auth (6688 ms)
> ZooKeeperTest.Create (6690 ms)
> ZooKeeperTest.LeaderContender (3385 ms)
> MasterZooKeeperTest.MasterInfoAddress (11282 ms)
> ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors (10053 ms)
> ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork (3390 ms)
> ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSession (3358 ms)
> ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster (3359 ms)
> {code}





[jira] [Created] (MESOS-4635) CoordinatorTest.AppendDiscarded is flaky

2016-02-10 Thread Greg Mann (JIRA)
Greg Mann created MESOS-4635:


 Summary: CoordinatorTest.AppendDiscarded is flaky
 Key: MESOS-4635
 URL: https://issues.apache.org/jira/browse/MESOS-4635
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.27.0
 Environment: Ubuntu 14.04 with clang
Reporter: Greg Mann


Just saw this failure on the ASF Jenkins CI:

{code}
[ RUN  ] CoordinatorTest.AppendDiscarded
I0210 09:34:39.188288 31550 leveldb.cpp:174] Opened db in 2.043145ms
I0210 09:34:39.189136 31550 leveldb.cpp:181] Compacted db in 811003ns
I0210 09:34:39.189182 31550 leveldb.cpp:196] Created db iterator in 27506ns
I0210 09:34:39.189208 31550 leveldb.cpp:202] Seeked to beginning of db in 
10415ns
I0210 09:34:39.189224 31550 leveldb.cpp:271] Iterated through 0 keys in the db 
in 8230ns
I0210 09:34:39.189260 31550 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0210 09:34:39.190004 31577 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 471666ns
I0210 09:34:39.190028 31577 replica.cpp:320] Persisted replica status to VOTING
I0210 09:34:39.192812 31550 leveldb.cpp:174] Opened db in 2.215995ms
I0210 09:34:39.193488 31550 leveldb.cpp:181] Compacted db in 660244ns
I0210 09:34:39.193528 31550 leveldb.cpp:196] Created db iterator in 23068ns
I0210 09:34:39.193554 31550 leveldb.cpp:202] Seeked to beginning of db in 
10451ns
I0210 09:34:39.193570 31550 leveldb.cpp:271] Iterated through 0 keys in the db 
in 7996ns
I0210 09:34:39.193603 31550 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0210 09:34:39.194510 31569 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 393072ns
I0210 09:34:39.194537 31569 replica.cpp:320] Persisted replica status to VOTING
I0210 09:34:39.196895 31550 leveldb.cpp:174] Opened db in 1.804552ms
I0210 09:34:39.198554 31550 leveldb.cpp:181] Compacted db in 1.642208ms
I0210 09:34:39.198593 31550 leveldb.cpp:196] Created db iterator in 19381ns
I0210 09:34:39.198633 31550 leveldb.cpp:202] Seeked to beginning of db in 
35677ns
I0210 09:34:39.198673 31550 leveldb.cpp:271] Iterated through 1 keys in the db 
in 26460ns
I0210 09:34:39.198703 31550 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0210 09:34:39.200898 31550 leveldb.cpp:174] Opened db in 2.09532ms
I0210 09:34:39.202641 31550 leveldb.cpp:181] Compacted db in 1.7251ms
I0210 09:34:39.202697 31550 leveldb.cpp:196] Created db iterator in 39337ns
I0210 09:34:39.202836 31550 leveldb.cpp:202] Seeked to beginning of db in 
34194ns
I0210 09:34:39.202965 31550 leveldb.cpp:271] Iterated through 1 keys in the db 
in 39383ns
I0210 09:34:39.203088 31550 replica.cpp:779] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I0210 09:34:39.204413 31573 replica.cpp:493] Replica received implicit promise 
request from (2636)@172.17.0.2:58132 with proposal 1
I0210 09:34:39.204494 31572 replica.cpp:493] Replica received implicit promise 
request from (2637)@172.17.0.2:58132 with proposal 1
I0210 09:34:39.204854 31573 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 417201ns
I0210 09:34:39.204880 31573 replica.cpp:342] Persisted promised to 1
I0210 09:34:39.205060 31572 leveldb.cpp:304] Persisting metadata (8 bytes) to 
leveldb took 471800ns
I0210 09:34:39.205087 31572 replica.cpp:342] Persisted promised to 1
I0210 09:34:39.205577 31582 coordinator.cpp:238] Coordinator attempting to fill 
missing positions
I0210 09:34:39.206393 31579 replica.cpp:388] Replica received explicit promise 
request from (2638)@172.17.0.2:58132 for position 0 with proposal 2
I0210 09:34:39.206569 31578 replica.cpp:388] Replica received explicit promise 
request from (2639)@172.17.0.2:58132 for position 0 with proposal 2
I0210 09:34:39.206840 31579 leveldb.cpp:341] Persisting action (8 bytes) to 
leveldb took 335263ns
I0210 09:34:39.206881 31579 replica.cpp:712] Persisted action at 0
I0210 09:34:39.207236 31578 leveldb.cpp:341] Persisting action (8 bytes) to 
leveldb took 442481ns
I0210 09:34:39.207258 31578 replica.cpp:712] Persisted action at 0
I0210 09:34:39.208065 31577 replica.cpp:537] Replica received write request for 
position 0 from (2640)@172.17.0.2:58132
I0210 09:34:39.208160 31568 replica.cpp:537] Replica received write request for 
position 0 from (2641)@172.17.0.2:58132
I0210 09:34:39.208206 31568 leveldb.cpp:436] Reading position from leveldb took 
67699ns
I0210 09:34:39.208117 31577 leveldb.cpp:436] Reading position from leveldb took 
225587ns
I0210 09:34:39.208647 31568 leveldb.cpp:341] Persisting action (14 bytes) to 
leveldb took 374594ns
I0210 09:34:39.208652 31577 leveldb.cpp:341] Persisting action (14 bytes) to 
leveldb took 317146ns
I0210 09:34:39.208673 31568 replica.cpp:712] Persisted action at 0
I0210 09:34:39.208683 31577 replica.cpp:712] Persisted action at 0
I0210 09:34:39.209205 

[jira] [Commented] (MESOS-3833) /help endpoints do not work for nested paths

2016-02-10 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140768#comment-15140768
 ] 

Guangya Liu commented on MESOS-3833:


Thanks [~bmahler] and [~vinodkone], the patch has been updated. Could you 
please take another look? Thanks ;-)

> /help endpoints do not work for nested paths
> 
>
> Key: MESOS-3833
> URL: https://issues.apache.org/jira/browse/MESOS-3833
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Anand Mazumdar
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: mesosphere, newbie
>
> Mesos displays the list of all supported endpoints starting at a given path 
> prefix using the {{/help}} suffix, e.g. {{master:5050/help}}.
> It seems that the {{help}} functionality is broken for URLs with nested 
> paths, e.g. {{master:5050/help/master/machine/down}}. The response returned is:
> {quote}
> Malformed URL, expecting '/help/id/name/'
> {quote}
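
To illustrate the nested-path case described above, here is a small sketch of a parser that accepts '/help/<id>/<name>' where the name itself may contain slashes. It is written in Python for brevity; the function parse_help_path and its return shape are hypothetical and do not reflect the actual Mesos help handler or the patch under review.

```python
def parse_help_path(path):
    """Split '/help[/<id>[/<name...>]]' into (id, name).

    Everything after the id is kept as a single endpoint name, so nested
    paths such as '/machine/down' survive intact.
    """
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "help":
        raise ValueError("expected a path starting with /help")
    if len(parts) == 1:
        return (None, None)          # plain /help: list everything
    process_id = parts[1]
    if len(parts) == 2:
        return (process_id, None)    # /help/<id>: list one process
    return (process_id, "/" + "/".join(parts[2:]))
```

With this approach, '/help/master/machine/down' resolves to the id 'master' and the endpoint name '/machine/down' instead of being rejected as malformed.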





[jira] [Commented] (MESOS-4157) Speed up ZooKeeper-related tests

2016-02-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141099#comment-15141099
 ] 

haosdent commented on MESOS-4157:
-

After applying that patch:
{code}
[   OK ] ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors (321 ms)
[   OK ] ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork (3567 ms)
[   OK ] ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSession (3526 ms)
[   OK ] ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster (3591 ms)
[   OK ] MasterZooKeeperTest.MasterInfoAddress (447 ms)
[   OK ] ZooKeeperTest.Auth (233 ms)
[   OK ] ZooKeeperTest.Create (275 ms)
[   OK ] ZooKeeperTest.LeaderContender (7233 ms)
{code}

> Speed up ZooKeeper-related tests
> 
>
> Key: MESOS-4157
> URL: https://issues.apache.org/jira/browse/MESOS-4157
> Project: Mesos
>  Issue Type: Epic
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Assignee: haosdent
>Priority: Minor
>  Labels: mesosphere, newbie++, tech-debt
>
> Execution times on Mac OS 10.10.4:
> {code}
> ZooKeeperTest.Auth (6688 ms)
> ZooKeeperTest.Create (6690 ms)
> ZooKeeperTest.LeaderContender (3385 ms)
> MasterZooKeeperTest.MasterInfoAddress (11282 ms)
> ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors (10053 ms)
> ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork (3390 ms)
> ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSession (3358 ms)
> ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster (3359 ms)
> {code}





[jira] [Updated] (MESOS-4640) Logrotate container logger can die with agent unit on systemd.

2016-02-10 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4640:

Summary: Logrotate container logger can die with agent unit on systemd.  
(was: Logrotate container logger is associated with agent unit on systemd.)

> Logrotate container logger can die with agent unit on systemd.
> --
>
> Key: MESOS-4640
> URL: https://issues.apache.org/jira/browse/MESOS-4640
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.27.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>






[jira] [Commented] (MESOS-4370) NetworkSettings.IPAddress field is deprecated in Docker

2016-02-10 Thread Kapil Arya (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141205#comment-15141205
 ] 

Kapil Arya commented on MESOS-4370:
---

[~travis.hegner]: Can you take a look at the reviews and address them? It's 
almost there :-).

> NetworkSettings.IPAddress field is deprecated in Docker
> ---
>
> Key: MESOS-4370
> URL: https://issues.apache.org/jira/browse/MESOS-4370
> Project: Mesos
>  Issue Type: Bug
>  Components: containerization, docker
>Affects Versions: 0.25.0, 0.26.0, 0.27.0
> Environment: Ubuntu 14.04
> Docker 1.9.1
>Reporter: Clint Armstrong
>Assignee: Travis Hegner
>
> The latest docker API deprecates the NetworkSettings.IPAddress field, in 
> favor of the NetworkSettings.Networks field.
> https://docs.docker.com/engine/reference/api/docker_remote_api/#v1-21-api-changes
> With this deprecation, NetworkSettings.IPAddress is not populated for 
> containers running with networks that use new network plugins.
> As a result the mesos API has no data in 
> container_status.network_infos.ip_address or 
> container_status.network_infos.ipaddresses.
> The immediate impact of this is that mesos-dns is unable to retrieve a 
> container's IP from the netinfo interface.
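
As a rough illustration of the fallback this issue calls for, the sketch below reads the legacy NetworkSettings.IPAddress field first and then collects addresses from NetworkSettings.Networks. It is Python operating on an already-parsed inspect object; the function name container_ips and the sample data are made up for illustration, and the sketch assumes each entry under Networks carries its own IPAddress key. It is not the Mesos docker containerizer code.

```python
def container_ips(inspect):
    """Collect container IPs from a parsed `docker inspect` object.

    Prefers the legacy NetworkSettings.IPAddress field when it is set and
    then adds any per-network addresses from NetworkSettings.Networks.
    """
    settings = inspect.get("NetworkSettings") or {}
    ips = []
    legacy = settings.get("IPAddress")
    if legacy:
        ips.append(legacy)
    networks = settings.get("Networks") or {}
    for name in sorted(networks):
        ip = (networks[name] or {}).get("IPAddress")
        if ip and ip not in ips:
            ips.append(ip)
    return ips


# Made-up sample: legacy field empty, address only present per network.
sample = {
    "NetworkSettings": {
        "IPAddress": "",
        "Networks": {"calico": {"IPAddress": "192.168.0.5"}},
    }
}
sample_ips = container_ips(sample)
```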





[jira] [Created] (MESOS-4638) versioning preprocessor macros

2016-02-10 Thread James Peach (JIRA)
James Peach created MESOS-4638:
--

 Summary: versioning preprocessor macros
 Key: MESOS-4638
 URL: https://issues.apache.org/jira/browse/MESOS-4638
 Project: Mesos
  Issue Type: Bug
  Components: c++ api
Reporter: James Peach


The macros in {{version.hpp}} cannot be used for conditional builds because 
they are strings, not integers. It would be helpful to have integer versions of 
these for conditionally building code against different versions of the Mesos API.
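
One common way to make versions comparable in an #if is to pack major, minor and patch into a single integer. The arithmetic is sketched below in Python for brevity; the packing scheme (two decimal digits each for minor and patch) and the name version_number are assumptions for illustration, not macros that exist in {{version.hpp}}.

```python
def version_number(version):
    """Pack 'major.minor.patch' into one integer for ordering.

    Assumed scheme: major * 10000 + minor * 100 + patch, which requires
    minor and patch to stay below 100.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if not (0 <= minor < 100 and 0 <= patch < 100):
        raise ValueError("minor and patch must be in [0, 100)")
    return major * 10000 + minor * 100 + patch
```

Under this scheme 0.27.0 maps to 2700 and 0.28.0 to 2800, so a plain integer comparison orders releases correctly, which string macros cannot offer to the preprocessor.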





[jira] [Commented] (MESOS-1778) Provide an option to validate flag value in stout/flags.

2016-02-10 Thread Isabel Jimenez (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141129#comment-15141129
 ] 

Isabel Jimenez commented on MESOS-1778:
---

This was added in: 
https://github.com/apache/mesos/commit/5596233382844da05b82a7769c726a8cdd1bfa17

> Provide an option to validate flag value in stout/flags. 
> -
>
> Key: MESOS-1778
> URL: https://issues.apache.org/jira/browse/MESOS-1778
> Project: Mesos
>  Issue Type: Improvement
>  Components: stout
>Reporter: Alexander Rukletsov
>Assignee: Isabel Jimenez
>Priority: Minor
>
> Currently we can provide the default value for a flag, but cannot check if 
> the flag is set to a reasonable value and, e.g., issue a warning. Passing an 
> optional lambda checker to {{FlagBase::add()}} can be a possible solution.
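
The idea of passing an optional checker when registering a flag can be sketched as follows. This is a toy Python registry, not the stout API: the class Flags, its add and load methods, and the convention that a checker returns an error string or None are all assumptions made for illustration.

```python
class Flags:
    """Toy flag registry; not the stout API."""

    def __init__(self):
        self._flags = {}

    def add(self, name, default, validate=None):
        # validate: optional callable returning an error string or None.
        self._flags[name] = (default, validate)

    def load(self, values):
        """Return (resolved values, list of warnings)."""
        resolved, warnings = {}, []
        for name, (default, validate) in self._flags.items():
            value = values.get(name, default)
            error = validate(value) if validate else None
            if error is not None:
                warnings.append(f"--{name}={value}: {error}")
            resolved[name] = value
        return resolved, warnings


flags = Flags()
flags.add("port", 5050,
          validate=lambda p: None if 0 < p < 65536 else "port out of range")
flags.add("quiet", False)
```

Here an out-of-range value is still accepted but produces a warning, matching the "issue a warning" behaviour mentioned in the description; a stricter variant could reject the value instead.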





[jira] [Created] (MESOS-4639) Posix process executor is associated with agent unit on systemd.

2016-02-10 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-4639:
---

 Summary: Posix process executor is associated with agent unit on 
systemd.
 Key: MESOS-4639
 URL: https://issues.apache.org/jira/browse/MESOS-4639
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.25.0, 0.26.0, 0.27.0
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere








[jira] [Created] (MESOS-4640) Logrotate container logger is associated with agent unit on systemd.

2016-02-10 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-4640:
---

 Summary: Logrotate container logger is associated with agent unit 
on systemd.
 Key: MESOS-4640
 URL: https://issues.apache.org/jira/browse/MESOS-4640
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.27.0
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere








[jira] [Updated] (MESOS-4615) ContainerLoggerTest.DefaultToSandbox is flaky

2016-02-10 Thread Greg Mann (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Mann updated MESOS-4615:
-
Labels: flaky-test logger mesosphere  (was: flaky-test mesosphere)

> ContainerLoggerTest.DefaultToSandbox is flaky
> -
>
> Key: MESOS-4615
> URL: https://issues.apache.org/jira/browse/MESOS-4615
> Project: Mesos
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 0.27.0
> Environment: CentOS 7, gcc, libevent & SSL enabled
>Reporter: Greg Mann
>Assignee: Joseph Wu
>  Labels: flaky-test, logger, mesosphere
>
> Just saw this failure on the ASF CI:
> {code}
> [ RUN  ] ContainerLoggerTest.DefaultToSandbox
> I0206 01:25:03.766458  2824 leveldb.cpp:174] Opened db in 72.979786ms
> I0206 01:25:03.811712  2824 leveldb.cpp:181] Compacted db in 45.162067ms
> I0206 01:25:03.811810  2824 leveldb.cpp:196] Created db iterator in 26090ns
> I0206 01:25:03.811828  2824 leveldb.cpp:202] Seeked to beginning of db in 
> 3173ns
> I0206 01:25:03.811839  2824 leveldb.cpp:271] Iterated through 0 keys in the 
> db in 497ns
> I0206 01:25:03.811900  2824 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0206 01:25:03.812785  2849 recover.cpp:447] Starting replica recovery
> I0206 01:25:03.813043  2849 recover.cpp:473] Replica is in EMPTY status
> I0206 01:25:03.814668  2854 replica.cpp:673] Replica in EMPTY status received 
> a broadcasted recover request from (371)@172.17.0.8:37843
> I0206 01:25:03.815210  2849 recover.cpp:193] Received a recover response from 
> a replica in EMPTY status
> I0206 01:25:03.815732  2854 recover.cpp:564] Updating replica status to 
> STARTING
> I0206 01:25:03.819664  2857 master.cpp:376] Master 
> 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de (74ef606c4063) started on 
> 172.17.0.8:37843
> I0206 01:25:03.819703  2857 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/h5vu5I/credentials" --framework_sorter="drf" 
> --help="false" --hostname_lookup="true" --http_authenticators="basic" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" 
> --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="100secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.28.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/h5vu5I/master" --zk_session_timeout="10secs"
> I0206 01:25:03.820241  2857 master.cpp:423] Master only allowing 
> authenticated frameworks to register
> I0206 01:25:03.820257  2857 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I0206 01:25:03.820269  2857 credentials.hpp:35] Loading credentials for 
> authentication from '/tmp/h5vu5I/credentials'
> I0206 01:25:03.821110  2857 master.cpp:468] Using default 'crammd5' 
> authenticator
> I0206 01:25:03.821311  2857 master.cpp:537] Using default 'basic' HTTP 
> authenticator
> I0206 01:25:03.821636  2857 master.cpp:571] Authorization enabled
> I0206 01:25:03.821979  2846 hierarchical.cpp:144] Initialized hierarchical 
> allocator process
> I0206 01:25:03.822057  2846 whitelist_watcher.cpp:77] No whitelist given
> I0206 01:25:03.825460  2847 master.cpp:1712] The newly elected leader is 
> master@172.17.0.8:37843 with id 914b62f9-95f6-4c57-a7e3-9b06e2c1c8de
> I0206 01:25:03.825512  2847 master.cpp:1725] Elected as the leading master!
> I0206 01:25:03.825533  2847 master.cpp:1470] Recovering from registrar
> I0206 01:25:03.825835  2847 registrar.cpp:307] Recovering registrar
> I0206 01:25:03.848212  2854 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 32.226093ms
> I0206 01:25:03.848299  2854 replica.cpp:320] Persisted replica status to 
> STARTING
> I0206 01:25:03.848702  2854 recover.cpp:473] Replica is in STARTING status
> I0206 01:25:03.850728  2858 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (373)@172.17.0.8:37843
> I0206 01:25:03.851230  2854 recover.cpp:193] Received a recover response from 
> a replica in STARTING status
> I0206 01:25:03.852018  2854 recover.cpp:564] Updating replica status to VOTING
> I0206 01:25:03.881681  2854 leveldb.cpp:304] Persisting metadata (8 bytes) to 
> leveldb took 29.184163ms
> I0206 01:25:03.881772  2854 replica.cpp:320] Persisted replica status to 

[jira] [Updated] (MESOS-4637) Docker process executor can die with agent unit on systemd.

2016-02-10 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4637:

Summary: Docker process executor can die with agent unit on systemd.  (was: 
Docker process executor is associated with agent unit on systemd.)

> Docker process executor can die with agent unit on systemd.
> ---
>
> Key: MESOS-4637
> URL: https://issues.apache.org/jira/browse/MESOS-4637
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Affects Versions: 0.25.0, 0.26.0, 0.27.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>






[jira] [Updated] (MESOS-4639) Posix process executor can die with agent unit on systemd.

2016-02-10 Thread Joris Van Remoortere (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joris Van Remoortere updated MESOS-4639:

Summary: Posix process executor can die with agent unit on systemd.  (was: 
Posix process executor is associated with agent unit on systemd.)

> Posix process executor can die with agent unit on systemd.
> --
>
> Key: MESOS-4639
> URL: https://issues.apache.org/jira/browse/MESOS-4639
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.25.0, 0.26.0, 0.27.0
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>






[jira] [Created] (MESOS-4636) Add parent hook to subprocess.

2016-02-10 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-4636:
---

 Summary: Add parent hook to subprocess.
 Key: MESOS-4636
 URL: https://issues.apache.org/jira/browse/MESOS-4636
 Project: Mesos
  Issue Type: Improvement
  Components: libprocess
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere








[jira] [Created] (MESOS-4637) Docker process executor is associated with agent unit on systemd.

2016-02-10 Thread Joris Van Remoortere (JIRA)
Joris Van Remoortere created MESOS-4637:
---

 Summary: Docker process executor is associated with agent unit on 
systemd.
 Key: MESOS-4637
 URL: https://issues.apache.org/jira/browse/MESOS-4637
 Project: Mesos
  Issue Type: Bug
  Components: docker
Affects Versions: 0.25.0, 0.26.0, 0.27.0
Reporter: Joris Van Remoortere
Assignee: Joris Van Remoortere








[jira] [Updated] (MESOS-4358) Expose net_cls network handles in agent's state endpoint

2016-02-10 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4358:
--
Sprint: Mesosphere Sprint 28

> Expose net_cls network handles in agent's state endpoint
> 
>
> Key: MESOS-4358
> URL: https://issues.apache.org/jira/browse/MESOS-4358
> Project: Mesos
>  Issue Type: Task
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: container, containerizer, mesosphere
> Fix For: 0.28.0
>
>
> We need to expose net_cls network handles, associated with containers, to 
> operators and network utilities that would use these network handles to 
> enforce network policy. 
> In order to achieve the above we need to add a new field in the `NetworkInfo` 
> protobuf (say NetHandles) and update this field when a container gets 
> assigned to a net_cls cgroup. The `ContainerStatus` protobuf already has the 
> `NetworkInfo` protobuf as a nested message, and the `ContainerStatus` itself 
> is exposed to operators as part of TaskInfo (for tasks associated with the 
> container) in an agent's state.json. 





[jira] [Created] (MESOS-4641) Support Container Network Interface (CNI).

2016-02-10 Thread Jie Yu (JIRA)
Jie Yu created MESOS-4641:
-

 Summary: Support Container Network Interface (CNI).
 Key: MESOS-4641
 URL: https://issues.apache.org/jira/browse/MESOS-4641
 Project: Mesos
  Issue Type: Epic
Reporter: Jie Yu


CoreOS developed the Container Network Interface (CNI), a proposed standard for 
configuring network interfaces for Linux containers. Many CNI plugins (e.g., 
calico) have already been developed.
https://coreos.com/blog/rkt-cni-networking.html
https://github.com/appc/cni/blob/master/SPEC.md

Also, Kubernetes claimed that they'll support CNI as well.
http://blog.kubernetes.io/2016/01/why-Kubernetes-doesnt-use-libnetwork.html

In the context of the Unified Containerizer, it would be nice to have a 
'network/cni' isolator that speaks the CNI protocol and prepares the 
network for the container.






[jira] [Commented] (MESOS-3007) Support systemd with Mesos containerizer

2016-02-10 Thread Joris Van Remoortere (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141386#comment-15141386
 ] 

Joris Van Remoortere commented on MESOS-3007:
-

[~dennyhina] Can you provide context / logs for why the systemd existence check 
passes, yet {{systemctl}} is not available or failed?
How are you identifying whether systemd is running?

> Support systemd with Mesos containerizer
> 
>
> Key: MESOS-3007
> URL: https://issues.apache.org/jira/browse/MESOS-3007
> Project: Mesos
>  Issue Type: Epic
>Reporter: Artem Harutyunyan
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
>






[jira] [Commented] (MESOS-3486) Use DROP_PROTOBUF instead of DROP_MESSAGE in tests

2016-02-10 Thread Michael Browning (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141441#comment-15141441
 ] 

Michael Browning commented on MESOS-3486:
-

This seems pretty straightforward -- `DROP_MESSAGE(S)` and `FUTURE_MESSAGE` are 
macros defined in `process/gmock.hpp`, so I'll update the definitions and then 
any further invocations of those macros in the test suite.

> Use DROP_PROTOBUF instead of DROP_MESSAGE in tests
> --
>
> Key: MESOS-3486
> URL: https://issues.apache.org/jira/browse/MESOS-3486
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>Assignee: Michael Browning
>Priority: Trivial
>  Labels: mesosphere, newbie, tests
>
> The tests use DROP_MESSAGE(), DROP_MESSAGES(), and FUTURE_MESSAGE() in 
> various places where it would be more clear and concise to use 
> DROP_PROTOBUF(), DROP_PROTOBUFS(), and FUTURE_PROTOBUF() instead.





[jira] [Assigned] (MESOS-3486) Use DROP_PROTOBUF instead of DROP_MESSAGE in tests

2016-02-10 Thread Michael Browning (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Browning reassigned MESOS-3486:
---

Assignee: Michael Browning

> Use DROP_PROTOBUF instead of DROP_MESSAGE in tests
> --
>
> Key: MESOS-3486
> URL: https://issues.apache.org/jira/browse/MESOS-3486
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Neil Conway
>Assignee: Michael Browning
>Priority: Trivial
>  Labels: mesosphere, newbie, tests
>
> The tests use DROP_MESSAGE(), DROP_MESSAGES(), and FUTURE_MESSAGE() in 
> various places where it would be more clear and concise to use 
> DROP_PROTOBUF(), DROP_PROTOBUFS(), and FUTURE_PROTOBUF() instead.





[jira] [Created] (MESOS-4642) Mesos Agent Json API can dump binary data from log files out as invalid JSON

2016-02-10 Thread Steven Schlansker (JIRA)
Steven Schlansker created MESOS-4642:


 Summary: Mesos Agent Json API can dump binary data from log files 
out as invalid JSON
 Key: MESOS-4642
 URL: https://issues.apache.org/jira/browse/MESOS-4642
 Project: Mesos
  Issue Type: Bug
  Components: json api, slave
Affects Versions: 0.27.0
Reporter: Steven Schlansker
Priority: Critical


One of our tasks accidentally started logging binary data to stderr.  This was 
not intentional and generally should not happen -- however, it causes severe 
problems with the Mesos Agent "files/read.json" API, since it gladly dumps this 
binary data out as invalid JSON.

{code}
# hexdump -C /path/to/task/stderr | tail
0003d1f0  6f 6e 6e 65 63 74 69 6f  6e 0a 4e 45 54 3a 20 31  |onnection.NET: 1|
0003d200  20 6f 6e 72 65 61 64 20  45 4e 4f 45 4e 54 20 32  | onread ENOENT 2|
0003d210  39 35 34 35 36 20 32 35  31 20 32 39 35 37 30 37  |95456 251 295707|
0003d220  0a 01 00 00 00 00 00 00  ac 57 65 64 2c 20 31 30  |.........Wed, 10|
0003d230  20 55 6e 72 65 63 6f 67  6e 69 7a 65 64 20 69 6e  | Unrecognized in|
0003d240  70 75 74 20 68 65 61 64  65 72 0a                 |put header.|
{code}

{code}
# curl 
'http://agent-host:5051/files/read.json?path=/path/to/task/stderr=220443=9='
 | hexdump -C
00007970  6e 65 63 74 69 6f 6e 5c  6e 4e 45 54 3a 20 31 20  |nection\nNET: 1 |
00007980  6f 6e 72 65 61 64 20 45  4e 4f 45 4e 54 20 32 39  |onread ENOENT 29|
00007990  35 34 35 36 20 32 35 31  20 32 39 35 37 30 37 5c  |5456 251 295707\|
000079a0  6e 5c 75 30 30 30 31 5c  75 30 30 30 30 5c 75 30  |n\u0001\u0000\u0|
000079b0  30 30 30 5c 75 30 30 30  30 5c 75 30 30 30 30 5c  |000\u0000\u0000\|
000079c0  75 30 30 30 30 5c 75 30  30 30 30 ac 57 65 64 2c  |u0000\u0000.Wed,|
000079d0  20 31 30 20 55 6e 72 65  63 6f 67 6e 69 7a 65 64  | 10 Unrecognized|
000079e0  20 69 6e 70 75 74 20 68  65 61 64 65 72 5c 6e 22  | input header\n"|
000079f0  2c 22 6f 66 66 73 65 74  22 3a 32 32 30 34 34 33  |,"offset":220443|
00007a00  7d                                                |}|
{code}

This causes downstream sadness:
{code}
ERROR [2016-02-10 18:55:12,303] io.dropwizard.jersey.errors.LoggingExceptionMapper: Error handling a request: 0ee749630f8b26f1
! com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xac
!  at [Source: org.jboss.netty.buffer.ChannelBufferInputStream@6d69ee8; line: 1, column: 31181]
! at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1487) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidInitial(UTF8StreamJsonParser.java:3339) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidChar(UTF8StreamJsonParser.java:) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2360) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString(UTF8StreamJsonParser.java:2287) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:286) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:29) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:12) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:523) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:381) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1073) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.module.afterburner.deser.SuperSonicBeanDeserializer.deserializeFromObject(SuperSonicBeanDeserializer.java:196) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:142) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.module.afterburner.deser.SuperSonicBeanDeserializer.deserialize(SuperSonicBeanDeserializer.java:117) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3562) ~[singularity-0.4.9.jar:0.4.9]
! at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2648) ~[singularity-0.4.9.jar:0.4.9]
! at com.hubspot.singularity.data.SandboxManager.read(SandboxManager.java:97) ~[singularity-0.4.9.jar:0.4.9]
{code}
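
For readers unfamiliar with the error text: byte 0xac has the bit pattern 10101100, which UTF-8 reserves for continuation bytes, so it can never begin a character. The sketch below only illustrates that rule. The helper name is hypothetical and is not part of Jackson or Singularity, and the trace does not say where the byte came from.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if `byte` may begin a UTF-8 encoded
// code point.
//   0x00-0x7F: single-byte (ASCII) characters.
//   0xC2-0xF4: lead byte of a two-, three- or four-byte sequence.
// Everything else is rejected: 0x80-0xBF are continuation bytes, and
// 0xC0, 0xC1 and 0xF5-0xFF never appear in well-formed UTF-8.
bool isValidUtf8StartByte(uint8_t byte) {
  return byte <= 0x7F || (byte >= 0xC2 && byte <= 0xF4);
}
```

With this rule, `isValidUtf8StartByte(0xAC)` is false, which matches the parser's complaint.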


[jira] [Commented] (MESOS-4631) Document how to use custom authentication modules

2016-02-10 Thread Disha Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141378#comment-15141378
 ] 

Disha Singh commented on MESOS-4631:


I would like to take up this issue. Could anyone shepherd it? 
Thanks.

> Document how to use custom authentication modules
> -
>
> Key: MESOS-4631
> URL: https://issues.apache.org/jira/browse/MESOS-4631
> Project: Mesos
>  Issue Type: Documentation
>  Components: documentation
>Reporter: Neil Conway
>Priority: Minor
>  Labels: authentication, documentation, mesosphere
>
> The authentication doc page talks about custom authentication modules a bit, 
> but doesn't give enough information. For example:
> * What interface does a custom authentication module need to satisfy?
> * Can multiple authentication modules be used?
> * How do I implement a framework that authenticates with a master that uses a 
> non-default authentication module, e.g., one that doesn't use credentials?





[jira] [Updated] (MESOS-4343) Introduce the ability to assign network handles to mesos containers

2016-02-10 Thread Avinash Sridharan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avinash Sridharan updated MESOS-4343:
-
Description: 
Linux provides net_cls as a cgroup subsystem. A net_cls cgroup is associated 
with a 16-bit major handle and a 16-bit minor handle.  When a task is 
associated with a net_cls cgroup, the kernel tags every packet being generated 
by the task with the major and minor handle associated with the net_cls cgroup. 
These tags are then used by network performance shaping and firewall tools such 
as tc (traffic controller) and iptables. 

Currently, mesos agents do not provide any isolator that can enable 
mesos-containers in a net_cls cgroup, or assign network handles to a net_cls 
cgroup. As part of this epic we plan to achieve the following:

a)  Implement net_cls cgroup isolator for mesos agents.
b)  Implement a manager for the net_cls handles.
c)  Allow operators to set a major network handle when launching an agent. 
d)  Expose the net_cls network handle allocated to a container, to entities 
such as operators and frameworks. 

Once the above goals are met operators can learn about network handles 
allocated to containers and apply them to tools such as tc and iptables to 
enforce network policies. 

  was:
Linux provides net_cls as a cgroup subsystem. A net_cls cgroup is associated 
with a 16-bit major handle and a 16-bit minor handle.  When a task is 
associated with a net_cls cgroup, the kernel tags every packet being generated 
by the task with the major and minor handle associated with the net_cls cgroup 
that the task belongs too. These tags are then used by network performance 
shaping and firewall tools such as tc (traffic controller) and iptables. 

Currently, mesos agents do not provide any isolator that can enable 
mesos-containers in a net_cls cgroup, or assign network handles to a net_cls 
cgroup. As part of this epic we plan to achieve the following:

a)  Implement net_cls cgroup isolator for mesos agents.
b)  Implement an net-handles allocator class that can manage.
c)  Allow operators to set a major network handle when launching an agent. 
d)  Expose the net_cls network handle allocated to a container, to entities 
such as operators and frameworks. 

Once the above goals are met operators can learn about network handles 
allocated to containers and apply them to tools such as tc and iptables to 
enforce network policies. 


> Introduce the ability to assign network handles to mesos containers
> ---
>
> Key: MESOS-4343
> URL: https://issues.apache.org/jira/browse/MESOS-4343
> Project: Mesos
>  Issue Type: Epic
>  Components: containerization
>Reporter: Avinash Sridharan
>Assignee: Avinash Sridharan
>  Labels: containers, mesosphere
> Fix For: 0.28.0
>
>
> Linux provides net_cls as a cgroup subsystem. A net_cls cgroup is associated 
> with a 16-bit major handle and a 16-bit minor handle.  When a task is 
> associated with a net_cls cgroup, the kernel tags every packet being 
> generated by the task with the major and minor handle associated with the 
> net_cls cgroup. These tags are then used by network performance shaping and 
> firewall tools such as tc (traffic controller) and iptables. 
> Currently, mesos agents do not provide any isolator that can enable 
> mesos-containers in a net_cls cgroup, or assign network handles to a net_cls 
> cgroup. As part of this epic we plan to achieve the following:
> a)  Implement net_cls cgroup isolator for mesos agents.
> b)  Implement a manager for the net_cls handles.
> c)  Allow operators to set a major network handle when launching an agent. 
> d)  Expose the net_cls network handle allocated to a container, to entities 
> such as operators and frameworks. 
> Once the above goals are met operators can learn about network handles 
> allocated to containers and apply them to tools such as tc and iptables to 
> enforce network policies. 
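
As an illustration of the two 16-bit handles described above, here is a minimal sketch of how they combine into the single 32-bit value that the kernel's `net_cls.classid` file uses (layout 0xAAAABBBB, major in the high half, minor in the low half). The function names are hypothetical and are not taken from the proposed Mesos isolator.

```cpp
#include <cassert>
#include <cstdint>

// Packs a 16-bit major and a 16-bit minor handle into the 32-bit classid
// (0xAAAABBBB: AAAA is the major handle, BBBB is the minor handle).
uint32_t makeClassId(uint16_t major, uint16_t minor) {
  return (static_cast<uint32_t>(major) << 16) | minor;
}

// Extracts the major handle (high 16 bits) from a classid.
uint16_t majorOf(uint32_t classId) {
  return static_cast<uint16_t>(classId >> 16);
}

// Extracts the minor handle (low 16 bits) from a classid.
uint16_t minorOf(uint32_t classId) {
  return static_cast<uint16_t>(classId & 0xFFFF);
}
```

For example, major 0x10 and minor 0x1 give 0x00100001, which tc refers to as handle 10:1.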





[jira] [Updated] (MESOS-4633) Tests will dereference stack allocated agent objects upon assertion/expectation failure.

2016-02-10 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-4633:
-
Shepherd: Bernd Mathiske
  Labels: flaky mesosphere tech-debt test  (was: flaky mesosphere test)

> Tests will dereference stack allocated agent objects upon 
> assertion/expectation failure.
> 
>
> Key: MESOS-4633
> URL: https://issues.apache.org/jira/browse/MESOS-4633
> Project: Mesos
>  Issue Type: Bug
>Reporter: Joseph Wu
>Assignee: Joseph Wu
>  Labels: flaky, mesosphere, tech-debt, test
>
> Tests that use the {{StartSlave}} test helper are generally fragile when the 
> test fails an assert/expect in the middle of the test.  This is because the 
> {{StartSlave}} helper takes raw pointer arguments, which may be 
> stack-allocated.
> In case of an assert failure, the test immediately exits (destroying stack 
> allocated objects) and proceeds onto test cleanup.  The test cleanup may 
> dereference some of these destroyed objects, leading to a test crash like:
> {code}
> [18:27:36][Step 8/8] F0204 18:27:35.981302 23085 logging.cpp:64] RAW: Pure virtual method called
> [18:27:36][Step 8/8] @ 0x7f7077055e1c  google::LogMessage::Fail()
> [18:27:36][Step 8/8] @ 0x7f707705ba6f  google::RawLog__()
> [18:27:36][Step 8/8] @ 0x7f70760f76c9  __cxa_pure_virtual
> [18:27:36][Step 8/8] @   0xa9423c  mesos::internal::tests::Cluster::Slaves::shutdown()
> [18:27:36][Step 8/8] @  0x1074e45  mesos::internal::tests::MesosTest::ShutdownSlaves()
> [18:27:36][Step 8/8] @  0x1074de4  mesos::internal::tests::MesosTest::Shutdown()
> [18:27:36][Step 8/8] @  0x1070ec7  mesos::internal::tests::MesosTest::TearDown()
> {code}
> The {{StartSlave}} helper should take {{shared_ptr}} arguments instead.
> This also means that we can remove the {{Shutdown}} helper from most of these 
> tests.
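
The ownership idea in the description can be sketched as follows. This is a simplified illustration, not the Mesos test code: `Dependency`, `Cluster` and `testBody` are hypothetical stand-ins for the objects passed to {{StartSlave}}, the test cluster, and a test that bails out early.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an object handed to the agent helper.
struct Dependency {
  virtual ~Dependency() = default;
  virtual std::string name() const { return "dependency"; }
};

// Hypothetical stand-in for the test cluster. It shares ownership of each
// dependency, so cleanup never touches an already destroyed object.
class Cluster {
public:
  void start(std::shared_ptr<Dependency> dependency) {
    dependencies_.push_back(std::move(dependency));
  }

  // Called from test teardown; returns how many dependencies were shut down.
  std::size_t shutdown() {
    std::size_t count = 0;
    for (const auto& dependency : dependencies_) {
      (void) dependency->name();  // Safe: the cluster still co-owns the object.
      ++count;
    }
    dependencies_.clear();
    return count;
  }

private:
  std::vector<std::shared_ptr<Dependency>> dependencies_;
};

// Simulates a test body that returns early, as a failed ASSERT_* would.
void testBody(Cluster& cluster) {
  auto dependency = std::make_shared<Dependency>();
  cluster.start(dependency);
  return;  // The local shared_ptr goes away here; the object itself does not.
}
```

Calling `testBody(cluster)` and then `cluster.shutdown()` still reaches a live object, whereas a raw pointer to a stack-allocated `Dependency` would be dangling at that point.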





[jira] [Commented] (MESOS-4446) Set Docker labels based on TaskInfo labels.

2016-02-10 Thread Martin Evgeniev (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141647#comment-15141647
 ] 

Martin Evgeniev commented on MESOS-4446:


Many thanks for the info [~gyliu]. I'll open a JIRA to request some docs.

> Set Docker labels based on TaskInfo labels.
> ---
>
> Key: MESOS-4446
> URL: https://issues.apache.org/jira/browse/MESOS-4446
> Project: Mesos
>  Issue Type: Story
>  Components: docker
>Reporter: Gennady Feldman
>Assignee: Abhishek Dasgupta
>
> So looks like MESOS-3076 added support for Labels to TaskStatus. Would it be 
> possible to pass those onto the docker container?
> This would really help with doing "docker inspect" on the mesos-slave nodes 
> as well as allow us to better collect docker metrics about the 
> tasks/containers that are currently running on the slave.
> docker supports labels out of the box. See here: 
> https://docs.docker.com/engine/userguide/labels-custom-metadata/
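
One way to picture the request: each task label becomes a `--label key=value` argument on the docker command line. The sketch below assumes labels are available as a plain string map. The function name is hypothetical and this is not the Mesos implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: turns task labels into `--label key=value`
// arguments suitable for appending to a `docker run` command line.
std::vector<std::string> dockerLabelArgs(
    const std::map<std::string, std::string>& labels) {
  std::vector<std::string> args;
  for (const auto& label : labels) {
    args.push_back("--label");
    args.push_back(label.first + "=" + label.second);
  }
  return args;
}
```

Labels set this way show up in the `Labels` section of `docker inspect` output for the container.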





[jira] [Commented] (MESOS-4636) Add parent hook to subprocess.

2016-02-10 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141883#comment-15141883
 ] 

Joseph Wu commented on MESOS-4636:
--

Fix for root tests on systemd platforms:
https://reviews.apache.org/r/43432/

> Add parent hook to subprocess.
> --
>
> Key: MESOS-4636
> URL: https://issues.apache.org/jira/browse/MESOS-4636
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
> Fix For: 0.27.1, 0.28.0
>
>






[jira] [Commented] (MESOS-4157) Speed up ZooKeeper-related tests

2016-02-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141298#comment-15141298
 ] 

haosdent commented on MESOS-4157:
-

The remaining long-running tests are slow because they wait for a session to 
expire. We could advance the clock here.
{code}
[   OK ] ZooKeeperTest.LeaderContender (7233 ms)
[   OK ] ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork (3567 ms)
[   OK ] ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSession (3526 ms)
[   OK ] ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster (3591 ms)
{code}
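
To illustrate what advancing the clock buys a test, here is a minimal sketch with a hypothetical `ManualClock` and `Session`. It is not libprocess code (Mesos tests normally use `Clock::pause()` and `Clock::advance()` for this), and whether ZooKeeper session expiry can be driven this way is an assumption that would need checking.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical test clock: time only moves when the test says so.
class ManualClock {
public:
  std::chrono::milliseconds now() const { return now_; }
  void advance(std::chrono::milliseconds delta) { now_ += delta; }

private:
  std::chrono::milliseconds now_{0};
};

// Hypothetical session that expires once `timeout` has elapsed on the clock.
class Session {
public:
  Session(const ManualClock& clock, std::chrono::milliseconds timeout)
    : clock_(clock), deadline_(clock.now() + timeout) {}

  bool expired() const { return clock_.now() >= deadline_; }

private:
  const ManualClock& clock_;
  std::chrono::milliseconds deadline_;
};
```

A test can then call `clock.advance(timeout)` and observe the expiry immediately instead of sleeping for several seconds.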

> Speed up ZooKeeper-related tests
> 
>
> Key: MESOS-4157
> URL: https://issues.apache.org/jira/browse/MESOS-4157
> Project: Mesos
>  Issue Type: Epic
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Assignee: haosdent
>Priority: Minor
>  Labels: mesosphere, newbie++, tech-debt
>
> Execution times on Mac OS 10.10.4:
> {code}
> ZooKeeperTest.Auth (6688 ms)
> ZooKeeperTest.Create (6690 ms)
> ZooKeeperTest.LeaderContender (3385 ms)
> MasterZooKeeperTest.MasterInfoAddress (11282 ms)
> ZooKeeperMasterContenderDetectorTest.NonRetryableFrrors (10053 ms)
> ZooKeeperMasterContenderDetectorTest.ContenderDetectorShutdownNetwork (3390 ms)
> ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSession (3358 ms)
> ZooKeeperMasterContenderDetectorTest.MasterDetectorExpireSlaveZKSessionNewMaster (3359 ms)
> {code}





[jira] [Commented] (MESOS-4542) MasterQuotaTest.AvailableResourcesAfterRescinding is flaky.

2016-02-10 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142051#comment-15142051
 ] 

Michael Park commented on MESOS-4542:
-

{noformat}
commit 84c6b714dd5df836a7943493562683ed41f7f396
Author: Alexander Rukletsov 
Date:   Wed Feb 10 13:38:29 2016 -0800

Fixed a flaky test in quota tests.

The `AvailableResourcesAfterRescinding` test became flaky after we
stopped offering unreserved resources beyond quota in
https://reviews.apache.org/r/42835. Hence the allocator offers
rescinded resources to `framework1` if an allocation happens before
the test finishes, which violates the expectation that `framework1`
receives resources only once. Since we do not really care about
allocations in this test but rather about rescinded resources, the
fix is just to ignore subsequent offers to `framework1`.

Review: https://reviews.apache.org/r/42908/
{noformat}
{noformat}
commit 56f7e011e925a0e96ae4d9f5e3641422d273624e
Author: Alexander Rukletsov 
Date:   Wed Feb 10 13:34:02 2016 -0800

Added missing test finalization.

Review: https://reviews.apache.org/r/43422/
{noformat}

> MasterQuotaTest.AvailableResourcesAfterRescinding is flaky.
> ---
>
> Key: MESOS-4542
> URL: https://issues.apache.org/jira/browse/MESOS-4542
> Project: Mesos
>  Issue Type: Bug
>  Components: allocation, master, test
>Reporter: Alexander Rukletsov
>Assignee: Alexander Rukletsov
>  Labels: flaky-test, mesosphere
> Fix For: 0.28.0
>
>
> Can be reproduced by running {{GLOG_v=1 
> GTEST_FILTER="MasterQuotaTest.AvailableResourcesAfterRescinding" 
> ./bin/mesos-tests.sh --gtest_shuffle --gtest_break_on_failure 
> --gtest_repeat=1000 --verbose}}.
> h5. Verbose log from a bad run:
> {code}
> [ RUN  ] MasterQuotaTest.AvailableResourcesAfterRescinding
> I0128 12:20:27.568657 2080858880 resources.cpp:564] Parsing resources as JSON 
> failed: cpus:2;mem:1024;disk:1024;ports:[31000-32000]
> Trying semicolon-delimited string format instead
> I0128 12:20:27.570142 2080858880 resources.cpp:564] Parsing resources as JSON 
> failed: cpus:2;mem:1024;disk:1024;ports:[31000-32000]
> Trying semicolon-delimited string format instead
> I0128 12:20:27.583225 2080858880 leveldb.cpp:174] Opened db in 6241us
> I0128 12:20:27.584353 2080858880 leveldb.cpp:181] Compacted db in 1026us
> I0128 12:20:27.584429 2080858880 leveldb.cpp:196] Created db iterator in 12us
> I0128 12:20:27.584442 2080858880 leveldb.cpp:202] Seeked to beginning of db 
> in 7us
> I0128 12:20:27.584453 2080858880 leveldb.cpp:271] Iterated through 0 keys in 
> the db in 6us
> I0128 12:20:27.584475 2080858880 replica.cpp:779] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I0128 12:20:27.584918 300445696 recover.cpp:447] Starting replica recovery
> I0128 12:20:27.585113 300445696 recover.cpp:473] Replica is in EMPTY status
> I0128 12:20:27.585916 297226240 replica.cpp:673] Replica in EMPTY status 
> received a broadcasted recover request from (18274)@192.168.178.24:51278
> I0128 12:20:27.586086 297762816 recover.cpp:193] Received a recover response 
> from a replica in EMPTY status
> I0128 12:20:27.586449 297226240 recover.cpp:564] Updating replica status to 
> STARTING
> I0128 12:20:27.587204 300445696 leveldb.cpp:304] Persisting metadata (8 
> bytes) to leveldb took 624us
> I0128 12:20:27.587242 300445696 replica.cpp:320] Persisted replica status to 
> STARTING
> I0128 12:20:27.587376 299372544 recover.cpp:473] Replica is in STARTING status
> I0128 12:20:27.588050 300982272 replica.cpp:673] Replica in STARTING status 
> received a broadcasted recover request from (18275)@192.168.178.24:51278
> I0128 12:20:27.588235 300445696 recover.cpp:193] Received a recover response 
> from a replica in STARTING status
> I0128 12:20:27.588572 297762816 recover.cpp:564] Updating replica status to 
> VOTING
> I0128 12:20:27.588850 297226240 leveldb.cpp:304] Persisting metadata (8 
> bytes) to leveldb took 140us
> I0128 12:20:27.588879 297226240 replica.cpp:320] Persisted replica status to 
> VOTING
> I0128 12:20:27.588975 299909120 recover.cpp:578] Successfully joined the 
> Paxos group
> I0128 12:20:27.589154 299909120 recover.cpp:462] Recover process terminated
> I0128 12:20:27.599486 298835968 master.cpp:374] Master 
> 531344bd-56f4-4e4f-8f6f-a6a9d36058c7 (alexr.fritz.box) started on 
> 192.168.178.24:51278
> I0128 12:20:27.599520 298835968 master.cpp:376] Flags at startup: --acls="" 
> --allocation_interval="50ms" --allocator="HierarchicalDRF" 
> --authenticate="true" --authenticate_http="true" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/private/tmp/NlzPSo/credentials" 

[jira] [Assigned] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.

2016-02-10 Thread Cong Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cong Wang reassigned MESOS-4646:


Assignee: Cong Wang

> PortMappingIsolatorTests get kernel stuck.
> --
>
> Key: MESOS-4646
> URL: https://issues.apache.org/jira/browse/MESOS-4646
> Project: Mesos
>  Issue Type: Bug
> Environment: Linux Kernel 3.19.0-49-generic,
> libnl-3.2.27
>Reporter: Till Toenshoff
>Assignee: Cong Wang
>
> {noformat}
> $ sudo ./bin/mesos-tests.sh --gtest_filter="*PortMappingIsolatorTest*"
> Source directory: /home/till/scratchpad/mesos
> Build directory: /home/till/scratchpad/mesos/build
> -
> We cannot run any cgroups tests that require mounting
> hierarchies because you have the following hierarchies mounted:
> /sys/fs/cgroup/blkio, /sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct, 
> /sys/fs/cgroup/cpuset, /sys/fs/cgroup/devices, /sys/fs/cgroup/freezer, 
> /sys/fs/cgroup/hugetlb, /sys/fs/cgroup/memory, /sys/fs/cgroup/net_cls, 
> /sys/fs/cgroup/net_prio, /sys/fs/cgroup/perf_event, /sys/fs/cgroup/systemd
> We'll disable the CgroupsNoHierarchyTest test fixture for now.
> -
> WARNING: perf not found for kernel 3.19.0-49
>   You may need to install the following packages for this specific kernel:
> linux-tools-3.19.0-49-generic
> linux-cloud-tools-3.19.0-49-generic
>   You may also want to install one of the following packages to keep up to 
> date:
> linux-tools-generic-lts-
> linux-cloud-tools-generic-lts-
> -
> No 'perf' command found so no 'perf' tests will be run
> -
> WARNING: perf not found for kernel 3.19.0-49
>   You may need to install the following packages for this specific kernel:
> linux-tools-3.19.0-49-generic
> linux-cloud-tools-3.19.0-49-generic
>   You may also want to install one of the following packages to keep up to 
> date:
> linux-tools-generic-lts-
> linux-cloud-tools-generic-lts-
> -
> The 'perf' command wasn't found so tests using it
> to sample the 'cycles' hardware event will not be run.
> -
> /bin/nc
> /usr/local/bin/curl
> Note: Google Test filter = 
> 

[jira] [Commented] (MESOS-4607) Docker image create should not return any error with env var

2016-02-10 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142292#comment-15142292
 ] 

Guangya Liu commented on MESOS-4607:


[~jieyu] can you please help shepherd this? Thanks.

> Docker image create should not return any error with env var
> 
>
> Key: MESOS-4607
> URL: https://issues.apache.org/jira/browse/MESOS-4607
> Project: Mesos
>  Issue Type: Bug
>  Components: docker
>Reporter: Gilbert Song
>Assignee: Guangya Liu
>Priority: Minor
>
> In docker image create behavior, entrypoint and environment variables are 
> read from docker inspect. An error should not be returned on finding a 
> malformed env var, since that could block the docker containerizer. 
> Specifically, we may want to just `LOG(WARNING)` for those unexpected env 
> vars (please see 
> https://github.com/apache/mesos/blob/master/src/docker/docker.cpp#L388~#L395).
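
The behavior requested above could look roughly like this. It is a sketch under assumptions, not the code in docker.cpp: `parseEnv` is a hypothetical name, the input is assumed to be a list of `NAME=value` strings as reported by docker inspect, and `std::cerr` stands in for glog's `LOG(WARNING)`.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: parses "NAME=value" entries. Malformed entries are
// logged and skipped instead of failing the whole parse.
std::map<std::string, std::string> parseEnv(
    const std::vector<std::string>& entries) {
  std::map<std::string, std::string> env;
  for (const std::string& entry : entries) {
    const std::size_t pos = entry.find('=');
    if (pos == std::string::npos || pos == 0) {
      // Stand-in for LOG(WARNING): report the entry and carry on.
      std::cerr << "WARNING: Skipping unexpected env var '" << entry << "'"
                << std::endl;
      continue;
    }
    // Split on the first '=' only, so values may themselves contain '='.
    env[entry.substr(0, pos)] = entry.substr(pos + 1);
  }
  return env;
}
```

With input `{"PATH=/bin", "BROKEN", "A=b=c"}`, the malformed `BROKEN` entry is skipped with a warning and the other two are kept.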





[jira] [Commented] (MESOS-4641) Support Container Network Interface (CNI).

2016-02-10 Thread Mike Spreitzer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142278#comment-15142278
 ] 

Mike Spreitzer commented on MESOS-4641:
---

The Kuryr project in OpenStack is looking at this too.  One approach they are 
considering is creating a CNI plugin that invokes the Docker CLI.  I think this 
is the right approach, for those of us that want to use Neutron, and am 
prototyping it myself.

> Support Container Network Interface (CNI).
> --
>
> Key: MESOS-4641
> URL: https://issues.apache.org/jira/browse/MESOS-4641
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: Qian Zhang
>
> CoreOS developed the Container Network Interface (CNI), a proposed standard 
> for configuring network interfaces for Linux containers. Many CNI plugins 
> (e.g., calico) have already been developed.
> https://coreos.com/blog/rkt-cni-networking.html
> https://github.com/appc/cni/blob/master/SPEC.md
> Kubernetes supports CNI as well.
> http://blog.kubernetes.io/2016/01/why-Kubernetes-doesnt-use-libnetwork.html
> In the context of Unified Containerizer, it would be nice if we can have a 
> 'network/cni' isolator which will speak the CNI protocol and prepare the 
> network for the container.
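
For context on what "speaking the CNI protocol" involves: per the CNI spec linked above, a runtime executes a plugin binary, passes parameters through environment variables, and sends the network configuration as JSON on stdin. The sketch below only builds that environment. The function name and the argument values used with it are hypothetical, and this is not the proposed isolator.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: builds the environment a runtime would set before
// executing a CNI plugin. The variable names come from the CNI spec.
std::map<std::string, std::string> cniEnvironment(
    const std::string& command,      // "ADD" or "DEL"
    const std::string& containerId,  // ID of the container
    const std::string& netns,        // Path to the network namespace
    const std::string& ifName,       // Interface name inside the container
    const std::string& pluginPath) { // Where to search for plugin binaries
  return {
    {"CNI_COMMAND", command},
    {"CNI_CONTAINERID", containerId},
    {"CNI_NETNS", netns},
    {"CNI_IFNAME", ifName},
    {"CNI_PATH", pluginPath},
  };
}
```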





[jira] [Updated] (MESOS-4641) Support Container Network Interface (CNI).

2016-02-10 Thread Jie Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jie Yu updated MESOS-4641:
--
Description: 
CoreOS developed the Container Network Interface (CNI), a proposed standard for 
configuring network interfaces for Linux containers. Many CNI plugins (e.g., 
calico) have already been developed.
https://coreos.com/blog/rkt-cni-networking.html
https://github.com/appc/cni/blob/master/SPEC.md

Kubernetes supports CNI as well.
http://blog.kubernetes.io/2016/01/why-Kubernetes-doesnt-use-libnetwork.html

In the context of Unified Containerizer, it would be nice if we can have a 
'network/cni' isolator which will speak the CNI protocol and prepare the 
network for the container.


  was:
CoreOS developed the Container Network Interface (CNI), a proposed standard for 
configuring network interfaces for Linux containers. Many CNI plugins (e.g., 
calico) have already been developed.
https://coreos.com/blog/rkt-cni-networking.html
https://github.com/appc/cni/blob/master/SPEC.md

Also, Kubernetes claimed that they'll support CNI as well.
http://blog.kubernetes.io/2016/01/why-Kubernetes-doesnt-use-libnetwork.html

In the context of Unified Containerizer, it would be nice if we can have a 
'network/cni' isolator which will speak the CNI protocol and prepare the 
network for the container.



> Support Container Network Interface (CNI).
> --
>
> Key: MESOS-4641
> URL: https://issues.apache.org/jira/browse/MESOS-4641
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: Qian Zhang
>
> CoreOS developed the Container Network Interface (CNI), a proposed standard 
> for configuring network interfaces for Linux containers. Many CNI plugins 
> (e.g., calico) have already been developed.
> https://coreos.com/blog/rkt-cni-networking.html
> https://github.com/appc/cni/blob/master/SPEC.md
> Kubernetes supports CNI as well.
> http://blog.kubernetes.io/2016/01/why-Kubernetes-doesnt-use-libnetwork.html
> In the context of Unified Containerizer, it would be nice if we can have a 
> 'network/cni' isolator which will speak the CNI protocol and prepare the 
> network for the container.





[jira] [Commented] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.

2016-02-10 Thread Cong Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142231#comment-15142231
 ] 

Cong Wang commented on MESOS-4646:
--

This might be the bug I have already fixed:
{noformat}
commit 6bd00b850635abb0044e06101761533c8beba79c
Author: WANG Cong 
Date:   Thu Oct 1 11:37:42 2015 -0700

act_mirred: fix a race condition on mirred_list
{noformat}

Could you try to set up kdump to capture the kernel stack trace? Or try a newer 
kernel, 4.3 or above.


> PortMappingIsolatorTests get kernel stuck.
> --
>
> Key: MESOS-4646
> URL: https://issues.apache.org/jira/browse/MESOS-4646
> Project: Mesos
>  Issue Type: Bug
> Environment: Linux Kernel 3.19.0-49-generic,
> libnl-3.2.27
>Reporter: Till Toenshoff
>
> {noformat}
> $ sudo ./bin/mesos-tests.sh --gtest_filter="*PortMappingIsolatorTest*"
> Source directory: /home/till/scratchpad/mesos
> Build directory: /home/till/scratchpad/mesos/build
> -
> We cannot run any cgroups tests that require mounting
> hierarchies because you have the following hierarchies mounted:
> /sys/fs/cgroup/blkio, /sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct, 
> /sys/fs/cgroup/cpuset, /sys/fs/cgroup/devices, /sys/fs/cgroup/freezer, 
> /sys/fs/cgroup/hugetlb, /sys/fs/cgroup/memory, /sys/fs/cgroup/net_cls, 
> /sys/fs/cgroup/net_prio, /sys/fs/cgroup/perf_event, /sys/fs/cgroup/systemd
> We'll disable the CgroupsNoHierarchyTest test fixture for now.
> -
> WARNING: perf not found for kernel 3.19.0-49
>   You may need to install the following packages for this specific kernel:
> linux-tools-3.19.0-49-generic
> linux-cloud-tools-3.19.0-49-generic
>   You may also want to install one of the following packages to keep up to 
> date:
> linux-tools-generic-lts-
> linux-cloud-tools-generic-lts-
> -
> No 'perf' command found so no 'perf' tests will be run
> -
> WARNING: perf not found for kernel 3.19.0-49
>   You may need to install the following packages for this specific kernel:
> linux-tools-3.19.0-49-generic
> linux-cloud-tools-3.19.0-49-generic
>   You may also want to install one of the following packages to keep up to 
> date:
> linux-tools-generic-lts-
> linux-cloud-tools-generic-lts-
> -
> The 'perf' command wasn't found so tests using it
> to sample the 'cycles' hardware event will not be run.
> -
> /bin/nc
> /usr/local/bin/curl
> Note: Google Test filter = 
> 

[jira] [Created] (MESOS-4648) Backport zookeeper slow add_auth patch

2016-02-10 Thread haosdent (JIRA)
haosdent created MESOS-4648:
---

 Summary: Backport zookeeper slow add_auth patch
 Key: MESOS-4648
 URL: https://issues.apache.org/jira/browse/MESOS-4648
 Project: Mesos
  Issue Type: Improvement
Reporter: haosdent
Assignee: haosdent
Priority: Minor


Backport [ZOOKEEPER-770 Slow add_auth calls with multi-threaded 
client|https://issues.apache.org/jira/browse/ZOOKEEPER-770] to fix the slow 
add_auth call in the C client.





[jira] [Comment Edited] (MESOS-3833) /help endpoints do not work for nested paths

2016-02-10 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991263#comment-14991263
 ] 

Guangya Liu edited comment on MESOS-3833 at 2/11/16 5:38 AM:
-

RR: 
https://reviews.apache.org/r/39968/
https://reviews.apache.org/r/43469/


was (Author: gyliu):
RR: https://reviews.apache.org/r/39968/

> /help endpoints do not work for nested paths
> 
>
> Key: MESOS-3833
> URL: https://issues.apache.org/jira/browse/MESOS-3833
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Anand Mazumdar
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: mesosphere, newbie
>
> Mesos displays the list of all supported endpoints starting at a given path 
> prefix using the {{/help}} suffix, e.g. {{master:5050/help}}.
> It seems that the {{help}} functionality is broken for URLs with nested 
> paths, e.g. {{master:5050/help/master/machine/down}}. The response returned is:
> {quote}
> Malformed URL, expecting '/help/id/name/'
> {quote}
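
A sketch of how nested paths could be accepted: treat the first segment after `/help/` as the id and everything after it, slashes included, as the name. This is a hypothetical illustration rather than the libprocess help handler or the fix under review.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: splits "/help/<id>/<name>" into {id, name}, allowing
// <name> to contain slashes. Returns {"", ""} if the path does not start
// with "/help/" or carries no id.
std::pair<std::string, std::string> parseHelpPath(const std::string& path) {
  const std::string prefix = "/help/";
  if (path.compare(0, prefix.size(), prefix) != 0) {
    return {"", ""};
  }
  const std::string rest = path.substr(prefix.size());
  const std::size_t slash = rest.find('/');
  if (slash == std::string::npos) {
    return {rest, ""};  // Only an id, e.g. "/help/master".
  }
  return {rest.substr(0, slash), rest.substr(slash + 1)};
}
```

For `/help/master/machine/down` this gives id `master` and name `machine/down`.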





[jira] [Commented] (MESOS-3833) /help endpoints do not work for nested paths

2016-02-10 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140592#comment-15140592
 ] 

Benjamin Mahler commented on MESOS-3833:


Yes sorry for the delay, [~gyliu] please email me at bmah...@apache.org when 
you need reviews :)

Just gave you a review, let me know when you've updated!

> /help endpoints do not work for nested paths
> 
>
> Key: MESOS-3833
> URL: https://issues.apache.org/jira/browse/MESOS-3833
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Anand Mazumdar
>Assignee: Guangya Liu
>Priority: Minor
>  Labels: mesosphere, newbie
>
> Mesos displays the list of all supported endpoints starting at a given path 
> prefix using the {{/help}} suffix, e.g. {{master:5050/help}}.
> It seems that the {{help}} functionality is broken for URLs with nested 
> paths, e.g. {{master:5050/help/master/machine/down}}. The response returned is:
> {quote}
> Malformed URL, expecting '/help/id/name/'
> {quote}





[jira] [Commented] (MESOS-4612) Update vendored ZooKeeper

2016-02-10 Thread haosdent (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140655#comment-15140655
 ] 

haosdent commented on MESOS-4612:
-

Thank you very much. I see that 3.4.8-rc0 is already under 
[voting|http://search-hadoop.com/m/JhBoa1vFuw116H8BC1], so I think we won't 
have to wait too long for 3.4.8.

> Update vendored ZooKeeper
> -
>
> Key: MESOS-4612
> URL: https://issues.apache.org/jira/browse/MESOS-4612
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Cody Maloney
>Assignee: haosdent
>  Labels: mesosphere, tech-debt, zookeeper
>
> See: http://zookeeper.apache.org/doc/r3.4.7/releasenotes.html for 
> improvements / bug fixes





[jira] [Updated] (MESOS-4612) Update vendored ZooKeeper to 3.4.8

2016-02-10 Thread haosdent (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

haosdent updated MESOS-4612:

Summary: Update vendored ZooKeeper to 3.4.8  (was: Update vendored 
ZooKeeper)

> Update vendored ZooKeeper to 3.4.8
> --
>
> Key: MESOS-4612
> URL: https://issues.apache.org/jira/browse/MESOS-4612
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Cody Maloney
>Assignee: haosdent
>  Labels: mesosphere, tech-debt, zookeeper
>
> See: http://zookeeper.apache.org/doc/r3.4.7/releasenotes.html for 
> improvements / bug fixes





[jira] [Commented] (MESOS-4641) Support Container Network Interface (CNI).

2016-02-10 Thread Mike Spreitzer (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142197#comment-15142197
 ] 

Mike Spreitzer commented on MESOS-4641:
---

Kubernetes already does support CNI.

> Support Container Network Interface (CNI).
> --
>
> Key: MESOS-4641
> URL: https://issues.apache.org/jira/browse/MESOS-4641
> Project: Mesos
>  Issue Type: Epic
>Reporter: Jie Yu
>Assignee: Qian Zhang
>
> CoreOS developed the Container Network Interface (CNI), a proposed standard 
> for configuring network interfaces for Linux containers. Many CNI plugins 
> (e.g., calico) have already been developed.
> https://coreos.com/blog/rkt-cni-networking.html
> https://github.com/appc/cni/blob/master/SPEC.md
> Also, Kubernetes claimed that they'll support CNI as well.
> http://blog.kubernetes.io/2016/01/why-Kubernetes-doesnt-use-libnetwork.html
> In the context of Unified Containerizer, it would be nice if we can have a 
> 'network/cni' isolator which will speak the CNI protocol and prepare the 
> network for the container.





[jira] [Created] (MESOS-4643) PortMappingIsolatorTest fail when no namespaces are set.

2016-02-10 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-4643:
-

 Summary: PortMappingIsolatorTest fail when no namespaces are set.
 Key: MESOS-4643
 URL: https://issues.apache.org/jira/browse/MESOS-4643
 Project: Mesos
  Issue Type: Bug
 Environment: Linux Kernel 3.19.0-49-generic,
libnl-3.2.27
Reporter: Till Toenshoff
Priority: Minor


Currently, our network isolator tests fail with the following output on an 
Ubuntu 14.04 VM.

{noformat}
[02:10:15][Step 8/8] [ RUN  ] 
PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP
[02:10:15][Step 8/8] ../../src/tests/containerizer/port_mapping_tests.cpp:164: 
Failure
[02:10:15][Step 8/8] entries: Failed to opendir '/var/run/netns': No such file 
or directory
[02:10:15][Step 8/8] ../../src/tests/containerizer/port_mapping_tests.cpp:164: 
Failure
[02:10:15][Step 8/8] entries: Failed to opendir '/var/run/netns': No such file 
or directory
[02:10:15][Step 8/8] [  FAILED  ] 
PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP (4 ms)
{noformat}

The machine has no network namespaces set up, hence {{/var/run/netns}} does not 
exist.

We should help users understand this prerequisite, or maybe even set these 
things up in a test fixture.
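
A minimal sketch of such a prerequisite check, assuming a plain POSIX {{stat}} call is acceptable here. The helper names {{isDirectory}} and {{checkNetnsPrerequisite}} are hypothetical and not part of the Mesos test code; a fixture could call the check up front and report the message instead of failing on {{opendir}}.

```cpp
#include <string>

#include <sys/stat.h>

// Returns true if `path` exists and is a directory.
bool isDirectory(const std::string& path)
{
  struct stat s;
  return ::stat(path.c_str(), &s) == 0 && S_ISDIR(s.st_mode);
}

// Hypothetical helper: returns an empty string when the prerequisite is
// met, otherwise a message explaining what is missing.
std::string checkNetnsPrerequisite(
    const std::string& directory = "/var/run/netns")
{
  if (isDirectory(directory)) {
    return "";
  }

  return "Network namespace directory '" + directory +
         "' does not exist; the port mapping tests need it";
}
```

With this in place, the fixture could surface the message from {{checkNetnsPrerequisite()}} before any test body runs.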






[jira] [Created] (MESOS-4644) PortMappingIsolatorTest* crashes when ethtool is not installed.

2016-02-10 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-4644:
-

 Summary: PortMappingIsolatorTest* crashes when ethtool is not 
installed.
 Key: MESOS-4644
 URL: https://issues.apache.org/jira/browse/MESOS-4644
 Project: Mesos
  Issue Type: Improvement
Reporter: Till Toenshoff
Priority: Minor


{noformat}
[ RUN  ] PortMappingIsolatorTest.ROOT_NC_ContainerToContainerTCP
sh: 1: ethtool: not found
F0210 15:45:23.543251  3956 port_mapping_tests.cpp:441] CHECK_SOME(isolator): 
Check command 'ethtool' failed: Failed to execute 'ethtool -k lo'; the command 
was either not found or exited with a non-zero exit status: 127
*** Check failure stack trace: ***
@ 0x7fb3b0642a1c  google::LogMessage::Fail()
@ 0x7fb3b0642968  google::LogMessage::SendToLog()
@ 0x7fb3b064236a  google::LogMessage::Flush()
@ 0x7fb3b064527e  google::LogMessageFatal::~LogMessageFatal()
@   0x939020  _CheckFatal::~_CheckFatal()
@  0x1524fc4  
mesos::internal::tests::PortMappingIsolatorTest_ROOT_NC_ContainerToContainerTCP_Test::TestBody()
@  0x15ad006  
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@  0x15a7f26  
testing::internal::HandleExceptionsInMethodIfSupported<>()
@  0x158963b  testing::Test::Run()
@  0x1589dbe  testing::TestInfo::Run()
@  0x158a404  testing::TestCase::Run()
@  0x1590b4c  testing::internal::UnitTestImpl::RunAllTests()
@  0x15adc2b  
testing::internal::HandleSehExceptionsInMethodIfSupported<>()
@  0x15a8ad2  
testing::internal::HandleExceptionsInMethodIfSupported<>()
@  0x158f8e8  testing::UnitTest::Run()
@   0xd6e65c  RUN_ALL_TESTS()
@   0xd6e281  main
@ 0x7fb3aae13ec5  (unknown)
@   0x937669  (unknown)
{noformat}

We might want to consider adding a check for the availability of {{ethtool}} 
to {{PortMappingIsolatorTest::SetUpTestCase}} in 
{{src/tests/containerizer/port_mapping_tests.cpp}}.
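
One possible shape for that availability check, as a sketch only: it searches a colon-separated search path for an executable using POSIX {{access}}. The function name {{commandExists}} is hypothetical, and the real test may prefer an existing Mesos/stout utility instead.

```cpp
#include <sstream>
#include <string>

#include <unistd.h>

// Hypothetical helper: returns true if an executable named `command` is
// found in one of the directories of the colon-separated `searchPath`.
bool commandExists(const std::string& command, const std::string& searchPath)
{
  std::istringstream directories(searchPath);
  std::string directory;

  while (std::getline(directories, directory, ':')) {
    if (directory.empty()) {
      continue; // Skip empty entries such as those produced by "::".
    }

    const std::string candidate = directory + "/" + command;
    if (::access(candidate.c_str(), X_OK) == 0) {
      return true;
    }
  }

  return false;
}
```

{{SetUpTestCase}} could then call something like {{commandExists("ethtool", ::getenv("PATH"))}} (after checking that {{PATH}} is set) and print a clear message rather than hitting the {{CHECK_SOME}} failure above.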





[jira] [Created] (MESOS-4646) PortMappingIsolatorTests get stuck.

2016-02-10 Thread Till Toenshoff (JIRA)
Till Toenshoff created MESOS-4646:
-

 Summary: PortMappingIsolatorTests get stuck.
 Key: MESOS-4646
 URL: https://issues.apache.org/jira/browse/MESOS-4646
 Project: Mesos
  Issue Type: Bug
 Environment: Linux Kernel 3.19.9-49-generic,
libnl-3.2.27

Reporter: Till Toenshoff


{noformat}
$ sudo ./bin/mesos-tests.sh --gtest_filter="*PortMappingIsolatorTest*"
Source directory: /home/till/scratchpad/mesos
Build directory: /home/till/scratchpad/mesos/build
-
We cannot run any cgroups tests that require mounting
hierarchies because you have the following hierarchies mounted:
/sys/fs/cgroup/blkio, /sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct, 
/sys/fs/cgroup/cpuset, /sys/fs/cgroup/devices, /sys/fs/cgroup/freezer, 
/sys/fs/cgroup/hugetlb, /sys/fs/cgroup/memory, /sys/fs/cgroup/net_cls, 
/sys/fs/cgroup/net_prio, /sys/fs/cgroup/perf_event, /sys/fs/cgroup/systemd
We'll disable the CgroupsNoHierarchyTest test fixture for now.
-
WARNING: perf not found for kernel 3.19.0-49

  You may need to install the following packages for this specific kernel:
linux-tools-3.19.0-49-generic
linux-cloud-tools-3.19.0-49-generic

  You may also want to install one of the following packages to keep up to date:
linux-tools-generic-lts-
linux-cloud-tools-generic-lts-
-
No 'perf' command found so no 'perf' tests will be run
-
WARNING: perf not found for kernel 3.19.0-49

  You may need to install the following packages for this specific kernel:
linux-tools-3.19.0-49-generic
linux-cloud-tools-3.19.0-49-generic

  You may also want to install one of the following packages to keep up to date:
linux-tools-generic-lts-
linux-cloud-tools-generic-lts-
-
The 'perf' command wasn't found so tests using it
to sample the 'cycles' hardware event will not be run.
-
/bin/nc
/usr/local/bin/curl
Note: Google Test filter = 

[jira] [Assigned] (MESOS-4641) Support Container Network Interface (CNI).

2016-02-10 Thread Qian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Zhang reassigned MESOS-4641:
-

Assignee: Qian Zhang






[jira] [Updated] (MESOS-4641) Support Container Network Interface (CNI).

2016-02-10 Thread Qian Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Qian Zhang updated MESOS-4641:
--
Shepherd: Jie Yu






[jira] [Updated] (MESOS-4646) PortMappingIsolatorTests get kernel stuck.

2016-02-10 Thread Till Toenshoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Toenshoff updated MESOS-4646:
--
Summary: PortMappingIsolatorTests get kernel stuck.  (was: 
PortMappingIsolatorTests get stuck.)

> PortMappingIsolatorTests get kernel stuck.
> --
>
> Key: MESOS-4646
> URL: https://issues.apache.org/jira/browse/MESOS-4646
> Project: Mesos
>  Issue Type: Bug
> Environment: Linux Kernel 3.19.9-49-generic,
> libnl-3.2.27
>Reporter: Till Toenshoff
>
> {noformat}
> $ sudo ./bin/mesos-tests.sh --gtest_filter="*PortMappingIsolatorTest*"
> Source directory: /home/till/scratchpad/mesos
> Build directory: /home/till/scratchpad/mesos/build
> -
> We cannot run any cgroups tests that require mounting
> hierarchies because you have the following hierarchies mounted:
> /sys/fs/cgroup/blkio, /sys/fs/cgroup/cpu, /sys/fs/cgroup/cpuacct, 
> /sys/fs/cgroup/cpuset, /sys/fs/cgroup/devices, /sys/fs/cgroup/freezer, 
> /sys/fs/cgroup/hugetlb, /sys/fs/cgroup/memory, /sys/fs/cgroup/net_cls, 
> /sys/fs/cgroup/net_prio, /sys/fs/cgroup/perf_event, /sys/fs/cgroup/systemd
> We'll disable the CgroupsNoHierarchyTest test fixture for now.
> -
> WARNING: perf not found for kernel 3.19.0-49
>   You may need to install the following packages for this specific kernel:
> linux-tools-3.19.0-49-generic
> linux-cloud-tools-3.19.0-49-generic
>   You may also want to install one of the following packages to keep up to 
> date:
> linux-tools-generic-lts-
> linux-cloud-tools-generic-lts-
> -
> No 'perf' command found so no 'perf' tests will be run
> -
> WARNING: perf not found for kernel 3.19.0-49
>   You may need to install the following packages for this specific kernel:
> linux-tools-3.19.0-49-generic
> linux-cloud-tools-3.19.0-49-generic
>   You may also want to install one of the following packages to keep up to 
> date:
> linux-tools-generic-lts-
> linux-cloud-tools-generic-lts-
> -
> The 'perf' command wasn't found so tests using it
> to sample the 'cycles' hardware event will not be run.
> -
> /bin/nc
> /usr/local/bin/curl
> Note: Google Test filter = 
> 

[jira] [Commented] (MESOS-4636) Add parent hook to subprocess.

2016-02-10 Thread Michael Park (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142074#comment-15142074
 ] 

Michael Park commented on MESOS-4636:
-

{noformat}
commit f2a71af11eb3af6d8d329742962f37a907d9967e
Author: Joseph Wu 
Date:   Wed Feb 10 16:40:02 2016 -0800

Fix CGROUPS_ROOT_* tests on systemd platforms.

Tests do not run with systemd configured, so any dependency on systemd
will fail some checks.

This fixes the `LinuxLauncher` to use the correct systemd-guard function.

Review: https://reviews.apache.org/r/43432/
{noformat}

> Add parent hook to subprocess.
> --
>
> Key: MESOS-4636
> URL: https://issues.apache.org/jira/browse/MESOS-4636
> Project: Mesos
>  Issue Type: Improvement
>  Components: libprocess
>Reporter: Joris Van Remoortere
>Assignee: Joris Van Remoortere
>  Labels: mesosphere
> Fix For: 0.27.1, 0.28.0
>
>






[jira] [Created] (MESOS-4645) Mesos agent shutdown on healthcheck timeout rather than lost and recovered

2016-02-10 Thread Cody Maloney (JIRA)
Cody Maloney created MESOS-4645:
---

 Summary: Mesos agent shutdown on healthcheck timeout rather than 
lost and recovered
 Key: MESOS-4645
 URL: https://issues.apache.org/jira/browse/MESOS-4645
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.27.1
Reporter: Cody Maloney


I expected slaves to have to be gone for the full re-registration timeout 
before they'd be lost to the cluster, not just to fail 5 healthchecks. (Failing 
the healthchecks indicates that there is a network partition, not that the 
agent is gone for good and will never come back.)

Is there some flag I'm missing here which I should be setting?

From my perspective I expect frameworks to not get offers for resources on 
agents which haven't been contacted recently (The framework wouldn't be able 
to launch anything on the agent). Once the re-registration period times out 
the slave would be assumed completely lost and the tasks assumed terminated / 
able to be re-launched if desired. If an agent recovers between the 
healthcheck timeout and re-registration timeout, it should be able to re-join 
the cluster with its running tasks kept running.

Note: Some log lines have their start or tail truncated. The critical parts 
should all be there.

Master flags
{noformat}
Feb 11 00:22:19 ip-10-0-4-187.us-west-2.compute.internal mesos-master[1362]: 
I0211 00:22:19.690507  1362 master.cpp:369] Flags at startup: 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" 
--authorizers="local" --cluster="cody-cm52sd-2" --framework_sorter="drf" 
--help="false" --hostname_lookup="false" --initialize_driver_logging="true" 
--ip_discovery_command="/opt/mesosphere/bin/detect_ip" 
--log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" 
--logging_level="INFO" --max_slave_ping_timeouts="5" --port="5050" 
--quiet="false" --quorum="1" --recovery_slave_removal_limit="100%" 
--registry="replicated_log" --registry_fetch_timeout="1mins" 
--registry_store_timeout="5secs" --registry_strict="false" 
--roles="slave_public" --root_submissions="true" --slave_ping_timeout="15secs" 
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
--webui_dir="/opt/mesosphere/packages/mesos--4dd59ec6bde2052f6f2a0a0da415b6c92c3c418a/share/mesos/webui"
 --weights="slave_public=1" --work_dir="/var/lib/mesos/master" 
--zk="zk://127.0.0.1:2181/mesos" --zk_session_timeout="10secs"
{noformat}
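
For reference, the two timeouts being compared can be worked out from the master flags above. This is a sketch that assumes the master gives up on an agent after {{max_slave_ping_timeouts}} consecutive missed pings, each waited on for {{slave_ping_timeout}}; the function name {{pingFailureWindow}} is hypothetical.

```cpp
#include <chrono>

// Hypothetical helper: time until an agent has missed `maxPingTimeouts`
// consecutive pings, assuming each ping is waited on for `pingTimeout`.
std::chrono::seconds pingFailureWindow(
    std::chrono::seconds pingTimeout,
    int maxPingTimeouts)
{
  return pingTimeout * maxPingTimeouts;
}
```

With {{--slave_ping_timeout="15secs"}} and {{--max_slave_ping_timeouts="5"}} this gives 5 x 15 = 75 seconds, far less than the {{--slave_reregister_timeout="10mins"}} (600 seconds) that I expected to apply.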

Slave flags
{noformat}
Feb 11 00:34:13 ip-10-0-0-52.us-west-2.compute.internal mesos-slave[3914]: 
I0211 00:34:13.334395  3914 slave.cpp:192] Flags at startup: 
--appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5" 
--cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" 
--cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" 
--cgroups_root="mesos" --container_disk_watch_interval="15secs" 
--containerizers="docker,mesos" --default_role="*" 
--disk_watch_interval="1mins" --docker="docker" 
--docker_auth_server="auth.docker.io" --docker_auth_server_port="443" 
--docker_kill_orphans="true" 
--docker_local_archives_dir="/tmp/mesos/images/docker" --docker_puller="local" 
--docker_puller_timeout="60" --docker_registry="registry-1.docker.io" 
--docker_registry_port="443" --docker_remove_delay="1hrs" 
--docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" 
--docker_store_dir="/tmp/mesos/store/docker" 
--enforce_container_disk_quota="false" 
--executor_environment_variables="{"LD_LIBRARY_PATH":"\/opt\/mesosphere\/lib","PATH":"\/usr\/bin:\/bin","SASL_PATH":"\/opt\/mesosphere\/lib\/sasl2","SHELL":"\/usr\/bin\/bash"}"
 --executor_registration_timeout="5mins" 
--executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" 
--fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="2days" 
--gc_disk_headroom="0.1" --hadoop_home="" --help="false" 
--hostname_lookup="false" --image_provisioner_backend="copy" 
--initialize_driver_logging="true" 
--ip_discovery_command="/opt/mesosphere/bin/detect_ip" 
--isolation="cgroups/cpu,cgroups/mem" 
--launcher_dir="/opt/mesosphere/packages/mesos--4dd59ec6bde2052f6f2a0a0da415b6c92c3c418a/libexec/mesos"
 --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" 
--master="zk://leader.mesos:2181/mesos" 
--oversubscribed_resources_interval="15secs" --perf_duration="10secs" 
--perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" 
--quiet="false" --recover="reconnect" --recovery_timeout="15mins" 
--registration_backoff_factor="1secs" 
--resources="ports:[1025-2180,2182-3887,3889-5049,5052-8079,8082-8180,8182-32000]"
 --re
Feb 11 00:34:13 ip-10-0-0-52.us-west-2.compute.internal mesos-slave[3914]: 
vocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" 
--slave_subsystems="cpu,memory" --strict="true" --switch_user="true" 

[jira] [Commented] (MESOS-4431) Sharing of persistent volumes via reference counting

2016-02-10 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142117#comment-15142117
 ] 

Klaus Ma commented on MESOS-4431:
-

[~anindya.sinha], would you split those RRs into smaller tasks and patches? 
Please refer to other epics for how to split them, e.g. MESOS-1719 :).

> Sharing of persistent volumes via reference counting
> 
>
> Key: MESOS-4431
> URL: https://issues.apache.org/jira/browse/MESOS-4431
> Project: Mesos
>  Issue Type: Improvement
>  Components: general
>Affects Versions: 0.25.0
>Reporter: Anindya Sinha
>Assignee: Anindya Sinha
>  Labels: external-volumes, persistent-volumes
>
> Add capability for specific resources to be shared amongst tasks within or 
> across frameworks/roles. Enable this functionality for persistent volumes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-4647) Use in_memory as default registry when testing

2016-02-10 Thread haosdent (JIRA)
haosdent created MESOS-4647:
---

 Summary: Use in_memory as default registry when testing
 Key: MESOS-4647
 URL: https://issues.apache.org/jira/browse/MESOS-4647
 Project: Mesos
  Issue Type: Improvement
Reporter: haosdent
Assignee: haosdent


Currently, we use {{replicated_log}} as the default registry when testing. This 
causes I/O operations during testing and slows down the test cases. We should 
change the default to {{in_memory}} for tests and only use {{replicated_log}} 
when necessary.
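
A rough sketch of the intended default, assuming each test can state whether it really needs the replicated log. The helper {{registryForTest}} is hypothetical and not part of the existing test harness; only the two registry values come from this ticket.

```cpp
#include <string>

// Hypothetical helper: picks the value for the master's registry setting
// in a test. Only tests that exercise the replicated log opt in to it.
std::string registryForTest(bool needsReplicatedLog)
{
  return needsReplicatedLog ? "replicated_log" : "in_memory";
}
```

Tests that do not care about the registry would then get {{in_memory}} and avoid the disk I/O.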


