[jira] [Comment Edited] (MESOS-2275) Document header include rules in style guide

2015-10-22 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969684#comment-14969684
 ] 

Jan Schlicht edited comment on MESOS-2275 at 10/22/15 7:10 PM:
---

The different levels are already separated by a newline; {{clang-format}} will 
order "include blocks" separately.
So it's
{code}
#include 

#include 
{code}
already. I've tried to address this in the comment and the example (see the RR). 
Any suggestions for a better description are always welcome.
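
For illustration, a minimal sketch of such separated blocks (the header names 
are assumed for the example, not taken from the patch):
{code}
// C++ standard library headers in one block.
#include <string>
#include <vector>

// Third-party/stout headers in their own block; clang-format sorts each
// blank-line-separated block independently.
#include <stout/hashmap.hpp>
#include <stout/option.hpp>

// Project headers last.
#include "slave/flags.hpp"
{code}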


was (Author: nfnt):
The different levels are already separated by a newline; `clang-format` will 
order "include blocks" separately.
So it's
```
#include 

#include 
```
already. I've tried to address this in the comment and the example (see the RR). 
Any suggestions for a better description are always welcome.

> Document header include rules in style guide
> 
>
> Key: MESOS-2275
> URL: https://issues.apache.org/jira/browse/MESOS-2275
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Niklas Quarfot Nielsen
>Assignee: Jan Schlicht
>Priority: Trivial
>  Labels: beginner, docathon, mesosphere
>
> We have several ways of sorting, grouping and ordering headers includes in 
> Mesos. We should agree on a rule set and do a style scan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2275) Document header include rules in style guide

2015-10-22 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968658#comment-14968658
 ] 

Benjamin Bannier commented on MESOS-2275:
-

I think we probably would also want an example that makes it clearer whether, 
within each component, we use a pure lexicographical sort or instead enforce 
some residual level of logical ordering; e.g., {{clang-format}} (from trunk) 
prefers a lexicographical sort

{code}
#include 
#include 
{code}

while one could also imagine the opposite ordering, which emphasizes {{foo.hpp}} 
as a sort of "heading header" (currently not supported by {{clang-format}}).

The Google style guide asks for "alphabetical ordering", which isn't helpful 
here.
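
As a hypothetical illustration (header names assumed), the two candidate 
orderings for a header {{foo.hpp}} and a header in its subdirectory would be:
{code}
// Ordering A:
#include <foo/bar.hpp>
#include <foo.hpp>

// Ordering B, emphasizing foo.hpp as a "heading header":
#include <foo.hpp>
#include <foo/bar.hpp>
{code}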

> Document header include rules in style guide
> 
>
> Key: MESOS-2275
> URL: https://issues.apache.org/jira/browse/MESOS-2275
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Niklas Quarfot Nielsen
>Assignee: Jan Schlicht
>Priority: Trivial
>  Labels: beginner, docathon, mesosphere
>
> We have several ways of sorting, grouping and ordering headers includes in 
> Mesos. We should agree on a rule set and do a style scan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3545) Investigate restoring tasks/executors after machine reboot.

2015-10-22 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969546#comment-14969546
 ] 

Neil Conway commented on MESOS-3545:


Cool! Can you post a link to this design doc to the dev mailing list?

> Investigate restoring tasks/executors after machine reboot.
> ---
>
> Key: MESOS-3545
> URL: https://issues.apache.org/jira/browse/MESOS-3545
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Reporter: Benjamin Hindman
>  Labels: mesosphere
>
> If a task/executor is restartable (see MESOS-3544) it might make sense to 
> force an agent to restart these tasks/executors _before_ re-registering with 
> the master after a machine reboot, in the event that the machine is network 
> partitioned away from the master (or the master has failed) but we'd like to 
> get these services running again. Assuming the agent(s) running on the 
> machine have not been disconnected from the master for longer than the 
> master's agent re-registration timeout, the agent should be able to 
> re-register (i.e., after a network partition is resolved) without a problem. 
> However, in the same way that a framework would be interested in knowing that 
> its tasks/executors were restarted, we'd want to send something like a 
> TASK_RESTARTED status update.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3793) Cannot start mesos local on a Debian GNU/Linux 8 docker machine

2015-10-22 Thread Matthias Veit (JIRA)
Matthias Veit created MESOS-3793:


 Summary: Cannot start mesos local on a Debian GNU/Linux 8 docker 
machine
 Key: MESOS-3793
 URL: https://issues.apache.org/jira/browse/MESOS-3793
 Project: Mesos
  Issue Type: Bug
Affects Versions: 0.25.0
 Environment: Debian GNU/Linux 8 docker machine
Reporter: Matthias Veit


We updated the Mesos version to 0.25.0 in our Marathon Docker image, which runs 
our integration tests.
We use mesos local for those tests. This fails with the following message:

{noformat}
root@a06e4b4eb776:/marathon# mesos local
I1022 18:42:26.852485   136 leveldb.cpp:176] Opened db in 6.103258ms
I1022 18:42:26.853302   136 leveldb.cpp:183] Compacted db in 765740ns
I1022 18:42:26.853343   136 leveldb.cpp:198] Created db iterator in 9001ns
I1022 18:42:26.853355   136 leveldb.cpp:204] Seeked to beginning of db in 1287ns
I1022 18:42:26.853366   136 leveldb.cpp:273] Iterated through 0 keys in the db 
in ns
I1022 18:42:26.853406   136 replica.cpp:744] Replica recovered with log 
positions 0 -> 0 with 1 holes and 0 unlearned
I1022 18:42:26.853775   141 recover.cpp:449] Starting replica recovery
I1022 18:42:26.853862   141 recover.cpp:475] Replica is in EMPTY status
I1022 18:42:26.854751   138 replica.cpp:641] Replica in EMPTY status received a 
broadcasted recover request
I1022 18:42:26.854856   140 recover.cpp:195] Received a recover response from a 
replica in EMPTY status
I1022 18:42:26.855002   140 recover.cpp:566] Updating replica status to STARTING
I1022 18:42:26.855655   138 master.cpp:376] Master 
a3f39818-1bda-4710-b96b-2a60ed4d12b8 (a06e4b4eb776) started on 172.17.0.14:5050
I1022 18:42:26.855680   138 master.cpp:378] Flags at startup: 
--allocation_interval="1secs" --allocator="HierarchicalDRF" 
--authenticate="false" --authenticate_slaves="false" --authenticators="crammd5" 
--authorizers="local" --framework_sorter="drf" --help="false" 
--hostname_lookup="true" --initialize_driver_logging="true" 
--log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO" 
--max_slave_ping_timeouts="5" --quiet="false" 
--recovery_slave_removal_limit="100%" --registry="replicated_log" 
--registry_fetch_timeout="1mins" --registry_store_timeout="5secs" 
--registry_strict="false" --root_submissions="true" 
--slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" 
--user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" 
--work_dir="/tmp/mesos/local/AK0XpG" --zk_session_timeout="10secs"
I1022 18:42:26.855790   138 master.cpp:425] Master allowing unauthenticated 
frameworks to register
I1022 18:42:26.855803   138 master.cpp:430] Master allowing unauthenticated 
slaves to register
I1022 18:42:26.855815   138 master.cpp:467] Using default 'crammd5' 
authenticator
W1022 18:42:26.855829   138 authenticator.cpp:505] No credentials provided, 
authentication requests will be refused
I1022 18:42:26.855840   138 authenticator.cpp:512] Initializing server SASL
I1022 18:42:26.856442   136 containerizer.cpp:143] Using isolation: 
posix/cpu,posix/mem,filesystem/posix
I1022 18:42:26.856943   140 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 1.888185ms
I1022 18:42:26.856987   140 replica.cpp:323] Persisted replica status to 
STARTING
I1022 18:42:26.857115   140 recover.cpp:475] Replica is in STARTING status
I1022 18:42:26.857270   140 replica.cpp:641] Replica in STARTING status 
received a broadcasted recover request
I1022 18:42:26.857312   140 recover.cpp:195] Received a recover response from a 
replica in STARTING status
I1022 18:42:26.857368   140 recover.cpp:566] Updating replica status to VOTING
I1022 18:42:26.857781   140 leveldb.cpp:306] Persisting metadata (8 bytes) to 
leveldb took 371121ns
I1022 18:42:26.857841   140 replica.cpp:323] Persisted replica status to VOTING
I1022 18:42:26.857895   140 recover.cpp:580] Successfully joined the Paxos group
I1022 18:42:26.857928   140 recover.cpp:464] Recover process terminated
I1022 18:42:26.862455   137 master.cpp:1603] The newly elected leader is 
master@172.17.0.14:5050 with id a3f39818-1bda-4710-b96b-2a60ed4d12b8
I1022 18:42:26.862498   137 master.cpp:1616] Elected as the leading master!
I1022 18:42:26.862511   137 master.cpp:1376] Recovering from registrar
I1022 18:42:26.862560   137 registrar.cpp:309] Recovering registrar
Failed to create a containerizer: Could not create MesosContainerizer: Failed 
to create launcher: Failed to create Linux launcher: Failed to mount cgroups 
hierarchy at '/sys/fs/cgroup/freezer': 'freezer' is already attached to another 
hierarchy
{noformat}

The setup worked with mesos 0.24.0.
The Dockerfile is here: 
https://github.com/mesosphere/marathon/blob/mv/mesos_0.25/Dockerfile



{noformat}
root@a06e4b4eb776:/marathon# ls /sys/fs/cgroup/
root@a06e4b4eb776:/marathon# 
{noformat}

{noformat}
root@a06e4b4eb776:/marathon# cat /proc/mounts 
none / aufs 

[jira] [Commented] (MESOS-2275) Document header include rules in style guide

2015-10-22 Thread Jan Schlicht (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969684#comment-14969684
 ] 

Jan Schlicht commented on MESOS-2275:
-

The different levels are already separated by a newline; `clang-format` will 
order "include blocks" separately.
So it's
```
#include 

#include 
```
already. I've tried to address this in the comment and the example (see the RR). 
Any suggestions for a better description are always welcome.

> Document header include rules in style guide
> 
>
> Key: MESOS-2275
> URL: https://issues.apache.org/jira/browse/MESOS-2275
> Project: Mesos
>  Issue Type: Improvement
>Reporter: Niklas Quarfot Nielsen
>Assignee: Jan Schlicht
>Priority: Trivial
>  Labels: beginner, docathon, mesosphere
>
> We have several ways of sorting, grouping and ordering headers includes in 
> Mesos. We should agree on a rule set and do a style scan.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3545) Investigate restoring tasks/executors after machine reboot.

2015-10-22 Thread Megha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969497#comment-14969497
 ] 

Megha commented on MESOS-3545:
--

Here's the first draft of the design for Persistent Tasks. Looking forward to 
feedback and comments.

https://docs.google.com/document/d/1l7goeISpYmCjM03l20lmjZ6_BMfdxBs31znEBRtzsuU/edit?usp=sharing



> Investigate restoring tasks/executors after machine reboot.
> ---
>
> Key: MESOS-3545
> URL: https://issues.apache.org/jira/browse/MESOS-3545
> Project: Mesos
>  Issue Type: Improvement
>  Components: slave
>Reporter: Benjamin Hindman
>  Labels: mesosphere
>
> If a task/executor is restartable (see MESOS-3544) it might make sense to 
> force an agent to restart these tasks/executors _before_ re-registering with 
> the master after a machine reboot, in the event that the machine is network 
> partitioned away from the master (or the master has failed) but we'd like to 
> get these services running again. Assuming the agent(s) running on the 
> machine have not been disconnected from the master for longer than the 
> master's agent re-registration timeout, the agent should be able to 
> re-register (i.e., after a network partition is resolved) without a problem. 
> However, in the same way that a framework would be interested in knowing that 
> its tasks/executors were restarted, we'd want to send something like a 
> TASK_RESTARTED status update.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling

2015-10-22 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu reassigned MESOS-3771:


Assignee: Joseph Wu

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII 
> handling
> ---
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.1, 0.26.0
>Reporter: Steven Schlansker
>Assignee: Joseph Wu
>Priority: Critical
>  Labels: mesosphere
>
> Spark encodes some binary data into the ExecutorInfo.data field.  This field 
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without 
> any regards to proper character encoding:
> {code}
> 0006b0b0  2e 73 70 61 72 6b 2e 65  78 65 63 75 74 6f 72 2e  |.spark.executor.|
> 0006b0c0  4d 65 73 6f 73 45 78 65  63 75 74 6f 72 42 61 63  |MesosExecutorBac|
> 0006b0d0  6b 65 6e 64 22 7d 2c 22  64 61 74 61 22 3a 22 ac  |kend"},"data":".|
> 0006b0e0  ed 5c 75 30 30 30 30 5c  75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
> 0006b0f0  75 30 30 30 30 5c 75 30  30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
> 0006b100  6c 61 2e 54 75 70 6c 65  32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP api emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout seems 
> to not have any idea of what a byte array is.  I'm guessing that some 
> implicit conversion makes it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specificiation.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here.  Our cluster is currently entirely down -- 
> the frameworks cannot handle parsing the invalid JSON produced (it is not 
> even valid utf-8)
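
One conventional way to make arbitrary bytes JSON-safe is to base64-encode 
them before serialization; a minimal standalone sketch (not the stout 
implementation, and the fix proposed in the comments below instead removes the 
field entirely):
{code}
#include <iostream>
#include <string>

// Minimal base64 encoder, sketched only to illustrate the conventional
// approach for embedding binary data in JSON.
std::string base64(const std::string& bytes)
{
  static const char* chars =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

  std::string out;
  size_t i = 0;
  while (i + 2 < bytes.size()) {
    unsigned int n = (unsigned char) bytes[i] << 16 |
                     (unsigned char) bytes[i + 1] << 8 |
                     (unsigned char) bytes[i + 2];
    out += chars[n >> 18];
    out += chars[(n >> 12) & 63];
    out += chars[(n >> 6) & 63];
    out += chars[n & 63];
    i += 3;
  }

  // Encode the 1- or 2-byte tail with '=' padding.
  if (i + 1 == bytes.size()) {
    unsigned int n = (unsigned char) bytes[i] << 16;
    out += chars[n >> 18];
    out += chars[(n >> 12) & 63];
    out += "==";
  } else if (i + 2 == bytes.size()) {
    unsigned int n = (unsigned char) bytes[i] << 16 |
                     (unsigned char) bytes[i + 1] << 8;
    out += chars[n >> 18];
    out += chars[(n >> 12) & 63];
    out += chars[(n >> 6) & 63];
    out += '=';
  }
  return out;
}

int main()
{
  const std::string data("\xac\xed\x00\x05", 4);  // Arbitrary bytes.
  std::cout << "{\"data\":\"" << base64(data) << "\"}" << std::endl;
  return 0;
}
{code}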



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling

2015-10-22 Thread Joseph Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Wu updated MESOS-3771:
-
Labels: mesosphere  (was: )

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII 
> handling
> ---
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.1, 0.26.0
>Reporter: Steven Schlansker
>Assignee: Joseph Wu
>Priority: Critical
>  Labels: mesosphere
>
> Spark encodes some binary data into the ExecutorInfo.data field.  This field 
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without 
> any regards to proper character encoding:
> {code}
> 0006b0b0  2e 73 70 61 72 6b 2e 65  78 65 63 75 74 6f 72 2e  |.spark.executor.|
> 0006b0c0  4d 65 73 6f 73 45 78 65  63 75 74 6f 72 42 61 63  |MesosExecutorBac|
> 0006b0d0  6b 65 6e 64 22 7d 2c 22  64 61 74 61 22 3a 22 ac  |kend"},"data":".|
> 0006b0e0  ed 5c 75 30 30 30 30 5c  75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
> 0006b0f0  75 30 30 30 30 5c 75 30  30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
> 0006b100  6c 61 2e 54 75 70 6c 65  32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP api emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout seems 
> to not have any idea of what a byte array is.  I'm guessing that some 
> implicit conversion makes it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specificiation.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here.  Our cluster is currently entirely down -- 
> the frameworks cannot handle parsing the invalid JSON produced (it is not 
> even valid utf-8)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2186) Mesos crashes if any configured zookeeper does not resolve.

2015-10-22 Thread Neil Conway (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969565#comment-14969565
 ] 

Neil Conway commented on MESOS-2186:


FWIW, this sounds like pretty weird DNS behavior: a host being down shouldn't 
result in getaddrinfo() returning EAI_NONAME. You could possibly work around 
this by doing your own hostname resolution and passing IPs into Mesos, but I 
think the root problem is that DNS in this environment behaves weirdly.
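
A minimal sketch of that workaround (hostname and flag usage are illustrative 
only):
{code}
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cstring>
#include <iostream>
#include <string>

// Resolve a ZooKeeper hostname up front and print the IP, which could
// then be passed to Mesos (e.g. in --zk) instead of the hostname.
int main()
{
  const std::string host = "zk1.example.com";  // Hypothetical hostname.

  addrinfo hints;
  memset(&hints, 0, sizeof(hints));
  hints.ai_family = AF_INET;
  hints.ai_socktype = SOCK_STREAM;

  addrinfo* result = nullptr;
  int error = getaddrinfo(host.c_str(), nullptr, &hints, &result);
  if (error != 0) {
    // A host that does not resolve shows up here (e.g. EAI_NONAME).
    std::cerr << "getaddrinfo: " << gai_strerror(error) << std::endl;
    return 1;
  }

  char ip[INET_ADDRSTRLEN];
  sockaddr_in* addr = reinterpret_cast<sockaddr_in*>(result->ai_addr);
  inet_ntop(AF_INET, &addr->sin_addr, ip, sizeof(ip));
  std::cout << ip << std::endl;

  freeaddrinfo(result);
  return 0;
}
{code}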

> Mesos crashes if any configured zookeeper does not resolve.
> ---
>
> Key: MESOS-2186
> URL: https://issues.apache.org/jira/browse/MESOS-2186
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.21.0, 0.26.0
> Environment: Zookeeper:  3.4.5+28-1.cdh4.7.1.p0.13.el6
> Mesos: 0.21.0-1.0.centos65
> CentOS: CentOS release 6.6 (Final)
>Reporter: Daniel Hall
>Priority: Critical
>  Labels: mesosphere
>
> When starting Mesos, if one of the configured zookeeper servers does not 
> resolve in DNS Mesos will crash and refuse to start. We noticed this issue 
> while we were rebuilding one of our zookeeper hosts in Google compute (which 
> bases the DNS on the machines running).
> Here is a log from a failed startup (hostnames and ip addresses have been 
> sanitised).
> {noformat}
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: I1209 22:54:54.088835 
> 28627 main.cpp:292] Starting Mesos master
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 2014-12-09 
> 22:54:54,095:28627(0x7fa9f042f700):ZOO_ERROR@getaddrs@599: getaddrinfo: No 
> such file or directory
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: F1209 22:54:54.095239 
> 28642 zookeeper.cpp:113] Failed to create ZooKeeper, zookeeper_init: No such 
> file or directory [2]
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: *** Check failure stack 
> trace: ***
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 2014-12-09 
> 22:54:54,097:28627(0x7fa9ed22a700):ZOO_ERROR@getaddrs@599: getaddrinfo: No 
> such file or directory
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: F1209 22:54:54.097718 
> 28647 zookeeper.cpp:113] Failed to create ZooKeeper, zookeeper_init: No such 
> file or directory [2]
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: *** Check failure stack 
> trace: ***
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a0160  
> google::LogMessage::Fail()
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a0160  
> google::LogMessage::Fail()
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a00b9  
> google::LogMessage::SendToLog()
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 2014-12-09 
> 22:54:54,108:28627(0x7fa9ef02d700):ZOO_ERROR@getaddrs@599: getaddrinfo: No 
> such file or directory
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: F1209 22:54:54.097718 
> 28647 zookeeper.cpp:113] Failed to create ZooKeeper, zookeeper_init: No such 
> file or directory [2]F1209 22:54:54.108422 28644 zookeeper.cpp:113] Failed to 
> create ZooKeeper, zookeeper_init: No such file or directory [2]
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: *** Check failure stack 
> trace: ***
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a0160  
> google::LogMessage::Fail()
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 2014-12-09 
> 22:54:54,109:28627(0x7fa9f0e30700):ZOO_ERROR@getaddrs@599: getaddrinfo: No 
> such file or directory
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: F1209 22:54:54.097718 
> 28647 zookeeper.cpp:113] Failed to create ZooKeeper, zookeeper_init: No such 
> file or directory [2]F1209 22:54:54.108422 28644 zookeeper.cpp:113] Failed to 
> create ZooKeeper, zookeeper_init: No such file or directory [2]F1209 
> 22:54:54.109864 28641 zookeeper.cpp:113] Failed to create ZooKeeper, 
> zookeeper_init: No such file or directory [2]
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: *** Check failure stack 
> trace: ***
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a0160  
> google::LogMessage::Fail()
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a00b9  
> google::LogMessage::SendToLog()
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a00b9  
> google::LogMessage::SendToLog()
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: I1209 22:54:54.123208 
> 28640 master.cpp:318] Master 20141209-225454-4155764746-5050-28627 
> (mesosmaster-2.internal) started on 10.x.x.x:5050
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: I1209 22:54:54.123306 
> 

[jira] [Commented] (MESOS-3763) Need for http::put request method

2015-10-22 Thread Benjamin Mahler (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969636#comment-14969636
 ] 

Benjamin Mahler commented on MESOS-3763:


I recently made it so that you can directly send a {{Request}} object, but 
didn't expose this in the header. It is probably worth just having an 
{{http::request(Request)}} method, to avoid having all the functions and 
overloads for the different HTTP methods.
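
A rough sketch of what that could look like (the signature is an assumption 
here, not quoted from the libprocess headers):
{code}
#include <process/future.hpp>
#include <process/http.hpp>

namespace process {
namespace http {

// Hypothetical: a single generic entry point taking a fully formed
// Request, so callers choose the method ("GET", "PUT", ...) themselves
// instead of needing one helper plus overloads per HTTP verb.
Future<Response> request(const Request& request);

} // namespace http {
} // namespace process {
{code}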

> Need for http::put request method
> -
>
> Key: MESOS-3763
> URL: https://issues.apache.org/jira/browse/MESOS-3763
> Project: Mesos
>  Issue Type: Task
>Reporter: Joerg Schad
>Assignee: Joerg Schad
>Priority: Minor
>
> As we decided to create a more RESTful API for managing quota requests, we 
> also want to use the HTTP PUT method, and hence need to enable libprocess/http 
> to send PUT requests besides GET and POST requests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-2186) Mesos crashes if any configured zookeeper does not resolve.

2015-10-22 Thread Steven Schlansker (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968622#comment-14968622
 ] 

Steven Schlansker commented on MESOS-2186:
--

That's a bummer. Thank you, everyone, for looking and for your time.

> Mesos crashes if any configured zookeeper does not resolve.
> ---
>
> Key: MESOS-2186
> URL: https://issues.apache.org/jira/browse/MESOS-2186
> Project: Mesos
>  Issue Type: Bug
>Affects Versions: 0.21.0, 0.26.0
> Environment: Zookeeper:  3.4.5+28-1.cdh4.7.1.p0.13.el6
> Mesos: 0.21.0-1.0.centos65
> CentOS: CentOS release 6.6 (Final)
>Reporter: Daniel Hall
>Priority: Critical
>  Labels: mesosphere
>
> When starting Mesos, if one of the configured zookeeper servers does not 
> resolve in DNS Mesos will crash and refuse to start. We noticed this issue 
> while we were rebuilding one of our zookeeper hosts in Google compute (which 
> bases the DNS on the machines running).
> Here is a log from a failed startup (hostnames and ip addresses have been 
> sanitised).
> {noformat}
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: I1209 22:54:54.088835 
> 28627 main.cpp:292] Starting Mesos master
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 2014-12-09 
> 22:54:54,095:28627(0x7fa9f042f700):ZOO_ERROR@getaddrs@599: getaddrinfo: No 
> such file or directory
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: F1209 22:54:54.095239 
> 28642 zookeeper.cpp:113] Failed to create ZooKeeper, zookeeper_init: No such 
> file or directory [2]
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: *** Check failure stack 
> trace: ***
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 2014-12-09 
> 22:54:54,097:28627(0x7fa9ed22a700):ZOO_ERROR@getaddrs@599: getaddrinfo: No 
> such file or directory
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: F1209 22:54:54.097718 
> 28647 zookeeper.cpp:113] Failed to create ZooKeeper, zookeeper_init: No such 
> file or directory [2]
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: *** Check failure stack 
> trace: ***
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a0160  
> google::LogMessage::Fail()
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a0160  
> google::LogMessage::Fail()
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a00b9  
> google::LogMessage::SendToLog()
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 2014-12-09 
> 22:54:54,108:28627(0x7fa9ef02d700):ZOO_ERROR@getaddrs@599: getaddrinfo: No 
> such file or directory
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: F1209 22:54:54.097718 
> 28647 zookeeper.cpp:113] Failed to create ZooKeeper, zookeeper_init: No such 
> file or directory [2]F1209 22:54:54.108422 28644 zookeeper.cpp:113] Failed to 
> create ZooKeeper, zookeeper_init: No such file or directory [2]
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: *** Check failure stack 
> trace: ***
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a0160  
> google::LogMessage::Fail()
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 2014-12-09 
> 22:54:54,109:28627(0x7fa9f0e30700):ZOO_ERROR@getaddrs@599: getaddrinfo: No 
> such file or directory
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: 
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: F1209 22:54:54.097718 
> 28647 zookeeper.cpp:113] Failed to create ZooKeeper, zookeeper_init: No such 
> file or directory [2]F1209 22:54:54.108422 28644 zookeeper.cpp:113] Failed to 
> create ZooKeeper, zookeeper_init: No such file or directory [2]F1209 
> 22:54:54.109864 28641 zookeeper.cpp:113] Failed to create ZooKeeper, 
> zookeeper_init: No such file or directory [2]
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: *** Check failure stack 
> trace: ***
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a0160  
> google::LogMessage::Fail()
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a00b9  
> google::LogMessage::SendToLog()
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: @ 0x7fa9f56a00b9  
> google::LogMessage::SendToLog()
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: I1209 22:54:54.123208 
> 28640 master.cpp:318] Master 20141209-225454-4155764746-5050-28627 
> (mesosmaster-2.internal) started on 10.x.x.x:5050
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: I1209 22:54:54.123306 
> 28640 master.cpp:366] Master allowing unauthenticated frameworks to register
> Dec  9 22:54:54 mesosmaster-2 mesos-master[28627]: I1209 22:54:54.123327 
> 28640 master.cpp:371] Master allowing unauthenticated slaves to register
> 

[jira] [Updated] (MESOS-3791) Introduce HTTP endpoints for Role management

2015-10-22 Thread Yong Qiao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yong Qiao Wang updated MESOS-3791:
--
Shepherd: Adam B

> Introduce HTTP endpoints for Role management
> 
>
> Key: MESOS-3791
> URL: https://issues.apache.org/jira/browse/MESOS-3791
> Project: Mesos
>  Issue Type: Bug
>Reporter: Yong Qiao Wang
>Assignee: Yong Qiao Wang
>
> Currently, there is already an endpoint named /roles in Mesos, which is used 
> to query all role information. In this JIRA, we will extend this endpoint to 
> also support add, remove and update actions. This means we will have a single 
> REST-like endpoint with multiple HTTP verbs to distinguish between different 
> actions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3791) Introduce HTTP endpoints for Role management

2015-10-22 Thread Yong Qiao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yong Qiao Wang updated MESOS-3791:
--
Description: Currently, there is already an endpoint named /roles in Mesos, 
which is used to query all role information. In this JIRA, we will extend this 
endpoint to also support add, remove and update actions. This means we will 
have a single REST-like endpoint with multiple HTTP verbs to distinguish 
between different actions.
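
A minimal sketch of the verb-dispatch idea (plain C++ with hypothetical names, 
not the actual Mesos handler):
{code}
#include <iostream>
#include <string>

// Hypothetical dispatch for a single REST-like /roles endpoint where
// the HTTP verb selects the action.
std::string handleRoles(const std::string& method)
{
  if (method == "GET")    return "query all role information";
  if (method == "POST")   return "add a role";
  if (method == "PUT")    return "update a role";
  if (method == "DELETE") return "remove a role";
  return "405 Method Not Allowed";
}

int main()
{
  std::cout << handleRoles("PUT") << std::endl;  // => "update a role"
  return 0;
}
{code}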

> Introduce HTTP endpoints for Role management
> 
>
> Key: MESOS-3791
> URL: https://issues.apache.org/jira/browse/MESOS-3791
> Project: Mesos
>  Issue Type: Bug
>Reporter: Yong Qiao Wang
>Assignee: Yong Qiao Wang
>
> Currently, there is already an endpoint named /roles in Mesos, which is used 
> to query all role information. In this JIRA, we will extend this endpoint to 
> also support add, remove and update actions. This means we will have a single 
> REST-like endpoint with multiple HTTP verbs to distinguish between different 
> actions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3605) hdfs.du() fails on os x due to lack of native-hadoop library.

2015-10-22 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969976#comment-14969976
 ] 

James Peach commented on MESOS-3605:


I looked at this and found that even after fixing the WARN parsing, the 
{{HDFS}} class fails to parse the du output correctly.
> hdfs.du() fails on os x due to lack of native-hadoop library.
> -
>
> Key: MESOS-3605
> URL: https://issues.apache.org/jira/browse/MESOS-3605
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher, hadoop
>Affects Versions: 0.23.0
> Environment: os x
>Reporter: alexius ludeman
>Assignee: James Peach
>
> hdfs.du() fails on os x due to lack of native-hadoop library.
> This requires a fix from https://issues.apache.org/jira/browse/MESOS-3602 
> before it's reproducible.
> per 
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html
>  OS X does not have native library support, and I could not readily find a 
> way to disable the message.  This causes hdfs.du() to fail when parsing the 
> output with the extra warning message about the unavailable native-hadoop 
> library.
> {code}
> W1007 21:47:38.362117 250429440 fetcher.cpp:451] Reverting to fetching 
> directly into the sandbox for 
> 'hdfs:///a/path/artifacts/executor/3.3.2-SNAPSHOT/executor-3.3.2-SNAPSHOT-artifact-with-dependencies-archive.tar.gz',
>  due to failure to fetch through the cache, with error: Could not determine 
> size of cache file for 
> 'lexinator@hdfs:///a/path/artifacts/executor/3.3.2-SNAPSHOT/executor-3.3.2-SNAPSHOT-artifact-with-dependencies-archive.tar.gz'
>  with error: Hadoop client could not determine size: HDFS du returned an 
> unexpected number of results: '2015-10-07 21:47:37,474 WARN  [main] 
> util.NativeCodeLoader (NativeCodeLoader.java:(62)) - Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> 10.4 M  
> /a/path/artifacts/executor/3.3.2-SNAPSHOT/executor-3.3.2-SNAPSHOT-artifact-with-dependencies-archive.tar.gz
> {code}
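
A standalone sketch of more tolerant parsing (the field heuristic is an 
assumption for illustration, not the actual {{HDFS}} class code):
{code}
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Keep only lines whose last field is an absolute path, so the stray
// native-hadoop WARN line is skipped before sizes are extracted.
int main()
{
  const std::string output =
    "2015-10-07 21:47:37,474 WARN  [main] util.NativeCodeLoader "
    "- Unable to load native-hadoop library for your platform...\n"
    "10.4 M  /a/path/artifact-archive.tar.gz\n";  // Illustrative paths.

  std::istringstream lines(output);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream in(line);
    std::vector<std::string> fields;
    std::string field;
    while (in >> field) {
      fields.push_back(field);
    }

    if (!fields.empty() && fields.back()[0] == '/') {
      std::cout << "size=" << fields[0]           // Unit handling elided.
                << " path=" << fields.back() << std::endl;
    }
  }
  return 0;
}
{code}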



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3775) MasterAllocatorTest.SlaveLost is slow

2015-10-22 Thread Niklas Quarfot Nielsen (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969952#comment-14969952
 ] 

Niklas Quarfot Nielsen commented on MESOS-3775:
---

[~marco-mesos] Can you help schedule this during the next sprint?

> MasterAllocatorTest.SlaveLost is slow
> -
>
> Key: MESOS-3775
> URL: https://issues.apache.org/jira/browse/MESOS-3775
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Priority: Minor
>  Labels: mesosphere
>
> The {{MasterAllocatorTest.SlaveLost}} takes more than {{5s}} to complete. A 
> brief look into the code hints that the stopped agent does not quit 
> immediately (and hence its resources are not released by the allocator) 
> because [it waits for the executor to 
> terminate|https://github.com/apache/mesos/blob/master/src/tests/master_allocator_tests.cpp#L717].
>  {{5s}} timeout comes from {{EXECUTOR_SHUTDOWN_GRACE_PERIOD}} agent constant.
> Possible solutions:
> * Do not wait until the stopped agent quits (can be flaky, needs deeper 
> analysis).
> * Decrease the agent's {{executor_shutdown_grace_period}} flag.
> * Terminate the executor faster (this may require some refactoring since the 
> executor driver is created in the {{TestContainerizer}} and we do not have 
> direct access to it).
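
For the second option, a hedged sketch of the tweak inside the test body 
(fixture helper and flag names as commonly used in the Mesos tests; the value 
is illustrative):
{code}
// Lower the grace period so a stopped agent's executor is shut down
// quickly and the test does not stall for the full default timeout.
slave::Flags flags = CreateSlaveFlags();
flags.executor_shutdown_grace_period = Seconds(1);
{code}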



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3775) MasterAllocatorTest.SlaveLost is slow

2015-10-22 Thread Niklas Quarfot Nielsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niklas Quarfot Nielsen updated MESOS-3775:
--
Labels: mesosphere  (was: )

> MasterAllocatorTest.SlaveLost is slow
> -
>
> Key: MESOS-3775
> URL: https://issues.apache.org/jira/browse/MESOS-3775
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Priority: Minor
>  Labels: mesosphere
>
> The {{MasterAllocatorTest.SlaveLost}} takes more than {{5s}} to complete. A 
> brief look into the code hints that the stopped agent does not quit 
> immediately (and hence its resources are not released by the allocator) 
> because [it waits for the executor to 
> terminate|https://github.com/apache/mesos/blob/master/src/tests/master_allocator_tests.cpp#L717].
>  {{5s}} timeout comes from {{EXECUTOR_SHUTDOWN_GRACE_PERIOD}} agent constant.
> Possible solutions:
> * Do not wait until the stopped agent quits (can be flaky, needs deeper 
> analysis).
> * Decrease the agent's {{executor_shutdown_grace_period}} flag.
> * Terminate the executor faster (this may require some refactoring since the 
> executor driver is created in the {{TestContainerizer}} and we do not have 
> direct access to it).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3775) MasterAllocatorTest.SlaveLost is slow

2015-10-22 Thread Marco Massenzio (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969987#comment-14969987
 ] 

Marco Massenzio commented on MESOS-3775:


Moved to the top of the backlog (BTW anyone can do this: right-click in the 
Backlog view on our Scrum board).

> MasterAllocatorTest.SlaveLost is slow
> -
>
> Key: MESOS-3775
> URL: https://issues.apache.org/jira/browse/MESOS-3775
> Project: Mesos
>  Issue Type: Bug
>  Components: technical debt, test
>Reporter: Alexander Rukletsov
>Priority: Minor
>  Labels: mesosphere
>
> The {{MasterAllocatorTest.SlaveLost}} takes more than {{5s}} to complete. A 
> brief look into the code hints that the stopped agent does not quit 
> immediately (and hence its resources are not released by the allocator) 
> because [it waits for the executor to 
> terminate|https://github.com/apache/mesos/blob/master/src/tests/master_allocator_tests.cpp#L717].
>  {{5s}} timeout comes from {{EXECUTOR_SHUTDOWN_GRACE_PERIOD}} agent constant.
> Possible solutions:
> * Do not wait until the stopped agent quits (can be flaky, needs deeper 
> analysis).
> * Decrease the agent's {{executor_shutdown_grace_period}} flag.
> * Terminate the executor faster (this may require some refactoring since the 
> executor driver is created in the {{TestContainerizer}} and we do not have 
> direct access to it).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling

2015-10-22 Thread Steven Schlansker (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970064#comment-14970064
 ] 

Steven Schlansker commented on MESOS-3771:
--

Sounds good to me.  I think removing that field from JSON is fine for us.

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII 
> handling
> ---
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.1, 0.26.0
>Reporter: Steven Schlansker
>Assignee: Joseph Wu
>Priority: Critical
>  Labels: mesosphere
>
> Spark encodes some binary data into the ExecutorInfo.data field.  This field 
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without 
> any regards to proper character encoding:
> {code}
> 0006b0b0  2e 73 70 61 72 6b 2e 65  78 65 63 75 74 6f 72 2e  |.spark.executor.|
> 0006b0c0  4d 65 73 6f 73 45 78 65  63 75 74 6f 72 42 61 63  |MesosExecutorBac|
> 0006b0d0  6b 65 6e 64 22 7d 2c 22  64 61 74 61 22 3a 22 ac  |kend"},"data":".|
> 0006b0e0  ed 5c 75 30 30 30 30 5c  75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
> 0006b0f0  75 30 30 30 30 5c 75 30  30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
> 0006b100  6c 61 2e 54 75 70 6c 65  32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP api emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout seems 
> to not have any idea of what a byte array is.  I'm guessing that some 
> implicit conversion makes it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specificiation.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here.  Our cluster is currently entirely down -- 
> the frameworks cannot handle parsing the invalid JSON produced (it is not 
> even valid utf-8)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)

2015-10-22 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970104#comment-14970104
 ] 

Klaus Ma commented on MESOS-3765:
-

But how to define the "allocation chunk"? For example, take two frameworks (f1 
and f2), each with a task requiring 1 CPU, and one slave with only 1 CPU. I 
think we need a way to let the allocator know the minimal resource requirement: 
via {{requestResources()}} or via a {{filter}}. The framework cannot define the 
resource requirement ahead of time because it depends on its pending workload.

> Make offer size adjustable (granularity)
> 
>
> Key: MESOS-3765
> URL: https://issues.apache.org/jira/browse/MESOS-3765
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Alexander Rukletsov
>
> The built-in allocator performs "coarse-grained" allocation, meaning that it 
> always allocates the entire remaining agent resources to a single framework. 
> This may heavily impact allocation fairness in some cases, for example in 
> presence of numerous greedy frameworks and a small number of powerful agents.
> A possible solution would be to allow operators explicitly specify 
> granularity via allocator flags. While this can be tricky for non-standard 
> resources, it's pretty straightforward for {{cpus}} and {{mem}}.
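
A tiny standalone sketch of what such granularity could mean for {{cpus}} (the 
flag name and chunking logic are assumptions, not the built-in allocator):
{code}
#include <algorithm>
#include <iostream>

// Carve an agent's remaining cpus into fixed-size chunks instead of
// offering everything to a single framework at once.
int main()
{
  double remaining = 14.0;        // Remaining cpus on a powerful agent.
  const double granularity = 4.0; // Hypothetical allocator flag value.

  while (remaining > 0.0) {
    double chunk = std::min(granularity, remaining);
    std::cout << "offer " << chunk << " cpus" << std::endl;
    remaining -= chunk;
  }
  return 0;
}
{code}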



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling

2015-10-22 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970102#comment-14970102
 ] 

Joseph Wu commented on MESOS-3771:
--

Created [MESOS-3794] to track point #2 above.  That portion will likely be a 
more involved change.

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII 
> handling
> ---
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.1, 0.26.0
>Reporter: Steven Schlansker
>Assignee: Joseph Wu
>Priority: Critical
>  Labels: mesosphere
>
> Spark encodes some binary data into the ExecutorInfo.data field.  This field 
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without 
> any regards to proper character encoding:
> {code}
> 0006b0b0  2e 73 70 61 72 6b 2e 65  78 65 63 75 74 6f 72 2e  |.spark.executor.|
> 0006b0c0  4d 65 73 6f 73 45 78 65  63 75 74 6f 72 42 61 63  |MesosExecutorBac|
> 0006b0d0  6b 65 6e 64 22 7d 2c 22  64 61 74 61 22 3a 22 ac  |kend"},"data":".|
> 0006b0e0  ed 5c 75 30 30 30 30 5c  75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
> 0006b0f0  75 30 30 30 30 5c 75 30  30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
> 0006b100  6c 61 2e 54 75 70 6c 65  32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP api emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout seems 
> to not have any idea of what a byte array is.  I'm guessing that some 
> implicit conversion makes it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specificiation.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here.  Our cluster is currently entirely down -- 
> the frameworks cannot handle parsing the invalid JSON produced (it is not 
> even valid utf-8)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-191) Add support for multiple disk resources

2015-10-22 Thread Anindya Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970157#comment-14970157
 ] 

Anindya Sinha commented on MESOS-191:
-

Thanks for the activity/discussion on this JIRA. I had been working on a 
proposal for addressing this issue while looking into MESOS-3421 - sharing of 
persistent volumes (which is now sharing of resources). I think it is more or 
less along the same lines as the ongoing discussion.
I know I am late to this discussion, but I thought I would still share the 
proposal and see if it adds any value:

https://docs.google.com/document/d/1VlUSDrzg7OdBEX6ayRsV6JWRQUDushKfgzi8FxTOXKs/edit?usp=sharing





> Add support for multiple disk resources
> ---
>
> Key: MESOS-191
> URL: https://issues.apache.org/jira/browse/MESOS-191
> Project: Mesos
>  Issue Type: Story
>Reporter: Vinod Kone
>  Labels: mesosphere, persistent-volumes
>
> It would be nice to schedule mesos tasks with fine-grained disk scheduling. 
> The idea is, a slave with multiple spindles, would specify spindle specific 
> config. Mesos would then include this info in its resource offers to 
> frameworks. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3771) Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling

2015-10-22 Thread Joseph Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970063#comment-14970063
 ] 

Joseph Wu commented on MESOS-3771:
--

Sync'd with BenH and [~bmahler] about this (offline).

The proposed solution is the following:
# None of the state endpoints should be dumping out the binary {{data}} fields 
in the first place.  This includes {{ExecutorInfo}} ([dumped by 
master|https://github.com/apache/mesos/blob/master/src/common/http.cpp#L317]) 
and {{TaskInfo}} ([dumped by 
agent|https://github.com/apache/mesos/blob/master/src/slave/http.cpp#L103]).  
#* These fields should be removed from the output entirely.  Existing 
frameworks should not be relying on this information.  [~stevenschlansker], can 
you confirm this with Spark?
#* We can easily back-port this patch, if absolutely necessary.
# Master should not be storing the {{data}} fields from {{ExecutorInfo}}.  We 
currently [store the entire 
object|https://github.com/apache/mesos/blob/master/src/master/master.hpp#L262-L271],
 which means master would be at high risk of OOM-ing if a bunch of executors 
were started with big {{data}} blobs.
#* Master should scrub out unneeded bloat from {{ExecutorInfo}} before storing 
it.
#* We can use an alternate internal object, like we do for {{TaskInfo}} vs 
{{Task}}; see 
[this|https://github.com/apache/mesos/blob/master/src/messages/messages.proto#L39-L41].
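
For point #2, a minimal sketch of the scrubbing idea (the helper is 
hypothetical; {{clear_data()}} is the protobuf-generated method for the 
{{data}} field, and the include path is assumed):
{code}
#include <mesos/mesos.pb.h>  // Assumed include path for ExecutorInfo.

// Copy the ExecutorInfo and drop the arbitrarily large 'data' blob
// before the master stores it.
static mesos::ExecutorInfo scrub(const mesos::ExecutorInfo& executorInfo)
{
  mesos::ExecutorInfo result = executorInfo;
  result.clear_data();  // Protobuf-generated clearer for 'bytes data'.
  return result;
}
{code}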

> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII 
> handling
> ---
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Affects Versions: 0.24.1, 0.26.0
>Reporter: Steven Schlansker
>Assignee: Joseph Wu
>Priority: Critical
>  Labels: mesosphere
>
> Spark encodes some binary data into the ExecutorInfo.data field.  This field 
> is sent as a "bytes" Protobuf value, which can have arbitrary non-UTF8 data.
> If you have such a field, it seems that it is splatted out into JSON without 
> any regards to proper character encoding:
> {code}
> 0006b0b0  2e 73 70 61 72 6b 2e 65  78 65 63 75 74 6f 72 2e  |.spark.executor.|
> 0006b0c0  4d 65 73 6f 73 45 78 65  63 75 74 6f 72 42 61 63  |MesosExecutorBac|
> 0006b0d0  6b 65 6e 64 22 7d 2c 22  64 61 74 61 22 3a 22 ac  |kend"},"data":".|
> 0006b0e0  ed 5c 75 30 30 30 30 5c  75 30 30 30 35 75 72 5c  |.\u0000\u0005ur\|
> 0006b0f0  75 30 30 30 30 5c 75 30  30 30 66 5b 4c 73 63 61  |u0000\u000f[Lsca|
> 0006b100  6c 61 2e 54 75 70 6c 65  32 3b 2e cc 5c 75 30 30  |la.Tuple2;..\u00|
> {code}
> I suspect this is because the HTTP api emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout seems 
> to not have any idea of what a byte array is.  I'm guessing that some 
> implicit conversion makes it get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specificiation.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
> Thank you for any assistance here.  Our cluster is currently entirely down -- 
> the frameworks cannot handle parsing the invalid JSON produced (it is not 
> even valid utf-8)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3794) Master should not store arbitrarily sized data in ExecutorInfo

2015-10-22 Thread Joseph Wu (JIRA)
Joseph Wu created MESOS-3794:


 Summary: Master should not store arbitrarily sized data in 
ExecutorInfo
 Key: MESOS-3794
 URL: https://issues.apache.org/jira/browse/MESOS-3794
 Project: Mesos
  Issue Type: Bug
  Components: master
Reporter: Joseph Wu
Assignee: Joseph Wu
Priority: Critical


From a comment in [MESOS-3771]:

Master should not be storing the {{data}} fields from {{ExecutorInfo}}.  We 
currently [store the entire 
object|https://github.com/apache/mesos/blob/master/src/master/master.hpp#L262-L271],
 which means master would be at high risk of OOM-ing if a bunch of executors 
were started with big {{data}} blobs.
* Master should scrub out unneeded bloat from {{ExecutorInfo}} before storing 
it.
* We can use an alternate internal object, like we do for {{TaskInfo}} vs 
{{Task}}; see 
[this|https://github.com/apache/mesos/blob/master/src/messages/messages.proto#L39-L41].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3733) ContentType/SchedulerTest.Suppress/0 is flaky

2015-10-22 Thread Guangya Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14968843#comment-14968843
 ] 

Guangya Liu commented on MESOS-3733:


RR: https://reviews.apache.org/r/39548/

> ContentType/SchedulerTest.Suppress/0 is flaky
> -
>
> Key: MESOS-3733
> URL: https://issues.apache.org/jira/browse/MESOS-3733
> Project: Mesos
>  Issue Type: Bug
>  Components: HTTP API
>Reporter: Anand Mazumdar
>Assignee: Guangya Liu
>  Labels: flaky-test
>
> Showed up on ASF CI:
> https://builds.apache.org/job/Mesos/931/COMPILER=clang,CONFIGURATION=--verbose,OS=ubuntu:14.04,label_exp=docker%7C%7CHadoop/console
> {code}
> [ RUN  ] ContentType/SchedulerTest.Suppress/0
> Using temporary directory '/tmp/ContentType_SchedulerTest_Suppress_0_qcnnQi'
> I1014 17:34:11.225731 27650 leveldb.cpp:176] Opened db in 2.974504ms
> I1014 17:34:11.226856 27650 leveldb.cpp:183] Compacted db in 980779ns
> I1014 17:34:11.227028 27650 leveldb.cpp:198] Created db iterator in 37641ns
> I1014 17:34:11.227159 27650 leveldb.cpp:204] Seeked to beginning of db in 
> 14959ns
> I1014 17:34:11.227283 27650 leveldb.cpp:273] Iterated through 0 keys in the 
> db in 14672ns
> I1014 17:34:11.227449 27650 replica.cpp:746] Replica recovered with log 
> positions 0 -> 0 with 1 holes and 0 unlearned
> I1014 17:34:11.228469 27680 recover.cpp:449] Starting replica recovery
> I1014 17:34:11.229202 27673 recover.cpp:475] Replica is in EMPTY status
> I1014 17:34:11.231384 27673 replica.cpp:642] Replica in EMPTY status received 
> a broadcasted recover request from (10262)@172.17.2.194:37545
> I1014 17:34:11.231745 27673 recover.cpp:195] Received a recover response from 
> a replica in EMPTY status
> I1014 17:34:11.234242 27680 master.cpp:376] Master 
> 0cc41e7f-8d87-4c2f-9543-3f7198f9fdaf (23af00e0dbe0) started on 
> 172.17.2.194:37545
> I1014 17:34:11.234283 27680 master.cpp:378] Flags at startup: --acls="" 
> --allocation_interval="1secs" --allocator="HierarchicalDRF" 
> --authenticate="false" --authenticate_slaves="true" 
> --authenticators="crammd5" --authorizers="local" 
> --credentials="/tmp/ContentType_SchedulerTest_Suppress_0_qcnnQi/credentials" 
> --framework_sorter="drf" --help="false" --hostname_lookup="true" 
> --initialize_driver_logging="true" --log_auto_initialize="true" 
> --logbufsecs="0" --logging_level="INFO" --max_slave_ping_timeouts="5" 
> --quiet="false" --recovery_slave_removal_limit="100%" 
> --registry="replicated_log" --registry_fetch_timeout="1mins" 
> --registry_store_timeout="25secs" --registry_strict="true" 
> --root_submissions="true" --slave_ping_timeout="15secs" 
> --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" 
> --webui_dir="/mesos/mesos-0.26.0/_inst/share/mesos/webui" 
> --work_dir="/tmp/ContentType_SchedulerTest_Suppress_0_qcnnQi/master" 
> --zk_session_timeout="10secs"
> I1014 17:34:11.234679 27680 master.cpp:425] Master allowing unauthenticated 
> frameworks to register
> I1014 17:34:11.234694 27680 master.cpp:428] Master only allowing 
> authenticated slaves to register
> I1014 17:34:11.234705 27680 credentials.hpp:37] Loading credentials for 
> authentication from 
> '/tmp/ContentType_SchedulerTest_Suppress_0_qcnnQi/credentials'
> I1014 17:34:11.235251 27673 recover.cpp:566] Updating replica status to 
> STARTING
> I1014 17:34:11.235857 27680 master.cpp:467] Using default 'crammd5' 
> authenticator
> I1014 17:34:11.236006 27680 master.cpp:504] Authorization enabled
> I1014 17:34:11.236187 27673 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 729504ns
> I1014 17:34:11.236224 27673 replica.cpp:323] Persisted replica status to 
> STARTING
> I1014 17:34:11.236227 27678 whitelist_watcher.cpp:79] No whitelist given
> I1014 17:34:11.236366 27676 hierarchical.cpp:140] Initialized hierarchical 
> allocator process
> I1014 17:34:11.236495 27677 recover.cpp:475] Replica is in STARTING status
> I1014 17:34:11.237670 27678 replica.cpp:642] Replica in STARTING status 
> received a broadcasted recover request from (10263)@172.17.2.194:37545
> I1014 17:34:11.238782 27673 recover.cpp:195] Received a recover response from 
> a replica in STARTING status
> I1014 17:34:11.238916 27672 master.cpp:1609] The newly elected leader is 
> master@172.17.2.194:37545 with id 0cc41e7f-8d87-4c2f-9543-3f7198f9fdaf
> I1014 17:34:11.238993 27672 master.cpp:1622] Elected as the leading master!
> I1014 17:34:11.239013 27672 master.cpp:1382] Recovering from registrar
> I1014 17:34:11.239480 27672 recover.cpp:566] Updating replica status to VOTING
> I1014 17:34:11.239630 27675 registrar.cpp:309] Recovering registrar
> I1014 17:34:11.240074 27673 leveldb.cpp:306] Persisting metadata (8 bytes) to 
> leveldb took 452562ns
> I1014 17:34:11.240137 27673 replica.cpp:323] Persisted replica status 

[jira] [Updated] (MESOS-3177) Dynamic roles/weights configuration at runtime

2015-10-22 Thread Yong Qiao Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yong Qiao Wang updated MESOS-3177:
--
Issue Type: Epic  (was: Improvement)
   Summary: Dynamic roles/weights configuration at runtime  (was: Make 
Mesos own configuration of roles/weights)

> Dynamic roles/weights configuration at runtime
> --
>
> Key: MESOS-3177
> URL: https://issues.apache.org/jira/browse/MESOS-3177
> Project: Mesos
>  Issue Type: Epic
>  Components: master, slave
>Reporter: Cody Maloney
>Assignee: Yong Qiao Wang
>  Labels: mesosphere
>
> All roles and weights must currently be specified up-front when starting 
> Mesos masters. In addition, they should be consistent on every 
> master, otherwise unexpected behavior could occur (You can have them be 
> inconsistent for some upgrade paths / changing the set).
> This makes it hard to introduce new groups of machines under new roles 
> dynamically (Have to generate a new master configuration, deploy that, before 
> we can connect slaves with a new role to the cluster).
> Ideally an administrator can manually add / remove / edit roles and have the 
> settings replicated / passed to all masters in the cluster by Mesos. 
> Effectively Mesos takes ownership of the setting, rather than requiring it to 
> be done externally.
> In addition, if a new slave joins the cluster with an unexpected / new role, 
> that should just work, making it much easier to introduce machines with new 
> roles. (Policy around whether or not a slave can cause creation of a new 
> role, whether a given slave can register with a given role, etc. is out of 
> scope, and would be handled by controls in the general registration process.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3791) Introduce HTTP endpoints for Role management

2015-10-22 Thread Yong Qiao Wang (JIRA)
Yong Qiao Wang created MESOS-3791:
-

 Summary: Introduce HTTP endpoints for Role management
 Key: MESOS-3791
 URL: https://issues.apache.org/jira/browse/MESOS-3791
 Project: Mesos
  Issue Type: Bug
Reporter: Yong Qiao Wang
Assignee: Yong Qiao Wang






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-3792) flags.acls in /state.json response is not the flag value passed to Mesos master

2015-10-22 Thread James Fisher (JIRA)
James Fisher created MESOS-3792:
---

 Summary: flags.acls in /state.json response is not the flag value 
passed to Mesos master
 Key: MESOS-3792
 URL: https://issues.apache.org/jira/browse/MESOS-3792
 Project: Mesos
  Issue Type: Bug
Reporter: James Fisher


Steps to reproduce: Start the Mesos master with the {{--acls}} flag set to 
the following value:

{code}
{ "run_tasks": [ { "principals": { "values": ["foo", "bar"] }, "users": { 
"values": ["alice"] } } ] }
{code}

Then make a request to {{http://mesosmaster:5050/state.json}} and extract the 
value for key {{flags.acls}} from the JSON body of the response.

Expected behavior: the value is the same JSON string passed on the command-line.

Actual behavior: the value is this string in some unknown syntax:

{code}
run_tasks {
  principals {
    values: "foo"
    values: "bar"
  }
  users {
    values: "alice"
  }
}
{code}

I don't know what this is, but it's not an ACL expression according to the 
documentation.
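
For reference, the "unknown syntax" above looks like protobuf's text format, 
i.e. what {{DebugString()}} produces for the parsed ACLs message, rather than 
the original JSON. A minimal sketch that round-trips such a string, assuming 
the {{mesos::ACLs}} message type and its header path (both unverified here):

{code}
// Sketch only: the header path and message name are assumptions based on the
// Mesos source layout, not verified against a particular release.
#include <iostream>
#include <string>

#include <google/protobuf/text_format.h>

#include <mesos/authorizer/acls.pb.h>

int main()
{
  const std::string value =
    "run_tasks {\n"
    "  principals {\n"
    "    values: \"foo\"\n"
    "    values: \"bar\"\n"
    "  }\n"
    "  users {\n"
    "    values: \"alice\"\n"
    "  }\n"
    "}\n";

  mesos::ACLs acls;
  if (!google::protobuf::TextFormat::ParseFromString(value, &acls)) {
    std::cerr << "Not protobuf text format after all" << std::endl;
    return 1;
  }

  // DebugString() reproduces the same representation seen in state.json.
  std::cout << acls.DebugString();
  return 0;
}
{code}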



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3494) Add Test for Docker RemotePuller

2015-10-22 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14970330#comment-14970330
 ] 

Klaus Ma commented on MESOS-3494:
-

Any background to share?

> Add Test for Docker RemotePuller
> 
>
> Key: MESOS-3494
> URL: https://issues.apache.org/jira/browse/MESOS-3494
> Project: Mesos
>  Issue Type: Task
>Reporter: Gilbert Song
>Assignee: Gilbert Song
>
> Add unit test for Docker RemotePuller implementation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3709) Modulize the containerizer interface.

2015-10-22 Thread Benjamin Bannier (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969179#comment-14969179
 ] 

Benjamin Bannier commented on MESOS-3709:
-

Another approach would be to internalize the various {{*ID}} and {{SlaveState}} 
parameters, e.g. by supplying them on construction of the base class, so that 
all of this stays inside internal code.
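
A minimal sketch of that shape; all names here ({{ContainerizerBase}}, the 
stand-in {{SlaveState}}, etc.) are illustrative, not the actual interface:

{code}
// Sketch only: stand-in types, not the real Mesos containerizer interface.
#include <string>

struct SlaveID { std::string value; };
struct SlaveState { /* checkpointed executor/task state */ };

class ContainerizerBase
{
public:
  // The IDs and recovered state are supplied once, on construction, so the
  // virtual interface below stays free of these internal parameters.
  ContainerizerBase(const SlaveID& slaveId, const SlaveState& state)
    : slaveId(slaveId), state(state) {}

  virtual ~ContainerizerBase() {}

  // Module implementors override this without *ID/SlaveState in the signature.
  virtual bool recover() = 0;

protected:
  const SlaveID slaveId;
  const SlaveState state;
};
{code}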

> Modulize the containerizer interface.
> -
>
> Key: MESOS-3709
> URL: https://issues.apache.org/jira/browse/MESOS-3709
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>Assignee: Benjamin Bannier
>
> So that people can implement their own containerizer as a module. That's more 
> efficient than having an external containerizer and shelling out. The module 
> system also provides versioning support, which is definitely better than an 
> unversioned external containerizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MESOS-3709) Modulize the containerizer interface.

2015-10-22 Thread Benjamin Bannier (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Bannier updated MESOS-3709:

Assignee: (was: Benjamin Bannier)

> Modulize the containerizer interface.
> -
>
> Key: MESOS-3709
> URL: https://issues.apache.org/jira/browse/MESOS-3709
> Project: Mesos
>  Issue Type: Bug
>Reporter: Jie Yu
>
> So that people can implement their own containerizer as a module. That's more 
> efficient than having an external containerizer and shelling out. The module 
> system also provides versioning support, which is definitely better than an 
> unversioned external containerizer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)

2015-10-22 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969288#comment-14969288
 ] 

Klaus Ma commented on MESOS-3765:
-

For [~gyliu]'s proposal on {{requestResources}}, I think that would be a 
long-term solution for us; maybe we can re-visit it in the "allocator 
refactor" ticket.

[~alexr], for the "granularity", I suggest making the default value 
{{total resource/framework #}} for each resource. One concern is that with 
lots of frameworks the allocator's performance will degrade. Maybe we can 
run a performance test to see. Any comments?

> Make offer size adjustable (granularity)
> 
>
> Key: MESOS-3765
> URL: https://issues.apache.org/jira/browse/MESOS-3765
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Alexander Rukletsov
>
> The built-in allocator performs "coarse-grained" allocation, meaning that it 
> always allocates the entire remaining agent resources to a single framework. 
> This may heavily impact allocation fairness in some cases, for example in 
> presence of numerous greedy frameworks and a small number of powerful agents.
> A possible solution would be to allow operators to explicitly specify 
> granularity via allocator flags. While this can be tricky for non-standard 
> resources, it's pretty straightforward for {{cpus}} and {{mem}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MESOS-3765) Make offer size adjustable (granularity)

2015-10-22 Thread Klaus Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Klaus Ma updated MESOS-3765:

Comment: was deleted

(was: For [~gyliu]'s proposal on {{requestResources}}, I think that would be a 
long-term solution for us; maybe we can re-visit it in the "allocator 
refactor" ticket.

[~alexr], for the "granularity", I suggest making the default value 
{{total resource/framework #}} for each resource. One concern is that with 
lots of frameworks the allocator's performance will degrade. Maybe we can 
run a performance test to see. Any comments?)

> Make offer size adjustable (granularity)
> 
>
> Key: MESOS-3765
> URL: https://issues.apache.org/jira/browse/MESOS-3765
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Alexander Rukletsov
>
> The built-in allocator performs "coarse-grained" allocation, meaning that it 
> always allocates the entire remaining agent resources to a single framework. 
> This may heavily impact allocation fairness in some cases, for example in 
> presence of numerous greedy frameworks and a small number of powerful agents.
> A possible solution would be to allow operators to explicitly specify 
> granularity via allocator flags. While this can be tricky for non-standard 
> resources, it's pretty straightforward for {{cpus}} and {{mem}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3602) hdfs du fails due to prepended / on path

2015-10-22 Thread James Peach (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969383#comment-14969383
 ] 

James Peach commented on MESOS-3602:


I think the right approach here is to allow the {{HDFS}} wrapper to accept 
{{hdfs://}} URLs, since they are also an absolute path into HDFS. I considered 
allowing the caller to explicitly set the HDFS namenode, but that seems 
burdensome when operators can already express that in the HDFS URL.
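
A minimal sketch of the path handling this implies; {{normalize}} is a 
hypothetical helper, not the actual patch:

{code}
// Sketch only: 'normalize' is a hypothetical helper, not the actual fix.
#include <iostream>
#include <string>

// Leave fully qualified hdfs:// URIs alone; only scheme-less input is
// treated as a path that may need a leading '/'.
std::string normalize(const std::string& path)
{
  if (path.find("hdfs://") == 0 || (!path.empty() && path[0] == '/')) {
    return path;
  }
  return "/" + path;
}

int main()
{
  // Stays intact instead of becoming "/hdfs:///a/path/to/artifact.tar.gz".
  std::cout << normalize("hdfs:///a/path/to/artifact.tar.gz") << std::endl;
  std::cout << normalize("a/path/to/artifact.tar.gz") << std::endl;
  return 0;
}
{code}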

> hdfs du fails due to prepended / on path
> 
>
> Key: MESOS-3602
> URL: https://issues.apache.org/jira/browse/MESOS-3602
> Project: Mesos
>  Issue Type: Bug
>  Components: fetcher
>Affects Versions: 0.23.0
>Reporter: alexius ludeman
>Assignee: James Peach
>
> {{hdfs.hpp}}'s {{du()}} fails to run. It appears to prepend "/", but the path 
> passed in is a URI of the form "hdfs:///a/path/to/artifact.tar.gz".
> {code}
> W1007 13:46:25.791894 373116928 fetcher.cpp:436] Reverting to fetching 
> directly into the sandbox for 
> 'hdfs:///a/path/to/3.3.2-SNAPSHOT/executor-3.3.2-SNAPSHOT-artifact-with-dependencies-archive.tar.gz',
>  due to failure to fetch through the cache, with error: Could not determine 
> size of cache file for 
> 'lexinator@hdfs:///a/path/to/3.3.2-SNAPSHOT/executor-3.3.2-SNAPSHOT-artifact-with-dependencies-archive.tar.gz'
>  with error: Hadoop client could not determine size: HDFS du returned an 
> unexpected number of results: '2015-10-07 13:46:21,958 WARN  [main] 
> util.NativeCodeLoader (NativeCodeLoader.java:(62)) - Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> -du: java.net.URISyntaxException: Expected scheme-specific part at index 5: 
> hdfs:
> Usage: hadoop fs [generic options] -du [-s] [-h]  ...
> {code}
> The command it's running is:
> {code}
> /usr/bin/env bash /.../hadoop fs -du -h 
> /hdfs:///a/path/to/3.3.2-SNAPSHOT/executor-3.3.2-SNAPSHOT-artifact-with-dependencies-archive.tar.gz
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-3765) Make offer size adjustable (granularity)

2015-10-22 Thread Klaus Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14969382#comment-14969382
 ] 

Klaus Ma commented on MESOS-3765:
-

[~alexr], I think we can use {{requestResources()}} to send a granularity, so 
that the allocator sends offers based on each framework's granularity (the 
granularity may differ between frameworks). Here's a workflow in the allocator:
1. If there are no available resources to allocate, return.
2. For each framework, allocate "granularity" resources in each loop; the 
"granularity" will be a.) "total/framework #" for each resource type 
(cpu/mem) if the framework sent no "granularity", or b.) the "granularity" 
the framework sent via {{requestResources()}}.
3. Keep running #2 until no resources are available.

The frameworks will fair-share the resources in the cluster. Although a greedy 
framework may send several "granularity" requests to the allocator, the 
allocator will give only one "granularity" to each framework in each loop. 
There is still a case we cannot guard against, where a framework overcommits 
via "granularity" (granularity > required resources), because the allocator 
does not know the framework's task size.

*Demonstration:*
Two frameworks (f1 and f2), each of which wants 1 CPU; one slave with only 
1 CPU. The workflow will be:
- Each framework receives 0.5 CPU: with no "granularity" from the 
frameworks, the default "granularity" is 1 CPU / 2 frameworks.
- Each framework sends a "granularity" of 1 CPU via {{requestResources}}, 
because the offer cannot meet its requirement (1 vs. 0.5).
- The allocator sends the full 1 CPU to f1 (or f2) based on its "granularity".

One concern is a performance downgrade; we need a perf test for it, and maybe 
a flag to control this behaviour.

If there are any comments, please let me know.
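
A minimal sketch of the loop described above; the names and the default 
granularity are taken from this proposal only, not from the actual 
hierarchical allocator:

{code}
// Sketch only: simulates the proposed workflow, not the real allocator.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

struct Framework
{
  double requested;  // Granularity sent via requestResources(), 0 if none.
  double allocated;
};

int main()
{
  double available = 1.0;  // One slave with 1 CPU, as in the demonstration.
  std::vector<Framework> frameworks = {{0.0, 0.0}, {0.0, 0.0}};

  // Step 2a: the default granularity is total / framework #.
  const double defaultGranularity = available / frameworks.size();

  while (available > 0.0) {  // Step 3: repeat until nothing is left.
    bool progress = false;
    for (Framework& f : frameworks) {
      // Step 2b: prefer the granularity the framework sent, if any.
      const double granularity =
          f.requested > 0.0 ? f.requested : defaultGranularity;
      const double amount = std::min(granularity, available);
      if (amount > 0.0) {
        f.allocated += amount;
        available -= amount;
        progress = true;
      }
    }
    if (!progress) {
      break;  // Step 1: no available resources to allocate.
    }
  }

  // With two frameworks and 1 CPU this prints 0.5 CPU for each, matching
  // the first round of the demonstration above.
  for (std::size_t i = 0; i < frameworks.size(); ++i) {
    std::cout << "f" << (i + 1) << " got " << frameworks[i].allocated
              << " cpus" << std::endl;
  }

  return 0;
}
{code}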

> Make offer size adjustable (granularity)
> 
>
> Key: MESOS-3765
> URL: https://issues.apache.org/jira/browse/MESOS-3765
> Project: Mesos
>  Issue Type: Improvement
>  Components: allocation
>Reporter: Alexander Rukletsov
>
> The built-in allocator performs "coarse-grained" allocation, meaning that it 
> always allocates the entire remaining agent resources to a single framework. 
> This may heavily impact allocation fairness in some cases, for example in 
> presence of numerous greedy frameworks and a small number of powerful agents.
> A possible solution would be to allow operators to explicitly specify 
> granularity via allocator flags. While this can be tricky for non-standard 
> resources, it's pretty straightforward for {{cpus}} and {{mem}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)