[jira] [Commented] (MESOS-4114) Add field VIP to message Port

2015-12-11 Thread Tobi Knaup (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053596#comment-15053596
 ] 

Tobi Knaup commented on MESOS-4114:
---

[~sargun] DiscoveryInfo was meant to be the thing that clients or service 
discovery systems read, so the local port seems like an implementation detail 
that should not be visible to clients. Back when this proto was introduced 
there was a lot of debate around whether name etc. should just go into labels 
or whether there should be explicit members; the decision was to be explicit, 
so I'd recommend a new vip member to follow that pattern. The assumption was 
that each container would have its own IP and that instances of the service 
would listen on the same port listed in DiscoveryInfo, so there would be no 
need to call out the local port. In the absence of IP per container, I think 
it makes sense to add a new member to Port called localPort or instancePort.

I'm not sure I understand the second point - different services will have 
different TaskInfo/ExecutorInfo and therefore different DiscoveryInfo, so you 
can set different names and IPs.
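
For concreteness, a rough sketch of what that could look like on the Port 
message; the first three fields mirror the existing proto, while vip and 
instance_port (names and field numbers alike) are hypothetical, not committed 
proto:

{code}
message Port {
  required uint32 number = 1;
  optional string name = 2;
  optional string protocol = 3;

  // Hypothetical: well-known virtual addresses under which this port is
  // advertised to clients.
  repeated string vip = 4;

  // Hypothetical: the port the instance actually listens on, for setups
  // without IP-per-container.
  optional uint32 instance_port = 5;
}
{code}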

> Add field VIP to message Port
> -----------------------------
>
>                 Key: MESOS-4114
>                 URL: https://issues.apache.org/jira/browse/MESOS-4114
>             Project: Mesos
>          Issue Type: Wish
>            Reporter: Sargun Dhillon
>            Assignee: Avinash Sridharan
>            Priority: Trivial
>              Labels: mesosphere
>
> We would like to extend the Mesos protocol buffer 'Port' to include an 
> optional repeated string named "VIP" - to map it to a well known virtual IP, 
> or virtual hostname for discovery purposes.
> We also want this field exposed in DiscoveryInfo in state.json.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-4114) Add field VIP to message Port

2015-12-11 Thread Tobi Knaup (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053092#comment-15053092
 ] 

Tobi Knaup commented on MESOS-4114:
---

Can you explain why the VIP should go into Port vs. DiscoveryInfo?
If I have an app that has a public port (8080) and an admin port (8081) I'd 
expect to reach them both on the same VIP, so DiscoveryInfo seems like the 
right place.
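
That is, the VIP would hang off DiscoveryInfo rather than each Port, covering 
every port of the service. A sketch of this alternative (the vip field and its 
number are hypothetical; the other fields mirror the existing DiscoveryInfo 
message, with Visibility, Ports, and Labels being its existing auxiliary 
types):

{code}
message DiscoveryInfo {
  required Visibility visibility = 1;
  optional string name = 2;
  optional string environment = 3;
  optional string location = 4;
  optional string version = 5;
  optional Ports ports = 6;
  optional Labels labels = 7;

  // Hypothetical: virtual addresses shared by all ports of the service,
  // e.g. both 8080 and 8081 above.
  repeated string vip = 8;
}
{code}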

> Add field VIP to message Port
> -----------------------------
>
>                 Key: MESOS-4114
>                 URL: https://issues.apache.org/jira/browse/MESOS-4114
>             Project: Mesos
>          Issue Type: Wish
>            Reporter: Sargun Dhillon
>            Assignee: Avinash Sridharan
>            Priority: Trivial
>              Labels: mesosphere
>
> We would like to extend the Mesos protocol buffer 'Port' to include an 
> optional repeated string named "VIP" - to map it to a well known virtual IP, 
> or virtual hostname for discovery purposes.
> We also want this field exposed in DiscoveryInfo in state.json.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-2785) Slave crashes during checkpointing when no space is left on disk

2015-05-30 Thread Tobi Knaup (JIRA)
Tobi Knaup created MESOS-2785:
------------------------------

             Summary: Slave crashes during checkpointing when no space is left on disk
                 Key: MESOS-2785
                 URL: https://issues.apache.org/jira/browse/MESOS-2785
             Project: Mesos
          Issue Type: Bug
          Components: slave
            Reporter: Tobi Knaup


This happened on a slave where tasks filled up the disk that work_dir is on.
Slave logs:

{noformat}
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: I0530 23:36:03.088995  1354 slave.cpp:1144] Got assigned task broker-2-fde59f6b-7437-4678-995e-8f9812e4f4bf for framework 20150530-210001-419692554-5050-1832-0001
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: F0530 23:36:03.089443  1354 slave.cpp:4136] CHECK_SOME(state::checkpoint(path, info)): Failed to write temporary file '/var/lib/mesos/slave/meta/slaves/20150530-210001-419692554-5050-1832-S4/frameworks/20150530-210001-419692554-5050-1832-0001/QuDFUs': Failed to write size: No space left on device
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: *** Check failure stack trace: ***
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8625c69fd  google::LogMessage::Fail()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8625c889d  google::LogMessage::SendToLog()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8625c65ec  google::LogMessage::Flush()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8625c91be  google::LogMessageFatal::~LogMessageFatal()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb86308291c  mesos::internal::slave::Framework::Framework()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb863085699  mesos::internal::slave::Slave::runTask()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8630af1fa  ProtobufProcess::handler4()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb86309346e  std::_Function_handler::_M_invoke()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8630ab34a  ProtobufProcess::visit()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb86342100a  process::ProcessManager::resume()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8634212cc  process::schedule()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb86186a53d  (unknown)
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8615a2f7d  (unknown)
{noformat}

One workaround would be to add a command line option that configures a 
different path for the sandbox dir, so it can be placed on a different disk.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-1922) Slave blocks on the fetcher after terminating an executor

2014-10-14 Thread Tobi Knaup (JIRA)
Tobi Knaup created MESOS-1922:
------------------------------

             Summary: Slave blocks on the fetcher after terminating an executor
                 Key: MESOS-1922
                 URL: https://issues.apache.org/jira/browse/MESOS-1922
             Project: Mesos
          Issue Type: Bug
            Reporter: Tobi Knaup


When the slave terminates an executor because the registration timeout 
expires, it holds on to the fetcher process if it is still running, and does 
not send a TASK_FAILED until the fetcher exits. The expected behavior would be 
to terminate both the executor and the fetcher, and then send the status 
update immediately.

Here are some logs:

{code}
I1014 11:36:56.761726 209186816 slave.cpp:1139] Launching task download.1370d754-53d1-11e4-9fc2-0a002700 for framework 20140927-211310-16777343-5050-44274-0001
I1014 11:36:56.766891 205430784 containerizer.cpp:394] Starting container 'be0f9918-986a-4692-ba9f-8c07871c5226' for executor 'download.1370d754-53d1-11e4-9fc2-0a002700' of framework '20140927-211310-16777343-5050-44274-0001'
I1014 11:36:56.766922 209186816 slave.cpp:1252] Queuing task 'download.1370d754-53d1-11e4-9fc2-0a002700' for executor download.1370d754-53d1-11e4-9fc2-0a002700 of framework '20140927-211310-16777343-5050-44274-0001'
I1014 11:36:56.768117 205430784 launcher.cpp:137] Forked child with pid '13624' for container 'be0f9918-986a-4692-ba9f-8c07871c5226'
I1014 11:36:56.768647 205430784 containerizer.cpp:510] Fetching URIs for container 'be0f9918-986a-4692-ba9f-8c07871c5226' using command '/usr/local/libexec/mesos/mesos-fetcher'
I1014 11:37:43.375211 207577088 slave.cpp:3132] Current usage 92.50%. Max allowed age: 0ns
I1014 11:37:56.768044 205967360 slave.cpp:3089] Terminating executor download.1370d754-53d1-11e4-9fc2-0a002700 of framework 20140927-211310-16777343-5050-44274-0001 because it did not register within 1mins
I1014 11:37:56.768321 206503936 containerizer.cpp:882] Destroying container 'be0f9918-986a-4692-ba9f-8c07871c5226'
I1014 11:37:56.817491 207577088 containerizer.cpp:997] Executor for container 'be0f9918-986a-4692-ba9f-8c07871c5226' has exited
{code}

At this point there is still a running fetcher. After killing it manually I see:

{code}
W1014 11:49:06.310417 207040512 containerizer.cpp:872] Ignoring destroy of unknown container: be0f9918-986a-4692-ba9f-8c07871c5226
E1014 11:49:06.310560 208650240 slave.cpp:2564] Container 'be0f9918-986a-4692-ba9f-8c07871c5226' for executor 'download.1370d754-53d1-11e4-9fc2-0a002700' of framework '20140927-211310-16777343-5050-44274-0001' failed to start: Failed to fetch URIs for container 'be0f9918-986a-4692-ba9f-8c07871c5226': exit status 15
E1014 11:49:06.310597 208650240 slave.cpp:2659] Termination of executor 'download.1370d754-53d1-11e4-9fc2-0a002700' of framework '20140927-211310-16777343-5050-44274-0001' failed: Unknown container: be0f9918-986a-4692-ba9f-8c07871c5226
E1014 11:49:06.310699 205430784 slave.cpp:2945] Failed to unmonitor container for executor download.1370d754-53d1-11e4-9fc2-0a002700 of framework 20140927-211310-16777343-5050-44274-0001: Not monitored
I1014 11:49:06.315104 208650240 slave.cpp:2115] Handling status update TASK_FAILED (UUID: c216eb10-cfdc-4a9e-a687-e260701daed4) for task download.1370d754-53d1-11e4-9fc2-0a002700 of framework 20140927-211310-16777343-5050-44274-0001 from @0.0.0.0:0
W1014 11:49:06.315213 208113664 containerizer.cpp:788] Ignoring update for unknown container: be0f9918-986a-4692-ba9f-8c07871c5226
I1014 11:49:06.315398 205967360 status_update_manager.cpp:320] Received status update TASK_FAILED (UUID: c216eb10-cfdc-4a9e-a687-e260701daed4) for task download.1370d754-53d1-11e4-9fc2-0a002700 of framework 20140927-211310-16777343-5050-44274-0001
I1014 11:49:06.315489 205967360 status_update_manager.cpp:373] Forwarding status update TASK_FAILED (UUID: c216eb10-cfdc-4a9e-a687-e260701daed4) for task download.1370d754-53d1-11e4-9fc2-0a002700 of framework 20140927-211310-16777343-5050-44274-0001 to master@127.0.0.1:5050
I1014 11:49:06.328732 209186816 status_update_manager.cpp:398] Received status update acknowledgement (UUID: c216eb10-cfdc-4a9e-a687-e260701daed4) for task download.1370d754-53d1-11e4-9fc2-0a002700 of framework 20140927-211310-16777343-5050-44274-0001
I1014 11:49:06.328951 207040512 slave.cpp:2811] Cleaning up executor 'download.1370d754-53d1-11e4-9fc2-0a002700' of framework 20140927-211310-16777343-5050-44274-0001
{code}

To reproduce, launch a task with a URI that takes longer than the executor 
registration timeout to download.
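
For example, the command section of such a TaskInfo could look like this in 
protobuf text format (the URI and command value are placeholders; any 
sufficiently large artifact will keep the fetcher busy past the timeout):

{code}
command {
  uris {
    value: "http://example.com/some-very-large-artifact.tar.gz"
  }
  value: "./start.sh"
}
{code}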



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (MESOS-1828) The Mesos UI should link to framework UIs

2014-09-23 Thread Tobi Knaup (JIRA)
Tobi Knaup created MESOS-1828:
------------------------------

             Summary: The Mesos UI should link to framework UIs
                 Key: MESOS-1828
                 URL: https://issues.apache.org/jira/browse/MESOS-1828
             Project: Mesos
          Issue Type: Wish
            Reporter: Tobi Knaup


Most frameworks have a web UI or HTTP API. It would be nice to show a direct 
link from the Mesos web UI so it's easy to navigate there. Currently a user 
needs to know where schedulers are running; Mesos doesn't have that knowledge.

A framework should provide a URL to its API/UI when it connects, as part of 
FrameworkInfo. This could be done via explicit proto fields (e.g. gui_url, 
api_url) or via more generic key/value pairs.
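
A minimal sketch of the explicit-fields variant on FrameworkInfo; the field 
names come from the examples above, but the names and field numbers are 
assumptions for illustration, not committed proto:

{code}
message FrameworkInfo {
  // ... existing fields (user, name, id, failover_timeout, ...) ...

  // Hypothetical: where the framework's web UI and HTTP API are reachable.
  optional string gui_url = 9;
  optional string api_url = 10;
}
{code}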



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MESOS-1593) Add DockerInfo Configuration

2014-07-15 Thread Tobi Knaup (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062596#comment-14062596
 ] 

Tobi Knaup commented on MESOS-1593:
---

Note that Docker containers are typically launched as root (but they don't 
have to be), so defaulting to root is probably a good thing.

> Add DockerInfo Configuration
> ----------------------------
>
>                 Key: MESOS-1593
>                 URL: https://issues.apache.org/jira/browse/MESOS-1593
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Timothy Chen
>            Assignee: Timothy Chen
>
> We want to add a new proto message to encapsulate all Docker related 
> configurations into DockerInfo.
> Here is the document that describes the design for DockerInfo:
> https://github.com/tnachen/mesos/wiki/DockerInfo-design



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MESOS-1573) Having Problems running Mesosphere-Docker

2014-07-15 Thread Tobi Knaup (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062616#comment-14062616
 ] 

Tobi Knaup commented on MESOS-1573:
---

The first thing to check is the log output from Mesos or Deimos. If you 
followed the tutorial, the logs will be in syslog.
The Docker integration code lives at https://github.com/mesosphere/deimos; 
please report Docker-related issues there.
The best place for real-time help is the #mesos IRC channel.

> Having Problems running Mesosphere-Docker
> ------------------------------------------
>
>                 Key: MESOS-1573
>                 URL: https://issues.apache.org/jira/browse/MESOS-1573
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, ec2, general
>            Reporter: Nayeem Syed
>            Priority: Blocker
>              Labels: newbie
>
> I am not sure this is the best place to ask, but the IRC channel seemed 
> quite empty and I didn't find anywhere else to ask general user 
> questions/problems. I would appreciate some pointers and directions on how 
> I can get it up and running.
> I tried following the instructions set out here: 
> http://mesosphere.io/learn/run-docker-on-mesosphere/
> Instead of using a local machine (I am on OS X), I set up an Ubuntu 14 
> m3.large instance on my AWS account and then followed the instructions.
> I connected an Elastic IP and a subdomain to the instance and opened all 
> the ports on the firewall.
> However, I am not getting any Docker containers running.
> Here are my Mesos and Marathon URLs:
> Marathon: mesos.cronycle.net:8080
> Mesos: mesos.cronycle.net:5050
> I just want to get a Rails application up in a Docker container, be able 
> to scale it automatically based on resource consumption, and be able to 
> use a Procfile for it, similar to Heroku.
> Is Mesos a good tool for this? I am currently looking at using Deis+CoreOS, 
> but their statistics and monitoring tools seem to be non-existent; there's 
> no automated way of monitoring processes like there is with Marathon, for 
> instance.
> So I would have liked to make this work, ideally, if possible.
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MESOS-1593) Add DockerInfo Configuration

2014-07-15 Thread Tobi Knaup (JIRA)

[ 
https://issues.apache.org/jira/browse/MESOS-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062976#comment-14062976
 ] 

Tobi Knaup commented on MESOS-1593:
---

You can actually launch as any user in the docker group:

$ ll /var/run/docker.sock
srw-rw---- 1 root docker 0 Jul 15 05:22 /var/run/docker.sock

I don't expect this to be very common though, so supporting just root in the 
first pass will be fine.

> Add DockerInfo Configuration
> ----------------------------
>
>                 Key: MESOS-1593
>                 URL: https://issues.apache.org/jira/browse/MESOS-1593
>             Project: Mesos
>          Issue Type: Task
>            Reporter: Timothy Chen
>            Assignee: Timothy Chen
>
> We want to add a new proto message to encapsulate all Docker related 
> configurations into DockerInfo.
> Here is the document that describes the design for DockerInfo:
> https://github.com/tnachen/mesos/wiki/DockerInfo-design



--
This message was sent by Atlassian JIRA
(v6.2#6252)