Re: Debugging Scheduler HTTP API Failures

2016-08-15 Thread Zameer Manji
Dario was right.

I filed MESOS-6041  to
enhance the error message.

On Sun, Aug 14, 2016 at 10:35 PM, Dario Rexin  wrote:

> Zameer,
>
> the header value is enclosed in []. This is because headers can have
> multiple values and the library you use pus them into a list. You have to
> take the first item from that list and then it should work.
>
> On Aug 14, 2016, at 10:19 PM, Zameer Manji  wrote:
>
> Here is a MWE: https://github.com/zmanji/mesos-mwe
>
> Follow the instructions in the README to reproduce.
>
> On Sun, Aug 14, 2016 at 9:04 PM, Dario Rexin  wrote:
>
>> Can you post the code somewhere?
>>
>>
>> On Aug 14, 2016, at 8:54 PM, Zameer Manji  wrote:
>>
>> Dario,
>>
>> The logs show that no disconnections occur until after the second POST
>> request. I would expect a log entry indicating a disconnect between the two
>> POST requests if the stream id changed.
>>
>> On Sun, Aug 14, 2016 at 8:28 PM, Dario Rexin  wrote:
>>
>>> You’re absolutely right. I just tried the exact same steps and it worked
>>> fine for me. I also don’t see the log message. Do you have any reconnection
>>> logic in place? Is it possible, that your framework reconnected before you
>>> send the call? The Stream Id would change in that case.
>>>
>>>
>>> On Aug 14, 2016, at 6:48 PM, Zameer Manji  wrote:
>>>
>>> Dario,
>>>
>>> I do not think the case sensitivity matters here. If the master was
>>> expecting a header that was exactly 'Mesos-Stream-Id' and did not see it, I
>>> would expect to get the error response: `All non-subscribe calls should
>>> include the 'Mesos-Stream-Id' header`. That is the error response that you
>>> get when you do not set the header.
>>>
>>> Possibly related, I expected to see the stream id in the mesos logs. I
>>> see this log message
>>> 
>>>  in
>>> the code, but I do not see it in the logs.
>>>
>>>
>>> On Sun, Aug 14, 2016 at 6:12 PM, Dario Rexin  wrote:
>>>
 Oh, sorry, I didn't see you actually set the header (wall of text ;) ).
 That's an interesting issue, do you set the header case sensitive? I know
 headers shouldn't be case sensitive, but maybe there's a bug in the Mesos
 code. I have not seen this issue before.

 On Aug 14, 2016, at 5:58 PM, Zameer Manji  wrote:

 Hey,

 I'm using the Mesos HTTP API for the first time. I am currently
 encountering an issue where after a successful SUBSCRIBE call and receiving
 a SUBSCRIBED and HEARTBEAT event, a subsequent TEARDOWN call fails with
 HTTP 400 with a message of "The stream ID included in this request didn't
 match the stream ID currently associated with framework ID".

 Here is a detailed breakdown of what happens with logs:

 A new framework sends an SUBSCRIBE call with the following body:

 
 framework_id {
   value: "0dffbee9-a514-4ffa-87e1-2850dd4dcf00"
 }
 type: SUBSCRIBE
 subscribe {
   framework_info {
 user: "user"
 name: "name"
 id {
   value: "0dffbee9-a514-4ffa-87e1-2850dd4dcf00"
 }
   }
 }
 

 It then receives a 200 OK response with the following headers:
 `{content-type=[application/x-protobuf], date=[Sat, 13 Aug 2016
 02:42:48 GMT], transfer-encoding=[chunked], mesos-stream-id=[71a0294f-e9c4
 -4efe-b237-fb120836aaf8]}`

 Over this connection it receives a successful subscribed event:
 
 type: SUBSCRIBED
 subscribed {
   framework_id {
 value: "0dffbee9-a514-4ffa-87e1-2850dd4dcf00"
   }
   heartbeat_interval_seconds: 15.0
 }
 

 It also receives a single heart beat event.

 Then it tries to send the following request:
 
 Sending: framework_id {
   value: "0dffbee9-a514-4ffa-87e1-2850dd4dcf00"
 }
 type: TEARDOWN
 
 with the following headers:
 `{accept=[application/x-protobuf], accept-encoding=[gzip],
 mesos-stream-id=[71a0294f-e9c4-4efe-b237-fb120836aaf8]}`

 The response is a 400 with the body: `The stream ID included in this
 request didn't match the stream ID currently associated with framework ID
 '0dffbee9-a514-4ffa-87e1-2850dd4dcf00'`.


 The master logs contains:
 
 I0813 02:42:48.376819 13934 http.cpp:381] HTTP POST for
 /master/api/v1/scheduler from 192.168.33.1:60780 with
 User-Agent='Google-HTTP-Java-Client/1.20.0 (gzip)'
 I0813 02:42:48.376998 13934 master.cpp:2146] Received subscription
 request for HTTP framework 'name'
 I0813 02:42:48.377104 13934 master.cpp:2244] Subscribing framework
 'name' with checkpointing disabled and capabilities [  ]
 I0813 

Re: Debugging Scheduler HTTP API Failures

2016-08-14 Thread Dario Rexin
Oh, sorry, I didn't see you actually set the header (wall of text ;) ). That's 
an interesting issue, do you set the header case sensitive? I know headers 
shouldn't be case sensitive, but maybe there's a bug in the Mesos code. I have 
not seen this issue before.

> On Aug 14, 2016, at 5:58 PM, Zameer Manji  wrote:
> 
> Hey,
> 
> I'm using the Mesos HTTP API for the first time. I am currently encountering 
> an issue where after a successful SUBSCRIBE call and receiving a SUBSCRIBED 
> and HEARTBEAT event, a subsequent TEARDOWN call fails with HTTP 400 with a 
> message of "The stream ID included in this request didn't match the stream ID 
> currently associated with framework ID".
> 
> Here is a detailed breakdown of what happens with logs:
> 
> A new framework sends an SUBSCRIBE call with the following body:
> 
> 
> framework_id {
>   value: "0dffbee9-a514-4ffa-87e1-2850dd4dcf00"
> }
> type: SUBSCRIBE
> subscribe {
>   framework_info {
> user: "user"
> name: "name"
> id {
>   value: "0dffbee9-a514-4ffa-87e1-2850dd4dcf00"
> }
>   }
> }
> 
> 
> It then receives a 200 OK response with the following headers:
> `{content-type=[application/x-protobuf], date=[Sat, 13 Aug 2016 02:42:48 
> GMT], transfer-encoding=[chunked], 
> mesos-stream-id=[71a0294f-e9c4-4efe-b237-fb120836aaf8]}`
> 
> Over this connection it receives a successful subscribed event:
> 
> type: SUBSCRIBED
> subscribed {
>   framework_id {
> value: "0dffbee9-a514-4ffa-87e1-2850dd4dcf00"
>   }
>   heartbeat_interval_seconds: 15.0
> }
> 
> 
> It also receives a single heart beat event.
> 
> Then it tries to send the following request:
> 
> Sending: framework_id {
>   value: "0dffbee9-a514-4ffa-87e1-2850dd4dcf00"
> }
> type: TEARDOWN
> 
> with the following headers:
> `{accept=[application/x-protobuf], accept-encoding=[gzip], 
> mesos-stream-id=[71a0294f-e9c4-4efe-b237-fb120836aaf8]}`
> 
> The response is a 400 with the body: `The stream ID included in this request 
> didn't match the stream ID currently associated with framework ID 
> '0dffbee9-a514-4ffa-87e1-2850dd4dcf00'`.
> 
> 
> The master logs contains:
> 
> I0813 02:42:48.376819 13934 http.cpp:381] HTTP POST for 
> /master/api/v1/scheduler from 192.168.33.1:60780 with 
> User-Agent='Google-HTTP-Java-Client/1.20.0 (gzip)'
> I0813 02:42:48.376998 13934 master.cpp:2146] Received subscription request 
> for HTTP framework 'name'
> I0813 02:42:48.377104 13934 master.cpp:2244] Subscribing framework 'name' 
> with checkpointing disabled and capabilities [  ]
> I0813 02:42:48.377378 13934 hierarchical.cpp:271] Added framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00
> I0813 02:42:49.475163 13929 http.cpp:381] HTTP POST for 
> /master/api/v1/scheduler from 192.168.33.1:60782 with 
> User-Agent='Google-HTTP-Java-Client/1.20.0 (gzip)'
> I0813 02:42:51.133513 13930 master.cpp:1284] Framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00 (name) disconnected
> I0813 02:42:51.133597 13930 master.cpp:2725] Disconnecting framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00 (name)
> I0813 02:42:51.133618 13930 master.cpp:2749] Deactivating framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00 (name)
> I0813 02:42:51.133644 13930 master.cpp:1297] Giving framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00 (name) 0ns to failover
> I0813 02:42:51.133692 13932 hierarchical.cpp:382] Deactivated framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00
> I0813 02:42:51.137265 13931 master.cpp:5561] Framework failover timeout, 
> removing framework 0dffbee9-a514-4ffa-87e1-2850dd4dcf00 (name)
> I0813 02:42:51.137339 13931 master.cpp:6296] Removing framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00 (name)
> I0813 02:42:51.137464 13931 hierarchical.cpp:333] Removed framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00
> 
> Note the immediate disconnection after the second POST is intentional.
> 
> This is with Mesos 1.0.0 on Ubuntu Trusty.
> 
> What can I do to debug this issue? The logs do not provide a lot of 
> information to act on. The stream id generated by mesos is not in the logs, 
> nor anything indicating that an HTTP 400 was sent.
> 
> --
> Zameer Manji


Re: Debugging Scheduler HTTP API Failures

2016-08-14 Thread Dario Rexin
HinZameer,

when you send the SUBSCRIBE to Mesos, the response will contain a header 
'Mesos-Stream-Id'. You have to send that header with every subsequent call you 
send to Mesos for that framework.

--
Dario

> On Aug 14, 2016, at 5:58 PM, Zameer Manji  wrote:
> 
> Hey,
> 
> I'm using the Mesos HTTP API for the first time. I am currently encountering 
> an issue where after a successful SUBSCRIBE call and receiving a SUBSCRIBED 
> and HEARTBEAT event, a subsequent TEARDOWN call fails with HTTP 400 with a 
> message of "The stream ID included in this request didn't match the stream ID 
> currently associated with framework ID".
> 
> Here is a detailed breakdown of what happens with logs:
> 
> A new framework sends an SUBSCRIBE call with the following body:
> 
> 
> framework_id {
>   value: "0dffbee9-a514-4ffa-87e1-2850dd4dcf00"
> }
> type: SUBSCRIBE
> subscribe {
>   framework_info {
> user: "user"
> name: "name"
> id {
>   value: "0dffbee9-a514-4ffa-87e1-2850dd4dcf00"
> }
>   }
> }
> 
> 
> It then receives a 200 OK response with the following headers:
> `{content-type=[application/x-protobuf], date=[Sat, 13 Aug 2016 02:42:48 
> GMT], transfer-encoding=[chunked], 
> mesos-stream-id=[71a0294f-e9c4-4efe-b237-fb120836aaf8]}`
> 
> Over this connection it receives a successful subscribed event:
> 
> type: SUBSCRIBED
> subscribed {
>   framework_id {
> value: "0dffbee9-a514-4ffa-87e1-2850dd4dcf00"
>   }
>   heartbeat_interval_seconds: 15.0
> }
> 
> 
> It also receives a single heart beat event.
> 
> Then it tries to send the following request:
> 
> Sending: framework_id {
>   value: "0dffbee9-a514-4ffa-87e1-2850dd4dcf00"
> }
> type: TEARDOWN
> 
> with the following headers:
> `{accept=[application/x-protobuf], accept-encoding=[gzip], 
> mesos-stream-id=[71a0294f-e9c4-4efe-b237-fb120836aaf8]}`
> 
> The response is a 400 with the body: `The stream ID included in this request 
> didn't match the stream ID currently associated with framework ID 
> '0dffbee9-a514-4ffa-87e1-2850dd4dcf00'`.
> 
> 
> The master logs contains:
> 
> I0813 02:42:48.376819 13934 http.cpp:381] HTTP POST for 
> /master/api/v1/scheduler from 192.168.33.1:60780 with 
> User-Agent='Google-HTTP-Java-Client/1.20.0 (gzip)'
> I0813 02:42:48.376998 13934 master.cpp:2146] Received subscription request 
> for HTTP framework 'name'
> I0813 02:42:48.377104 13934 master.cpp:2244] Subscribing framework 'name' 
> with checkpointing disabled and capabilities [  ]
> I0813 02:42:48.377378 13934 hierarchical.cpp:271] Added framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00
> I0813 02:42:49.475163 13929 http.cpp:381] HTTP POST for 
> /master/api/v1/scheduler from 192.168.33.1:60782 with 
> User-Agent='Google-HTTP-Java-Client/1.20.0 (gzip)'
> I0813 02:42:51.133513 13930 master.cpp:1284] Framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00 (name) disconnected
> I0813 02:42:51.133597 13930 master.cpp:2725] Disconnecting framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00 (name)
> I0813 02:42:51.133618 13930 master.cpp:2749] Deactivating framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00 (name)
> I0813 02:42:51.133644 13930 master.cpp:1297] Giving framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00 (name) 0ns to failover
> I0813 02:42:51.133692 13932 hierarchical.cpp:382] Deactivated framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00
> I0813 02:42:51.137265 13931 master.cpp:5561] Framework failover timeout, 
> removing framework 0dffbee9-a514-4ffa-87e1-2850dd4dcf00 (name)
> I0813 02:42:51.137339 13931 master.cpp:6296] Removing framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00 (name)
> I0813 02:42:51.137464 13931 hierarchical.cpp:333] Removed framework 
> 0dffbee9-a514-4ffa-87e1-2850dd4dcf00
> 
> Note the immediate disconnection after the second POST is intentional.
> 
> This is with Mesos 1.0.0 on Ubuntu Trusty.
> 
> What can I do to debug this issue? The logs do not provide a lot of 
> information to act on. The stream id generated by mesos is not in the logs, 
> nor anything indicating that an HTTP 400 was sent.
> 
> --
> Zameer Manji