Re: [VOTE] Release Apache Mesos 0.24.0 (rc1)

Dario Rexin Tue, 01 Sep 2015 01:51:03 -0700

One more question. From the Mesos code it doesn’t look like events are being 
split or combined, so given I have a client that gives me access to the 
individual chunks, is it safe to assume that each chunk contains exactly one 
event? Because that would make parsing the events a lot easier for me.


Thanks,
Dario
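
On the question above: earlier in this thread (quoted below), Anand notes that a single HTTP chunk can carry zero or more events and that a large event may be split across chunks. A client therefore probably should not rely on one event per chunk. The following is a minimal sketch of a decoder that buffers incoming bytes and emits complete RecordIO records regardless of how they are chunked. It assumes only the framing described in the thread (decimal length, LF, payload); the class name `RecordIODecoder` and its `feed` method are hypothetical and not part of Mesos.

```python
from typing import List


class RecordIODecoder:
    """Incrementally decode '<length>\\n<payload>' records from byte chunks.

    Illustrative sketch only: the class and method names are made up here.
    """

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[bytes]:
        """Add a chunk of bytes and return every record completed so far."""
        self._buf.extend(chunk)
        records = []
        while True:
            newline = self._buf.find(b"\n")
            if newline < 0:
                break  # the length prefix is not complete yet
            length = int(self._buf[:newline].decode("ascii"))
            start = newline + 1
            end = start + length
            if len(self._buf) < end:
                break  # the payload is not complete yet
            records.append(bytes(self._buf[start:end]))
            del self._buf[:end]  # drop the consumed record, keep the rest
        return records


if __name__ == "__main__":
    decoder = RecordIODecoder()
    heartbeat = b'{"type":"HEARTBEAT"}'
    # One chunk holding a full event plus the start of the next length prefix.
    print(decoder.feed(b"20\n" + heartbeat + b"20"))
    # The remainder of the second event arrives later.
    print(decoder.feed(b"\n" + heartbeat))
```

With this approach the chunk boundaries stop mattering: whatever the HTTP client hands over is appended to the buffer, and events come out only once they are complete.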

> On Sep 1, 2015, at 8:42 AM, [email protected] wrote:
> 
> Hi Vinod,
> 
> thanks for the explanation, I got it now.
> 
> Thanks,
> Dario
> 
> On 31.08.2015, at 23:47, Vinod Kone <[email protected] 
> <mailto:[email protected]>> wrote:
> 
>> I think you might be confusing HTTP chunked encoding with RecordIO 
>> encoding. Most HTTP client libraries dechunk the stream before presenting it 
>> to the application. So the application needs to know the encoding of the 
>> dechunked data to be able to process it.
>> 
>> In Mesos's case, the server (master here) can encode it in JSON or Protobuf. 
>> We wanted to have a consistent way to encode both these formats and 
>> the Record-IO format was the one we settled on. Note that this format is also 
>> used by the Twitter streaming API 
>> <https://dev.twitter.com/streaming/overview/processing> (see delimited 
>> messages section).
>> 
>> HTH,
>> 
>> On Mon, Aug 31, 2015 at 2:09 PM, Dario Rexin <[email protected] 
>> <mailto:[email protected]>> wrote:
>> Hi Vinod,
>> 
>>> On Aug 31, 2015, at 9:36 PM, Vinod Kone <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> Hi Dario,
>>> 
>>> Can you test with "curl --no-buffer" option? Looks like your stdout might 
>>> be line-buffered.
>> 
>> that did the trick, thanks!
>> 
>>> 
>>> The reason we used record-io formatting is to be consistent in how we 
>>> stream protobuf and json encoded data.
>>> 
>> 
>> How does simple chunked encoding prevent you from doing this?
>> 
>> Thanks,
>> Dario
>> 
>>> On Fri, Aug 28, 2015 at 2:04 PM, <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> Anand,
>>> 
>>> thanks for the explanation. I didn't think about the case when you have to 
>>> split a message, now it makes sense.
>>> 
>>> But the case I observed with curl is still weird. Even when splitting a 
>>> message, it should still receive both parts almost at the same time. Do you 
>>> have any idea why it could behave like this?
>>> 
>>> On 28.08.2015, at 21:31, Anand Mazumdar <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>>> Dario,
>>>> 
>>>> Most HTTP libraries/parsers (including the one that Mesos uses internally) 
>>>> provide a way to specify a default size of each chunk. If a Mesos Event is 
>>>> too big, it would get split into smaller chunks and vice versa.
>>>> 
>>>> -anand
>>>> 
>>>>> On Aug 28, 2015, at 11:51 AM, [email protected] 
>>>>> <mailto:[email protected]> wrote:
>>>>> 
>>>>> Anand,
>>>>> 
>>>>> in the example from my first mail you can see that curl prints the size 
>>>>> of a message and then waits for the next message and only when it 
>>>>> receives that message it will print the prior message plus the size of 
>>>>> the next message, but not the actual message.
>>>>> 
>>>>> What's the benefit of encoding multiple messages in a single chunk? You 
>>>>> could simply create a single chunk per event.
>>>>> 
>>>>> Cheers,
>>>>> Dario
>>>>> 
>>>>> On 28.08.2015, at 19:43, Anand Mazumdar <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>>> Dario,
>>>>>> 
>>>>>> Can you shed a bit more light on what you still find puzzling about the 
>>>>>> CURL behavior after my explanation?
>>>>>> 
>>>>>> PS: A single HTTP chunk can have 0 or more Mesos (Scheduler API) Events. 
>>>>>> So in your example, the first chunk had complete information about the 
>>>>>> first “event”, followed by partial information about the subsequent 
>>>>>> event from another chunk.
>>>>>> 
>>>>>> As for the benefit of using RecordIO format here, how else do you think 
>>>>>> we could have demarcated two events in the response?
>>>>>> 
>>>>>> -anand
>>>>>> 
>>>>>> 
>>>>>>> On Aug 28, 2015, at 10:01 AM, [email protected] 
>>>>>>> <mailto:[email protected]> wrote:
>>>>>>> 
>>>>>>> Anand,
>>>>>>> 
>>>>>>> thanks for the explanation. I'm still a little puzzled why curl behaves 
>>>>>>> so strangely. I will check how other clients behave as soon as I have a 
>>>>>>> chance.
>>>>>>> 
>>>>>>> Vinod,
>>>>>>> 
>>>>>>> what exactly is the benefit of using recordio here? Doesn't it make the 
>>>>>>> content-type somewhat wrong? If I send 'Accept: application/json' and 
>>>>>>> receive 'Content-Type: application/json', I actually expect to receive 
>>>>>>> only json in the message.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Dario
>>>>>>> 
>>>>>>> On 28.08.2015, at 18:13, Vinod Kone <[email protected] 
>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>> 
>>>>>>>> I'm happy to add the "\n" after the event (note it's different from 
>>>>>>>> chunk) if that makes CURL play nicer. I'm not sure about the "\r" part 
>>>>>>>> though? Is that a nice to have or does it have some other benefit?
>>>>>>>> 
>>>>>>>> The design doc is not set in stone since this has not been 
>>>>>>>> released yet. So we definitely want to do the right/easy thing.
>>>>>>>> 
>>>>>>>> On Fri, Aug 28, 2015 at 7:53 AM, Anand Mazumdar <[email protected] 
>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>> Dario,
>>>>>>>> 
>>>>>>>> Thanks for the detailed explanation and for trying out the new API. 
>>>>>>>> However, this is not a bug. The output from CURL is the encoding used 
>>>>>>>> by Mesos for the events stream. From the user doc 
>>>>>>>> <https://github.com/apache/mesos/blob/master/docs/scheduler_http_api.md>:
>>>>>>>> 
>>>>>>>> "Master encodes each Event in RecordIO format, i.e., string 
>>>>>>>> representation of length of the event in bytes followed by JSON or 
>>>>>>>> binary Protobuf (possibly compressed) encoded event. Note that the 
>>>>>>>> value of length will never be ‘0’ and the size of the length will be 
>>>>>>>> the size of unsigned integer (i.e., 64 bits). Also, note that the 
>>>>>>>> RecordIO encoding should be decoded by the scheduler whereas the 
>>>>>>>> underlying HTTP chunked encoding is typically invisible at the 
>>>>>>>> application (scheduler) layer."
>>>>>>>> 
>>>>>>>> If you run CURL with tracing enabled, i.e. --trace, the output would be 
>>>>>>>> something similar to this:
>>>>>>>> 
>>>>>>>> <= Recv header, 2 bytes (0x2)
>>>>>>>> 0000: 0d 0a                                           ..
>>>>>>>> <= Recv data, 115 bytes (0x73)
>>>>>>>> 0000: 36 64 0d 0a 31 30 35 0a 7b 22 73 75 62 73 63 72 6d..105.{"subscr
>>>>>>>> 0010: 69 62 65 64 22 3a 7b 22 66 72 61 6d 65 77 6f 72 ibed":{"framewor
>>>>>>>> 0020: 6b 5f 69 64 22 3a 7b 22 76 61 6c 75 65 22 3a 22 k_id":{"value":"
>>>>>>>> 0030: 32 30 31 35 30 38 32 35 2d 31 30 33 30 31 38 2d 20150825-103018-
>>>>>>>> 0040: 33 38 36 33 38 37 31 34 39 38 2d 35 30 35 30 2d 3863871498-5050-
>>>>>>>> 0050: 31 31 38 35 2d 30 30 31 30 22 7d 7d 2c 22 74 79 1185-0010"}},"ty
>>>>>>>> 0060: 70 65 22 3a 22 53 55 42 53 43 52 49 42 45 44 22 pe":"SUBSCRIBED"
>>>>>>>> 0070: 7d 0d 0a                                        }..
>>>>>>>> <others
>>>>>>>> 
>>>>>>>> In the output above, the chunks are correctly delimited by 'CRLF' (0d 
>>>>>>>> 0a) as per the HTTP RFC. As mentioned earlier, the output that you 
>>>>>>>> observe on stdout with CURL is of the Record-IO encoding used for the 
>>>>>>>> events stream (and is not related to the RFC):
>>>>>>>> 
>>>>>>>> event = event-size LF
>>>>>>>>              event-data
>>>>>>>> 
>>>>>>>> Looking forward to more bug reports as you try out the new API!
>>>>>>>> 
>>>>>>>> -anand
>>>>>>>> 
>>>>>>>>> On Aug 28, 2015, at 12:56 AM, Dario Rexin <[email protected] 
>>>>>>>>> <mailto:[email protected]>> wrote:
>>>>>>>>> 
>>>>>>>>> -1 (non-binding)
>>>>>>>>> 
>>>>>>>>> I found a breaking bug in the new HTTP API. The messages do not 
>>>>>>>>> conform to the HTTP standard for chunked transfer encoding. In RFC 
>>>>>>>>> 2616 Sec. 3 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html), 
>>>>>>>>> a chunk is defined as:
>>>>>>>>> 
>>>>>>>>> chunk = chunk-size [ chunk-extension ] CRLF
>>>>>>>>>         chunk-data CRLF
>>>>>>>>> 
>>>>>>>>> The HTTP API currently sends a chunk as:
>>>>>>>>> 
>>>>>>>>> chunk = chunk-size LF
>>>>>>>>>         chunk-data
>>>>>>>>> 
>>>>>>>>> A standards-conforming HTTP client like curl can’t correctly interpret 
>>>>>>>>> the data as a complete chunk. In curl it currently looks like this:
>>>>>>>>> 
>>>>>>>>> 104
>>>>>>>>> {"subscribed":{"framework_id":{"value":"20150820-114552-16777343-5050-43704-0000"}},"type":"SUBSCRIBED"}20
>>>>>>>>> {"type":"HEARTBEAT"}666
>>>>>>>>> … waiting …
>>>>>>>>> {"offers":{"offers":[{"agent_id":{"value":"20150820-114552-16777343-5050-43704-S0"},"framework_id":{"value":"20150820-114552-16777343-5050-43704-0000"},"hostname":"localhost","id":{"value":"20150820-114552-16777343-5050-43704-O0"},"resources":[{"name":"cpus","role":"*","scalar":{"value":8},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":15360},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":2965448},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"localhost","ip":"127.0.0.1","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"}20
>>>>>>>>> … waiting …
>>>>>>>>> {"type":"HEARTBEAT"}20
>>>>>>>>> … waiting …
>>>>>>>>> 
>>>>>>>>> It will receive a couple of messages after successful registration 
>>>>>>>>> with the master and the last thing printed is a number (in this case 
>>>>>>>>> 666). Then after some time it will print the first offers message 
>>>>>>>>> followed by the number 20. The explanation for this behavior is that 
>>>>>>>>> curl can’t interpret the data it gets from Mesos as a complete chunk 
>>>>>>>>> and waits for the missing data. So it prints what it thinks is a 
>>>>>>>>> chunk (a message followed by the size of the next message) and keeps 
>>>>>>>>> the rest of the message until another message arrives and so on. The 
>>>>>>>>> fix for this is to terminate both lines, the message size and the 
>>>>>>>>> message data, with CRLF.
>>>>>>>>> 
>>>>>>>>> Cheers,
>>>>>>>>> Dario
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>> 
>>> 
>> 
>>
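
The curl trace quoted in this thread shows two layers of framing: HTTP chunked transfer encoding on the outside (hex chunk size, CRLF, data, CRLF) and RecordIO on the inside (decimal event size, LF, event data). The following is a rough sketch that rebuilds the 115 bytes from that trace and peels off both layers. The `dechunk` helper is hypothetical and deliberately simplified (no trailers, no error handling); real HTTP client libraries normally do this step for you.

```python
def dechunk(raw: bytes) -> bytes:
    """Strip HTTP/1.1 chunk framing: hex size, CRLF, data, CRLF (repeated).

    Simplified illustration; stops at a zero-size chunk or end of input.
    """
    body = bytearray()
    pos = 0
    while pos < len(raw):
        eol = raw.index(b"\r\n", pos)
        size = int(raw[pos:eol].split(b";")[0], 16)  # ignore chunk extensions
        if size == 0:
            break  # last-chunk marker
        start = eol + 2
        body += raw[start:start + size]
        pos = start + size + 2  # skip the CRLF that terminates the chunk
    return bytes(body)


# Event payload as it appears in the trace from the thread.
event = (
    b'{"subscribed":{"framework_id":{"value":'
    b'"20150825-103018-3863871498-5050-1185-0010"}},"type":"SUBSCRIBED"}'
)

# Outer layer: chunk size 0x6d (109 bytes). Inner layer: "105" LF + event.
wire = b"6d\r\n" + b"105\n" + event + b"\r\n"

body = dechunk(wire)                        # what curl writes to stdout
length, _, payload = body.partition(b"\n")  # RecordIO: size, LF, data

if __name__ == "__main__":
    print(len(wire), int(length), len(payload))
```

The chunk size (0x6d = 109) covers the RecordIO length line (4 bytes including the LF) plus the 105-byte event, which matches the explanation that the CRLFs belong to HTTP and the bare LF belongs to RecordIO.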
