Hi Vinod, thanks for the explanation, I got it now.
Thanks,
Dario

> On 31.08.2015, at 23:47, Vinod Kone <[email protected]> wrote:
>
> I think you might be confusing the HTTP chunked encoding with the RecordIO
> encoding. Most HTTP client libraries dechunk the stream before presenting it
> to the application. So the application needs to know the encoding of the
> dechunked data to be able to process it.
>
> In Mesos's case, the server (master here) can encode it in JSON or Protobuf.
> We wanted to have a consistent way to encode both these formats and Record-IO
> format was the one we settled on. Note that this format is also used by the
> Twitter streaming API (see delimited messages section).
>
> HTH,
>
>> On Mon, Aug 31, 2015 at 2:09 PM, Dario Rexin <[email protected]> wrote:
>> Hi Vinod,
>>
>>> On Aug 31, 2015, at 9:36 PM, Vinod Kone <[email protected]> wrote:
>>>
>>> Hi Dario,
>>>
>>> Can you test with "curl --no-buffer" option? Looks like your stdout might
>>> be line-buffered.
>>
>> that did the trick, thanks!
>>
>>>
>>> The reason we used record-io formatting is to be consistent in how we
>>> stream protobuf and json encoded data.
>>
>> How does simple chunked encoding prevent you from doing this?
>>
>> Thanks,
>> Dario
>>
>>>> On Fri, Aug 28, 2015 at 2:04 PM, <[email protected]> wrote:
>>>> Anand,
>>>>
>>>> thanks for the explanation. I didn't think about the case when you have to
>>>> split a message, now it makes sense.
>>>>
>>>> But the case I observed with curl is still weird. Even when splitting a
>>>> message, it should still receive both parts almost at the same time. Do
>>>> you have any idea why it could behave like this?
>>>>
>>>>> On 28.08.2015, at 21:31, Anand Mazumdar <[email protected]> wrote:
>>>>>
>>>>> Dario,
>>>>>
>>>>> Most HTTP libraries/parsers (including one that Mesos uses internally)
>>>>> provide a way to specify a default size of each chunk. If a Mesos Event
>>>>> is too big, it would get split into smaller chunks and vice-versa.
>>>>>
>>>>> -anand
>>>>>
>>>>>> On Aug 28, 2015, at 11:51 AM, [email protected] wrote:
>>>>>>
>>>>>> Anand,
>>>>>>
>>>>>> in the example from my first mail you can see that curl prints the size
>>>>>> of a message and then waits for the next message and only when it
>>>>>> receives that message it will print the prior message plus the size of
>>>>>> the next message, but not the actual message.
>>>>>>
>>>>>> What's the benefit of encoding multiple messages in a single chunk? You
>>>>>> could simply create a single chunk per event.
>>>>>>
>>>>>> Cheers,
>>>>>> Dario
>>>>>>
>>>>>>> On 28.08.2015, at 19:43, Anand Mazumdar <[email protected]> wrote:
>>>>>>>
>>>>>>> Dario,
>>>>>>>
>>>>>>> Can you shed a bit more light on what you still find puzzling about the
>>>>>>> CURL behavior after my explanation?
>>>>>>>
>>>>>>> PS: A single HTTP chunk can have 0 or more Mesos (Scheduler API)
>>>>>>> Events. So in your example, the first chunk had complete information
>>>>>>> about the first "event", followed by partial information about the
>>>>>>> subsequent event from another chunk.
>>>>>>>
>>>>>>> As for the benefit of using RecordIO format here, how else do you think
>>>>>>> we could have demarcated two events in the response?
>>>>>>>
>>>>>>> -anand
>>>>>>>
>>>>>>>
>>>>>>>> On Aug 28, 2015, at 10:01 AM, [email protected] wrote:
>>>>>>>>
>>>>>>>> Anand,
>>>>>>>>
>>>>>>>> thanks for the explanation. I'm still a little puzzled why curl
>>>>>>>> behaves so strangely. I will check how other clients behave as soon as
>>>>>>>> I have a chance.
>>>>>>>>
>>>>>>>> Vinod,
>>>>>>>>
>>>>>>>> what exactly is the benefit of using recordio here? Doesn't it make
>>>>>>>> the content-type somewhat wrong? If I send 'Accept: application/json'
>>>>>>>> and receive 'Content-Type: application/json', I actually expect to
>>>>>>>> receive only json in the message.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Dario
>>>>>>>>
>>>>>>>>> On 28.08.2015, at 18:13, Vinod Kone <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> I'm happy to add the "\n" after the event (note it's different from
>>>>>>>>> chunk) if that makes CURL play nicer. I'm not sure about the "\r"
>>>>>>>>> part though? Is that a nice to have or does it have some other
>>>>>>>>> benefit?
>>>>>>>>>
>>>>>>>>> The design doc is not set in stone since this has not been
>>>>>>>>> released yet. So definitely want to do the right/easy thing.
>>>>>>>>>
>>>>>>>>>> On Fri, Aug 28, 2015 at 7:53 AM, Anand Mazumdar
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> Dario,
>>>>>>>>>>
>>>>>>>>>> Thanks for the detailed explanation and for trying out the new API.
>>>>>>>>>> However, this is not a bug. The output from CURL is the encoding
>>>>>>>>>> used by Mesos for the events stream. From the user doc:
>>>>>>>>>>
>>>>>>>>>> "Master encodes each Event in RecordIO format, i.e., string
>>>>>>>>>> representation of length of the event in bytes followed by JSON or
>>>>>>>>>> binary Protobuf (possibly compressed) encoded event. Note that the
>>>>>>>>>> value of length will never be '0' and the size of the length will be
>>>>>>>>>> the size of unsigned integer (i.e., 64 bits). Also, note that the
>>>>>>>>>> RecordIO encoding should be decoded by the scheduler whereas the
>>>>>>>>>> underlying HTTP chunked encoding is typically invisible at the
>>>>>>>>>> application (scheduler) layer."
>>>>>>>>>>
>>>>>>>>>> If you run CURL with tracing enabled, i.e. --trace, the output would
>>>>>>>>>> be something similar to this:
>>>>>>>>>>
>>>>>>>>>> <= Recv header, 2 bytes (0x2)
>>>>>>>>>> 0000: 0d 0a                                            ..
>>>>>>>>>> <= Recv data, 115 bytes (0x73)
>>>>>>>>>> 0000: 36 64 0d 0a 31 30 35 0a 7b 22 73 75 62 73 63 72  6d..105.{"subscr
>>>>>>>>>> 0010: 69 62 65 64 22 3a 7b 22 66 72 61 6d 65 77 6f 72  ibed":{"framewor
>>>>>>>>>> 0020: 6b 5f 69 64 22 3a 7b 22 76 61 6c 75 65 22 3a 22  k_id":{"value":"
>>>>>>>>>> 0030: 32 30 31 35 30 38 32 35 2d 31 30 33 30 31 38 2d  20150825-103018-
>>>>>>>>>> 0040: 33 38 36 33 38 37 31 34 39 38 2d 35 30 35 30 2d  3863871498-5050-
>>>>>>>>>> 0050: 31 31 38 35 2d 30 30 31 30 22 7d 7d 2c 22 74 79  1185-0010"}},"ty
>>>>>>>>>> 0060: 70 65 22 3a 22 53 55 42 53 43 52 49 42 45 44 22  pe":"SUBSCRIBED"
>>>>>>>>>> 0070: 7d 0d 0a                                         }..
>>>>>>>>>> <others
>>>>>>>>>>
>>>>>>>>>> In the output above, the chunks are correctly delimited by 'CRLF'
>>>>>>>>>> (0d 0a) as per the HTTP RFC. As mentioned earlier, the output that
>>>>>>>>>> you observe on stdout with CURL is of the Record-IO encoding used
>>>>>>>>>> for the events stream (and is not related to the RFC):
>>>>>>>>>>
>>>>>>>>>> event = event-size LF
>>>>>>>>>>         event-data
>>>>>>>>>>
>>>>>>>>>> Looking forward to more bug reports as you try out the new API!
>>>>>>>>>>
>>>>>>>>>> -anand
>>>>>>>>>>
>>>>>>>>>>> On Aug 28, 2015, at 12:56 AM, Dario Rexin <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> -1 (non-binding)
>>>>>>>>>>>
>>>>>>>>>>> I found a breaking bug in the new HTTP API. The messages do not
>>>>>>>>>>> conform to the HTTP standard for chunked transfer encoding. In RFC
>>>>>>>>>>> 2616 Sec. 3 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html)
>>>>>>>>>>> a chunk is defined as:
>>>>>>>>>>>
>>>>>>>>>>> chunk = chunk-size [ chunk-extension ] CRLF
>>>>>>>>>>>         chunk-data CRLF
>>>>>>>>>>>
>>>>>>>>>>> The HTTP API currently sends a chunk as:
>>>>>>>>>>>
>>>>>>>>>>> chunk = chunk-size LF
>>>>>>>>>>>         chunk-data
>>>>>>>>>>>
>>>>>>>>>>> A standards-conforming HTTP client like curl can’t correctly
>>>>>>>>>>> interpret the data as a complete chunk. In curl it currently looks
>>>>>>>>>>> like this:
>>>>>>>>>>>
>>>>>>>>>>> 104
>>>>>>>>>>> {"subscribed":{"framework_id":{"value":"20150820-114552-16777343-5050-43704-0000"}},"type":"SUBSCRIBED"}20
>>>>>>>>>>> {"type":"HEARTBEAT"}666
>>>>>>>>>>> … waiting …
>>>>>>>>>>> {"offers":{"offers":[{"agent_id":{"value":"20150820-114552-16777343-5050-43704-S0"},"framework_id":{"value":"20150820-114552-16777343-5050-43704-0000"},"hostname":"localhost","id":{"value":"20150820-114552-16777343-5050-43704-O0"},"resources":[{"name":"cpus","role":"*","scalar":{"value":8},"type":"SCALAR"},{"name":"mem","role":"*","scalar":{"value":15360},"type":"SCALAR"},{"name":"disk","role":"*","scalar":{"value":2965448},"type":"SCALAR"},{"name":"ports","ranges":{"range":[{"begin":31000,"end":32000}]},"role":"*","type":"RANGES"}],"url":{"address":{"hostname":"localhost","ip":"127.0.0.1","port":5051},"path":"\/slave(1)","scheme":"http"}}]},"type":"OFFERS"}20
>>>>>>>>>>> … waiting …
>>>>>>>>>>> {"type":"HEARTBEAT"}20
>>>>>>>>>>> … waiting …
>>>>>>>>>>>
>>>>>>>>>>> It will receive a couple of messages after successful registration
>>>>>>>>>>> with the master and the last thing printed is a number (in this
>>>>>>>>>>> case 666). Then after some time it will print the first offers
>>>>>>>>>>> message followed by the number 20. The explanation for this
>>>>>>>>>>> behavior is that curl can’t interpret the data it gets from Mesos
>>>>>>>>>>> as a complete chunk and waits for the missing data. So it prints
>>>>>>>>>>> what it thinks is a chunk (a message followed by the size of the
>>>>>>>>>>> next message) and keeps the rest of the message until another
>>>>>>>>>>> message arrives and so on. The fix for this is to terminate both
>>>>>>>>>>> lines, the message size and the message data, with CRLF.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Dario
>
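
For readers following the thread, the framing Anand describes (event = event-size LF event-data) can be decoded on the client side once the HTTP library has dechunked the stream. Below is a minimal Python sketch, not Mesos code: the class name RecordIODecoder and its feed() method are made up for illustration, and it assumes the length arrives as an ASCII decimal number terminated by a single LF, as in the curl trace quoted above. It buffers input so that records split across reads (or several records in one read) are handled the same way.

```python
import json
from typing import List


class RecordIODecoder:
    """Hypothetical incremental decoder for '<length>\\n<payload>' records."""

    def __init__(self) -> None:
        # Bytes received so far that do not yet form a complete record.
        self._buffer = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Add dechunked bytes and return every record that is now complete."""
        self._buffer.extend(data)
        records: List[bytes] = []
        while True:
            newline = self._buffer.find(b"\n")
            if newline == -1:
                break  # the length prefix itself is still incomplete
            # Assumption: the prefix is an ASCII decimal byte count.
            # int() raises ValueError if the stream is not framed this way.
            length = int(bytes(self._buffer[:newline]).decode("ascii"))
            start = newline + 1
            end = start + length
            if len(self._buffer) < end:
                break  # payload not fully received yet; wait for more data
            records.append(bytes(self._buffer[start:end]))
            del self._buffer[:end]  # drop the consumed record
        return records


if __name__ == "__main__":
    decoder = RecordIODecoder()
    # The pieces are cut mid-record on purpose, mimicking arbitrary read sizes.
    for piece in (b'20\n{"type":"HEART', b'BEAT"}20\n{"type":"HEARTBEAT"}'):
        for record in decoder.feed(piece):
            print(json.loads(record))
```

With curl itself, the "curl --no-buffer" option mentioned earlier in the thread is what makes each piece show up on stdout as it arrives; the sketch only covers splitting the already dechunked byte stream into events.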

