RE: more WebSockets

2021-08-12 Thread Dmitry Karpov via curl-library
tioning the upper application layers (like JS WS clients) 
which deal only with a subset of WS messages - like Text and Binary, Ping/Pong 
and Close.
Such high-level clients expose only these messages in a kind of "abstract" way 
and thus they kind of hide the opcodes. 

But libcurl, in my opinion, is a transport layer below that higher layer, so it 
should provide WS opcodes in frame and messages because it is important 
"protocol" information (i.e., like HTTP method), and some clients may need to 
use reserved opcodes for their proprietary communication.

> Can opcode 3-7 and b-f be used "at will" by implementations to signal 
> something without negotiating an extension?

In general, all extensions must be negotiated, but I saw cases when some WS 
client/server implementations used reserved control opcodes for their 
proprietary communication.
I guess they used them for some kind of "real-time" (as control frames are short 
and have delivery priority over data messages) out-of-band protocol.

Thanks,
Dmitry Karpov


-Original Message-
From: Daniel Stenberg  
Sent: Thursday, August 12, 2021 12:13 AM
To: Dmitry Karpov via curl-library 
Cc: Dmitry Karpov 
Subject: RE: more WebSockets

On Wed, 11 Aug 2021, Dmitry Karpov via curl-library wrote:

Thanks for the feedback, this is very helpful!

> From a brief look at the document, it looks like Curl will provide 
> only WebSocket frame level of communication, so the client will have 
> to implement full message assembling itself.

If you by "assembling" mean concatenating multiple frames until the FIN frame, 
then my thinking was yes, so that we wouldn't have to buffer up potentially a 
large amount of data before passing it on.

How do other client implementations work and how do they handle the unlimited 
message size? Should we just impose our own maximum size and have applications 
raise it when needed?

> If my understanding is correct, then it seems like a good initial 
> approach to me - it handles the most critical WS steps: WS handshake, 
> frame sending/receiving and error reporting, even though it leaves WS 
> message layer communication (sending/receiving full message) to the clients.

That's my current thinking, yes.

I want the API to be "good enough" to get sufficiently advanced websockets 
communication going against "most" server-side websockets implementations.

When we think it is, we can start working on code to make it real and then see 
how it actually works with some early test client applications. The feature 
will be marked EXPERIMENTAL until we deem it ready anyway so there will be 
wiggle room to change things around all the way through until we decide it is 
fine enough to carve in stone (and ship enabled by default in a release).

> As it was mentioned in some of our WS-related discussions, the WS 
> message layer is more complex than the framing layer

Can you elaborate on this? Aren't ws messages just the payload from N frames 
concatenated and delivered?

I know there can be control frames injected in the middle of stream of data 
frames, but the only standard such frames are close, ping and pong and I 
imagine libcurl would handle them and thus what is passed on to the client 
would be an unbroken stream of data frames.

> libcurl should provide a way to the client to handle incoming message data. 
> This means that besides Frame-based "send/receive" callbacks, as 
> described in the document, there should be message-based callbacks on 
> top of the Frame-layer, which would allow clients to work with WS 
> messages, rather than with WS frames.

There's no way then for libcurl to avoid having to buffer the entire ws 
message, right?

> A small note about iflags:
>
> " iflags is a bitmask featuring the following (incoming) flags:
>    CURLWS_TEXT - this is text data
>    CURLWS_BINARY - this is binary data
>    CURLWS_FIN - this is also the final fragment of a message
>    CURLWS_CLOSE - this transfer is now closed"
>
> The "Text" and "Binary" are special WS frame/message opcodes, so it is 
> probably better to distinguish frame flags and the frame opcodes 
> instead of mixing them together.

This has been mentioned before but I don't understand why.

Why does it matter to an application exactly how the information arrived? The 
application doesn't see the websocket protocol and it doesn't have to know much 
about it using this API.

> If client is supposed to handle WS frames it gets from libcurl, then 
> it needs to know the precise opcode along with the frame flags, so it 
> can properly handle cases when "control" and "data" frames from 
> different messages are intermixed (i.e. one large "data" message 
> intermixed with many "control" messages) and when some "custom

Re: more WebSockets

2021-08-12 Thread Stefan Eissing via curl-library


> On 12.08.2021 at 09:33, Stefan Eissing via curl-library wrote:
> 
> One thing from rfc6455, ch. 5.4:
> 
> "An intermediary MUST NOT change the fragmentation of a message if
>  any reserved bit values are used and the meaning of these values
>  is not known to the intermediary."
> 
> 
> which I read as: if you want to use libcurl as an intermediary, it needs
> to expose the frames and its bits.
> 
> Since libcurl never will do any semantic interpretation of the frames, I
> would always regard it as an "intermediary".

Commenting myself:

But if one wants to disregard all this "future proof" and "maybe one day
multiplexing" thing in the standard, a re-assembly of fragments into
"messages" seems useful for an application.

> 
>> On 12.08.2021 at 09:24, Daniel Stenberg via curl-library wrote:
>> 
>> On Thu, 12 Aug 2021, Weston Schmidt wrote:
>> 
>>> I'd like to add a flag to CURLOPT_WS_OPTIONS that tells curl if it
>>> should negotiate compression or not for easy & multi.
>> 
>>> I like the automatic response to pings & pongs by default.  Perhaps
>>> another CURLOPT_WS_OPTIONS flag might disable the automatic response
>>> behavior in the cases where an app doesn't want to respond (or delay
>>> the response, etc).
>> 
>> Thanks, very good remarks and I've added some text about it now.
>> 
>> -- 
>> 
>> / daniel.haxx.se
>> | Commercial curl support up to 24x7 is available!
>> | Private help, bug fixes, support, ports, new features
>> | https://curl.se/support.html
>> ---
>> Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
>> Etiquette:   https://curl.se/mail/etiquette.html



Re: more WebSockets

2021-08-12 Thread Stefan Eissing via curl-library
One thing from rfc6455, ch. 5.4:

"An intermediary MUST NOT change the fragmentation of a message if
  any reserved bit values are used and the meaning of these values
  is not known to the intermediary."


which I read as: if you want to use libcurl as an intermediary, it needs
to expose the frames and its bits.

Since libcurl never will do any semantic interpretation of the frames, I
would always regard it as an "intermediary".
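
A minimal sketch of what "exposing the frames and its bits" could look like in C. The struct and function names are hypothetical and not part of any libcurl API; only the bit layout of the first header byte is taken from RFC 6455, section 5.2. The second function assumes an intermediary that knows no extensions at all, so any set reserved bit counts as "meaning not known".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the first byte of a WebSocket frame header. */
struct ws_first_byte {
  bool fin;       /* final fragment of a message */
  bool rsv1;      /* reserved bits: only meaningful with a negotiated extension */
  bool rsv2;
  bool rsv3;
  uint8_t opcode; /* low four bits */
};

/* Split the first header byte into its flag bits and opcode. */
struct ws_first_byte ws_decode_first_byte(uint8_t b)
{
  struct ws_first_byte f;
  f.fin = (b & 0x80) != 0;
  f.rsv1 = (b & 0x40) != 0;
  f.rsv2 = (b & 0x20) != 0;
  f.rsv3 = (b & 0x10) != 0;
  f.opcode = (uint8_t)(b & 0x0F);
  return f;
}

/* Assumption: no extension is understood, so a frame with any reserved bit
 * set must keep its fragmentation unchanged. */
bool ws_may_refragment(struct ws_first_byte f)
{
  return !(f.rsv1 || f.rsv2 || f.rsv3);
}
```

An API that only reports "text" or "binary" plus FIN would lose the three reserved bits, which is the information an intermediary needs for the rule quoted above.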

> On 12.08.2021 at 09:24, Daniel Stenberg via curl-library wrote:
> 
> On Thu, 12 Aug 2021, Weston Schmidt wrote:
> 
>> I'd like to add a flag to CURLOPT_WS_OPTIONS that tells curl if it
>> should negotiate compression or not for easy & multi.
> 
>> I like the automatic response to pings & pongs by default.  Perhaps
>> another CURLOPT_WS_OPTIONS flag might disable the automatic response
>> behavior in the cases where an app doesn't want to respond (or delay
>> the response, etc).
> 
> Thanks, very good remarks and I've added some text about it now.
> 



RE: more WebSockets

2021-08-12 Thread Daniel Stenberg via curl-library

On Thu, 12 Aug 2021, Daniel Stenberg via curl-library wrote:

> Should we just impose our own maximum size and have applications raise it 
> when needed?


Answering myself. =)

Ok, I'm convinced we should make the API able to provide full messages. I'll 
adjust accordingly.



Re: more WebSockets

2021-08-12 Thread Daniel Stenberg via curl-library

On Thu, 12 Aug 2021, Weston Schmidt wrote:


> I'd like to add a flag to CURLOPT_WS_OPTIONS that tells curl if it
> should negotiate compression or not for easy & multi.

> I like the automatic response to pings & pongs by default.  Perhaps
> another CURLOPT_WS_OPTIONS flag might disable the automatic response
> behavior in the cases where an app doesn't want to respond (or delay
> the response, etc).

Thanks, very good remarks and I've added some text about it now.


RE: more WebSockets

2021-08-12 Thread Daniel Stenberg via curl-library

On Wed, 11 Aug 2021, Dmitry Karpov via curl-library wrote:

Thanks for the feedback, this is very helpful!

> From a brief look at the document, it looks like Curl will provide only 
> WebSocket frame level of communication, so the client will have to implement 
> full message assembling itself.


If you by "assembling" mean concatenating multiple frames until the FIN frame, 
then my thinking was yes, so that we wouldn't have to buffer up potentially a 
large amount of data before passing it on.


How do other client implementations work and how do they handle the unlimited 
message size? Should we just impose our own maximum size and have applications 
raise it when needed?


> If my understanding is correct, then it seems like a good initial approach 
> to me - it handles the most critical WS steps: WS handshake, frame 
> sending/receiving and error reporting, even though it leaves WS message 
> layer communication (sending/receiving full message) to the clients.


That's my current thinking, yes.

I want the API to be "good enough" to get sufficiently advanced websockets 
communication going against "most" server-side websockets implementations.


When we think it is, we can start working on code to make it real and then see 
how it actually works with some early test client applications. The feature 
will be marked EXPERIMENTAL until we deem it ready anyway so there will be 
wiggle room to change things around all the way through until we decide it is 
fine enough to carve in stone (and ship enabled by default in a release).


> As it was mentioned in some of our WS-related discussions, the WS message 
> layer is more complex than the framing layer


Can you elaborate on this? Aren't ws messages just the payload from N frames 
concatenated and delivered?


I know there can be control frames injected in the middle of stream of data 
frames, but the only standard such frames are close, ping and pong and I 
imagine libcurl would handle them and thus what is passed on to the client 
would be an unbroken stream of data frames.
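
A minimal sketch of the "payload from N frames concatenated" idea, with control frames skipped when they show up in the middle of a message. All names here are hypothetical and this is not libcurl code. It assumes the caller has already parsed each frame into FIN, opcode and payload, treats any non-control, non-continuation opcode as the start of a message, and applies no size cap.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical assembler state; zero-initialize before first use. */
struct ws_assembler {
  unsigned char *buf; /* message bytes collected so far */
  size_t len;
  bool in_message;    /* true between a first fragment and its FIN */
};

enum ws_feed_result {
  WS_FEED_MORE,     /* fragment stored, message not complete yet */
  WS_FEED_COMPLETE, /* buf/len now hold one whole message */
  WS_FEED_CONTROL,  /* control frame, not part of the message */
  WS_FEED_ERROR     /* out of memory or invalid fragment order */
};

enum ws_feed_result ws_feed(struct ws_assembler *a, bool fin, uint8_t opcode,
                            const void *payload, size_t n)
{
  if(opcode & 0x08)
    return WS_FEED_CONTROL;   /* close/ping/pong may arrive mid-message */

  if(opcode == 0x0) {         /* continuation frame */
    if(!a->in_message)
      return WS_FEED_ERROR;   /* nothing to continue */
  }
  else {                      /* first fragment of a new message */
    if(a->in_message)
      return WS_FEED_ERROR;   /* previous message never got its FIN */
    a->len = 0;
    a->in_message = true;
  }

  if(n) {
    unsigned char *p = realloc(a->buf, a->len + n);
    if(!p)
      return WS_FEED_ERROR;
    memcpy(p + a->len, payload, n);
    a->buf = p;
    a->len += n;
  }

  if(fin) {
    a->in_message = false;
    return WS_FEED_COMPLETE;
  }
  return WS_FEED_MORE;
}
```

The buffer grows with every fragment, which is the buffering cost raised earlier in this mail. The caller owns `buf` and frees it when done.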


> libcurl should provide a way to the client to handle incoming message data. 
> This means that besides Frame-based "send/receive" callbacks, as described 
> in the document, there should be message-based callbacks on top of the 
> Frame-layer, which would allow clients to work with WS messages, rather than 
> with WS frames.


There's no way then for libcurl to avoid having to buffer the entire ws 
message, right?



> A small note about iflags:
>
> " iflags is a bitmask featuring the following (incoming) flags:
>    CURLWS_TEXT - this is text data
>    CURLWS_BINARY - this is binary data
>    CURLWS_FIN - this is also the final fragment of a message
>    CURLWS_CLOSE - this transfer is now closed"
>
> The "Text" and "Binary" are special WS frame/message opcodes, so it is 
> probably better to distinguish frame flags and the frame opcodes instead of 
> mixing them together.


This has been mentioned before but I don't understand why.

Why does it matter to an application exactly how the information arrived? The 
application doesn't see the websocket protocol and it doesn't have to know 
much about it using this API.


> If client is supposed to handle WS frames it gets from libcurl, then it 
> needs to know the precise opcode along with the frame flags, so it can 
> properly handle cases when "control" and "data" frames from different 
> messages are intermixed (i.e. one large "data" message intermixed with many 
> "control" messages) and when some "custom" opcodes are used for some 
> proprietary WS communications.


To me, that sounds like an argument for providing all the opcodes through to 
the application. I didn't understand that they are actually used like that, 
especially not within the same message.


Can opcode 3-7 and b-f be used "at will" by implementations to signal 
something without negotiating an extension?



Re: more WebSockets

2021-08-12 Thread Weston Schmidt via curl-library
>> Other websockets implementations are doing that then I presume?

I'll only speak to my implementation ... I provided both a streaming
interface and a block/message interface.  The block/message is nice
for small stuff if you know limits but, for all the reasons you point
out, streaming through a library is safer & then delegate to the app
to assemble.  Streaming is also simpler when you need to deal with how
non-aligned UTF8 encoded text is handled.  A few extra callbacks with
small bits of text can reduce larger allocations, copies, or buffers
that have reserved padding at the start to handle when you have to
carry over a single 4 byte character at the start of a block of text.
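
A minimal sketch of the carry-over problem described above: when text is delivered in blocks, a block can end in the middle of a multi-byte UTF-8 character. The function name is hypothetical. It only reports how many trailing bytes belong to a cut-off sequence; it assumes the rest of the block is well-formed and does no validation.

```c
#include <stddef.h>

/* Return how many bytes at the end of buf (0 to 3) start a UTF-8 sequence
 * that is not complete yet and should be held back for the next block. */
size_t utf8_incomplete_tail(const unsigned char *buf, size_t len)
{
  size_t back;

  /* A sequence is at most 4 bytes, so a cut-off one has at most 3. */
  for(back = 1; back <= 3 && back <= len; back++) {
    unsigned char c = buf[len - back];
    size_t need;

    if((c & 0xC0) == 0x80)
      continue;                 /* continuation byte, keep looking backwards */

    if((c & 0xE0) == 0xC0)
      need = 2;                 /* 110xxxxx: two-byte sequence */
    else if((c & 0xF0) == 0xE0)
      need = 3;                 /* 1110xxxx: three-byte sequence */
    else if((c & 0xF8) == 0xF0)
      need = 4;                 /* 11110xxx: four-byte sequence */
    else
      return 0;                 /* ASCII or invalid lead byte, nothing to hold */

    return (need > back) ? back : 0;
  }
  return 0;
}
```

A streaming receiver could pass `len - utf8_incomplete_tail(buf, len)` bytes to the application and prepend the held-back bytes to the next block, which keeps the carry-over to at most three bytes.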

I like the proposal Daniel.  The few thoughts I have:

For the easy interface I'm not sure how valuable the curl_ws_poll()
call will be.  I like the simplicity of just tx and rx.  That seems
pretty useful for several simple cases.  If you're doing more than
that you probably are going to want the flexibility of multi.

I'd like to add a flag to CURLOPT_WS_OPTIONS that tells curl if it
should negotiate compression or not for easy & multi.  This allows
users to negotiate their own subprotocols where compression may not be
allowed and instruct curl to play nicely.  Also, for debugging
purposes this would be nice.

I like the automatic response to pings & pongs by default.  Perhaps
another CURLOPT_WS_OPTIONS flag might disable the automatic response
behavior in the cases where an app doesn't want to respond (or delay
the response, etc).  Since pings and pongs are allowed to contain
application data, it would be useful to send that through the
CURL_WS_WRITE callback with a CURLWS_PING or CURLWS_PONG flag so the
application gets the payload data.  The ability for a client to
initiate a ping with its own arbitrary data is valuable as that
enables bidirectional health checking of a connection at the
application layer.

Pings can be effectively used for all sorts of interesting out of band
data while large transfers are happening.
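
A minimal sketch of what a client-initiated ping with application data looks like on the wire. The function name and constants are hypothetical, not a libcurl API. The layout follows RFC 6455: control frames carry at most 125 payload bytes, are never fragmented, and client-to-server frames are masked. The caller is assumed to supply a fresh, unpredictable 4-byte masking key.

```c
#include <stddef.h>
#include <stdint.h>

#define WS_OPCODE_PING 0x9
#define WS_CONTROL_MAX 125 /* largest payload a control frame may carry */

/* Write a masked ping frame into out. Returns the frame length in bytes,
 * or 0 if the payload is too large or out is too small. */
size_t ws_build_ping(unsigned char *out, size_t outsize,
                     const unsigned char *data, size_t len,
                     const unsigned char mask[4])
{
  size_t i;

  if(len > WS_CONTROL_MAX || outsize < 6 + len)
    return 0;

  out[0] = (unsigned char)(0x80 | WS_OPCODE_PING); /* FIN set, opcode ping */
  out[1] = (unsigned char)(0x80 | len);            /* MASK set, 7-bit length */
  for(i = 0; i < 4; i++)
    out[2 + i] = mask[i];                          /* masking key */
  for(i = 0; i < len; i++)
    out[6 + i] = (unsigned char)(data[i] ^ mask[i % 4]);
  return 6 + len;
}
```

The peer is expected to echo the same application data back in its pong, which is what makes the payload usable for the health checking mentioned above.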

Wes

On Wed, Aug 11, 2021 at 11:50 PM Daniel Stenberg via curl-library
 wrote:
>
> On Wed, 11 Aug 2021, Felipe Gasper wrote:
>
> >> When a single frame can be 61 bits large?
>
> (Of course I meant 63...)
>
> And thanks for this. As you know I'm a WebSockets rookie so I need and
> appreciate pointers like this!
>
> > I believe most implementations enforce a maximum message length. Mojolicious
> > (Perl), for example, stipulates 256 KiB by default.
> > (https://metacpan.org/pod/Mojo::Transaction::WebSocket#max_websocket_size) I
> > think Firefox is 2 GiB.
>
> It could of course work to have a maximum message size set, but this makes me
> curious. Surely a client will run into problems if you use 256KB max size
> against a server-side websocket thing that assumes much larger?
>
> Using up to 2 gigabytes buffer for a single message is still several
> magnitudes larger than I would want libcurl to do.
>
> Other websockets implementations are doing that then I presume?
>

Re: more WebSockets

2021-08-12 Thread Daniel Stenberg via curl-library

On Wed, 11 Aug 2021, Felipe Gasper wrote:


>> When a single frame can be 61 bits large?


(Of course I meant 63...)

And thanks for this. As you know I'm a WebSockets rookie so I need and 
appreciate pointers like this!


> I believe most implementations enforce a maximum message length. Mojolicious 
> (Perl), for example, stipulates 256 KiB by default. 
> (https://metacpan.org/pod/Mojo::Transaction::WebSocket#max_websocket_size) I 
> think Firefox is 2 GiB.


It could of course work to have a maximum message size set, but this makes me 
curious. Surely a client will run into problems if you use 256KB max size 
against a server-side websocket thing that assumes much larger?


Using up to 2 gigabytes buffer for a single message is still several 
magnitudes larger than I would want libcurl to do.


Other websockets implementations are doing that then I presume?


Re: more WebSockets

2021-08-11 Thread Felipe Gasper via curl-library

> On Aug 11, 2021, at 6:34 PM, Daniel Stenberg  wrote:
> 
> On Wed, 11 Aug 2021, Felipe Gasper wrote:
> 
>> Why frame by frame? JS’s API only does full messages, and I think the RFC 
>> actually stipulates that.
> 
> When a single frame can be 61 bits large?

I believe most implementations enforce a maximum message length. Mojolicious 
(Perl), for example, stipulates 256 KiB by default. 
(https://metacpan.org/pod/Mojo::Transaction::WebSocket#max_websocket_size) I 
think Firefox is 2 GiB.

WS close code 1009 serves this purpose.

This can be enforced without receiving a full frame: parse the frame header, 
determine the size, add it to the previously-received size, and if it exceeds 
the limit, fail the connection.
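
The steps above can be sketched in C as follows. The function names are hypothetical and this is not how any particular library does it. The header layout is from RFC 6455, section 5.2; a masking key, if present, follows the length field and is ignored here. The caller is assumed to reset the running total to zero after each FIN and to keep control frames out of the count.

```c
#include <stddef.h>
#include <stdint.h>

#define WS_CLOSE_MESSAGE_TOO_BIG 1009 /* close code, RFC 6455 section 7.4.1 */

/* Read the payload length from the start of a frame header.
 * Returns the number of bytes used by the two fixed bytes plus any extended
 * length field, or 0 if more header bytes are needed. */
size_t ws_parse_payload_len(const unsigned char *hdr, size_t avail,
                            uint64_t *payload_len)
{
  size_t need, i;
  unsigned len7;
  uint64_t v = 0;

  if(avail < 2)
    return 0;
  len7 = hdr[1] & 0x7Fu;
  if(len7 < 126) {
    *payload_len = len7;         /* length fits in the 7-bit field */
    return 2;
  }
  need = (len7 == 126) ? 4 : 10; /* 16-bit or 64-bit extended length */
  if(avail < need)
    return 0;
  for(i = 2; i < need; i++)
    v = (v << 8) | hdr[i];       /* network byte order */
  *payload_len = v;
  return need;
}

/* Add one frame's payload length to the running message total.
 * Returns 0 if still within the limit, otherwise the 1009 close code. */
int ws_check_limit(uint64_t *message_total, uint64_t payload_len,
                   uint64_t limit)
{
  /* Written so that the comparison cannot overflow. */
  if(payload_len > limit || *message_total > limit - payload_len)
    return WS_CLOSE_MESSAGE_TOO_BIG;
  *message_total += payload_len;
  return 0;
}
```

Because the check runs on the header alone, an oversized frame can be rejected before any of its payload is read or buffered.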

That said, the RFC does, I now see, explicitly allow for streaming-type 
interfaces that give individual frame contents to the application. Frame 
boundaries, though, don’t have the same guarantees that message boundaries do; 
intermediaries/proxies are free to reshuffle those as they wish. (With TLS now 
being so prevalent that _probably_ doesn’t happen very often, though.)

-F

RE: more WebSockets

2021-08-11 Thread Dmitry Karpov via curl-library
Hi Daniel,

From a brief look at the document, it looks like Curl will provide only 
WebSocket frame level of communication, so the client will have to implement 
full message assembling itself.
Summarizing this approach, it seems that libcurl will provide the following:

1. WS handshake handling over HTTP(s). This will include the "upgrade" and 
handling WS handshake errors (like WS protocol requested by the client wasn't 
selected by the server etc) and probably handling of some well-known WS 
extensions like compression.

2. Basic WS framing capabilities - sending/receiving WS frames with all 
necessary information (flags, data) needed for client to implement full message 
assembling.
   This should include compression/decompression, so the client will not have 
to deal with this low-level stuff.

3. Proper WS closure implementation, so the client will have to specify only 
close code and reason for client-initiated closures and get close code and 
reason for server initiated closures (i.e. some WS info options or close 
callback).

4. "WS alone" mode. Perform only basic WS handshake (probably with handling 
compression extension), so the client will be able to handle send/receive raw 
WS frames and do some custom processing.

5. Provide automatic Ping/Pong response and timer-based Ping/Pong pinging (with 
optional client supplied Ping data).

6. Provide WS-specific error reporting - via proper WS error codes etc.

If my understanding is correct, then it seems like a good initial approach to 
me - it handles the most critical WS steps: WS handshake, frame 
sending/receiving and error reporting, even though it leaves WS message layer 
communication (sending/receiving full message) to the clients.

As it was mentioned in some of our WS-related discussions, the WS message layer 
is more complex than the framing layer, and potentially fully assembled WS 
messages can be huge, so libcurl should provide a way to the client to handle 
incoming message data.
This means that besides Frame-based "send/receive" callbacks, as described in 
the document, there should be message-based callbacks on top of the 
Frame-layer, which would allow clients to work with WS messages, rather than 
with WS frames.

But this would require implementation of WS message layer in libcurl, which can 
be done in some subsequent extensions of WS support in libcurl.

A small note about iflags:

" iflags is a bitmask featuring the following (incoming) flags:
CURLWS_TEXT - this is text data
CURLWS_BINARY - this is binary data
CURLWS_FIN - this is also the final fragment of a message
CURLWS_CLOSE - this transfer is now closed"

The "Text" and "Binary" are special WS frame/message opcodes, so it is probably 
better to distinguish frame flags and the frame opcodes instead of mixing them 
together.
If client is supposed to handle WS frames it gets from libcurl, then it needs 
to know the precise opcode along with the frame flags, so it can properly 
handle cases when "control" and "data" frames from different messages are 
intermixed (i.e. one large "data" message intermixed with many "control" 
messages)
and when some "custom" opcodes are used for some proprietary WS communications.
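
A minimal sketch of keeping the opcode separate from the frame flags, as suggested above. The enum, struct and function names are hypothetical and are not the CURLWS_* names from the draft. The opcode values and the reserved ranges (0x3-0x7 for data, 0xB-0xF for control) come from RFC 6455, section 5.2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcodes defined by RFC 6455; every other value is reserved. */
enum ws_opcode {
  WS_OP_CONTINUATION = 0x0,
  WS_OP_TEXT = 0x1,
  WS_OP_BINARY = 0x2,
  WS_OP_CLOSE = 0x8,
  WS_OP_PING = 0x9,
  WS_OP_PONG = 0xA
};

/* Hypothetical per-frame info: the opcode is a value, FIN is a flag. */
struct ws_frame_info {
  uint8_t opcode; /* raw 4-bit opcode, including reserved values */
  bool fin;       /* final fragment of the message */
};

/* Control frames have the top bit of the 4-bit opcode set. */
bool ws_is_control(uint8_t opcode)
{
  return (opcode & 0x08) != 0;
}

/* Reserved opcodes that some implementations use for private signalling. */
bool ws_is_reserved(uint8_t opcode)
{
  return (opcode >= 0x3 && opcode <= 0x7) ||
         (opcode >= 0xB && opcode <= 0xF);
}
```

With the raw opcode passed through, an application can tell a ping that arrives in the middle of a fragmented message from the next data fragment, and can see reserved values instead of having them dropped.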

Thanks,
Dmitry Karpov



-Original Message-
From: curl-library  On Behalf Of Daniel 
Stenberg via curl-library
Sent: Wednesday, August 11, 2021 2:41 PM
To: libcurl hacking 
Cc: Daniel Stenberg 
Subject: more WebSockets

Hi,

I've refreshed the wiki page a bit using input from the discussion so far. See 
https://github.com/curl/curl/wiki/WebSockets

A few things I realized and tried to reflect in the page:

A single fragment can be 61 bits large and a message consists of multiple such
fragments: we must have an API that provides data piece by piece to the 
application and signal the FIN when it arrives.

We need to provide a callback-based approach (as well) to allow for many 
concurrent websocket transfers - especially for applications that want to mix 
those up with a few "regular protocol" transfers as well. I've tried to 
describe how it could work. Not sure it is flexible enough.

I added a few questions marked "TBD" in there that I don't think we have 
answered yet.

I think we can design an API that can work. What's the biggest omissions or 
mistakes in the current draft?


Re: more WebSockets

2021-08-11 Thread Daniel Stenberg via curl-library

On Wed, 11 Aug 2021, Felipe Gasper wrote:

> Why frame by frame? JS’s API only does full messages, and I think the RFC 
> actually stipulates that.


When a single frame can be 61 bits large?


Re: more WebSockets

2021-08-11 Thread Felipe Gasper via curl-library

> On Aug 11, 2021, at 17:46, Daniel Stenberg via curl-library 
>  wrote:
> 
> A single fragment can be 61 bits large and a message consists of multiple 
> such fragments: we must have an API that provides data piece by piece to the 
> application and signal the FIN when it arrives.

Why frame by frame? JS’s API only does full messages, and I think the RFC 
actually stipulates that.

-F


