Re: [Rd] Question on non-blocking socket

2023-02-17 Thread Simon Urbanek
Ben,

yes, by definition - non-blocking means that reads won't block and always 
return immediately (the point of non-blocking). The loop below is terrible as 
it will cause 100% CPU usage while it's spinning. It seems that you want to 
block so why are you using non-blocking mode? select() effectively gets you 
back to blocking mode, because it does the "block" that read() would normally 
do in blocking mode. Moreover select() allows you to block for a specified time 
(the point of the timeout argument) so if you want to wait, you should set the 
timeout - you should never use a spin loop without timeouts. Also there are 
many other conditions you should be handling - there may be an error on the 
socket or EINTR (you should call R's interrupt handler) or EAGAIN (which you do 
implicitly, but you can't tell it from an actual error).
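
A minimal sketch of the advice above, assuming POSIX sockets: a single select() call with a real timeout replaces the spin loop, and the outcomes (ready, timeout, EINTR, other error) are reported separately. The names `WaitResult` and `wait_readable` are hypothetical and not part of any library; what to do on `Interrupted` (for example calling R's interrupt handler and retrying) is left to the caller.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical result type: one value per outcome the caller must tell apart.
enum class WaitResult { Ready, Timeout, Interrupted, Error };

// Block until `fd` is readable or `timeout_ms` milliseconds have passed.
// The process sleeps inside select(), so no CPU is burned while waiting.
WaitResult wait_readable(int fd, long timeout_ms) {
    assert(fd >= 0 && fd < FD_SETSIZE);  // select() cannot watch larger descriptors

    fd_set read_set;
    FD_ZERO(&read_set);
    FD_SET(fd, &read_set);

    struct timeval timeout;
    timeout.tv_sec = timeout_ms / 1000;
    timeout.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(fd + 1, &read_set, nullptr, nullptr, &timeout);
    if (rc > 0) return WaitResult::Ready;      // only one descriptor was watched
    if (rc == 0) return WaitResult::Timeout;   // nothing arrived in time
    if (errno == EINTR) return WaitResult::Interrupted;  // a signal arrived
    return WaitResult::Error;                  // e.g. EBADF: a genuine failure
}
```

A caller would typically loop only on `Interrupted`, treat `Timeout` as "the server did not answer", and report `Error` rather than retrying.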

Sockets and I/O are quite a complex matter - it's easy to get it wrong and create 
hard-to-detect bugs in your code unless you are an expert in it. It's one of the 
wheels you don't want to be reinventing.

Cheers,
Simon


> On Feb 18, 2023, at 3:00 AM, Ben Engbers wrote:
> 
> Hi Tomas,
> 
> Apparently, inserting some kind of socketSelect() is essential when using 
> non-blocking sockets and a client/server architecture. That is at least one 
> thing that I have learned ;-).
> 
> In C++, between sending and requesting, I inserted a call to this function:
> bool wait(int s) {
>   fd_set read_set;
>   struct timeval timeout {};
>   memset(&timeout, 0, sizeof(timeout));
>   bool done{};
>   while (!done ) {
>     FD_ZERO(&read_set);
>     FD_SET(s, &read_set);
>     int rc = select(s + 1, &read_set, NULL, NULL, &timeout);
>     done = (rc == 1) && FD_ISSET(s, &read_set);
>   };
>   return done;
> };
> 
> Inserting this call was essential in solving my problem.
> 
> Ben
> 
> On 15-02-2023 at 17:17, Tomas Kalibera wrote:
>> In the example you are waiting only for a single byte. But if the response 
>> may be longer, one needs to take into account in the client that not all 
>> bytes of the response may be available right away. One would keep receiving 
>> the data in a loop, as they become available (e.g. socketSelect() would 
>> tell), keep appending them to a buffer, and keep looking for when they are 
>> complete.
>> Tomas
>>> Ben
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Question on non-blocking socket

2023-02-17 Thread Ben Engbers

Hi Tomas,

Apparently, inserting some kind of socketSelect() is essential when 
using non-blocking sockets and a client/server architecture. That is at 
least one thing that I have learned ;-).


In C++, between sending and requesting, I inserted a call to this function:
bool wait(int s) {
  fd_set read_set;
  struct timeval timeout {};
  memset(&timeout, 0, sizeof(timeout));
  bool done{};
  while (!done ) {
    FD_ZERO(&read_set);
    FD_SET(s, &read_set);
    int rc = select(s + 1, &read_set, NULL, NULL, &timeout);
    done = (rc == 1) && FD_ISSET(s, &read_set);
  };
  return done;
};

Inserting this call was essential in solving my problem.

Ben

On 15-02-2023 at 17:17, Tomas Kalibera wrote:
In the example you are waiting only for a single byte. But if the 
response may be longer, one needs to take into account in the client 
that not all bytes of the response may be available right away. One 
would keep receiving the data in a loop, as they become available (e.g. 
socketSelect() would tell), keep appending them to a buffer, and keep 
looking for when they are complete.


Tomas

Ben




Re: [Rd] Question on non-blocking socket

2023-02-16 Thread Charlie Gao via R-devel
> Date: Wed, 15 Feb 2023 01:24:26 +0100
> From: Ben Engbers 
> To: r-devel@r-project.org
> Subject: [Rd] Question on non-blocking socket
> 
> Hi,
> 
> December 27, 2021 I started a thread asking for help troubleshooting 
> non-blocking sockets.
> While developing the RBaseX client, I had issues with the authentication 
> process. It eventually turned out that a short break had to be inserted 
> in this process between sending the credentials to the server and 
> requesting the status. Tomas Kalibera put me on the right track by 
> drawing my attention to the 'socketSelect' function. I don't know 
> exactly what the purpose of this function is (the function itself is 
> documented, but I can't find any information on the situations in which 
> it should be called), but it sufficed to call this function once 
> between sending and requesting.
> 
> I have two questions.
> The first is where I can find R documentation on proper use of 
> non-blocking sockets and on the proper use of the socketSelect function?
> 
> The second question is more focused on using non-blocking sockets in 
> general. Is it allowed to execute a read and a receive command 
> immediately after each other, or must a short waiting loop be built in?
> I'm asking this because I'm running into the same problems in a C++ 
> project as I did with RBaseX.
> 
> Ben Engbers
> 

Hi Ben,

For an easier experience with sockets, you may wish to have a look at the 
`nanonext` package. This wraps 'NNG' and is generally used for messaging over 
its own protocols (req/rep, pub/sub etc.), although you can also use it for 
HTTP and websockets.

In any case, a low-level stream interface allows connecting to arbitrary 
sockets, using something like `s <- stream(dial = "tcp://0.0.0.0:")` with the 
actual address substituted in. This would allow you greater flexibility in 
sending and receiving over the bytestream without worrying so much about order 
and timing as per your current experience.

For example, a common pattern this allows for is doing an async receive `r <- 
recv_aio(s)` before sending a request `send(s, "some request")`, and then 
querying the receive result afterwards at `r$data`.

I won't go into too much detail here, but as it is my own package, please feel 
free to reach out separately via email or github etc.

Thanks,

Charlie



Re: [Rd] Question on non-blocking socket

2023-02-15 Thread Tomas Kalibera



On 2/15/23 16:44, Ben Engbers wrote:

Hi

On 15-02-2023 at 14:38, Tomas Kalibera wrote:

On 2/15/23 01:24, Ben Engbers wrote:

Hi,

December 27, 2021 I started a thread asking for help troubleshooting 
non-blocking sockets.

..


I have two questions.
The first is where I can find R documentation on proper use of 
non-blocking sockets and on the proper use of the socketSelect 
function?


In addition to the demos I sent to you in that 2021 thread on 
R-pkg-devel, you could also have a look at how it is used in R 
itself, in the parallel package, in snowSOCK.R, to set up the snow 
cluster in parallel. Some hints may be also found in the blog post 
https://blog.r-project.org/2020/03/17/socket-connections-update/. 
But, in principle, R API is just a thin layer on top of what the OS 
provides, so general literature and tutorials on sockets should help, 
there should be even textbooks used at CS universities in networking 
classes.

Thanks for the suggestions!

Basically select() can tell you when data is ready (on input), when 
the socket interface is able to accept more data (on output) or when 
there is an incoming connection. In practice, you should not need any 
delays to be inserted in your program to make it work - if that is 
needed, it means there is an error in it (a race condition). If the 
program is polling (checking in a loop whether something has already 
happened, and then sleeping for a short while), the duration of the 
sleep may indeed influence latency, but should not affect correctness 
- if it does, there is an error.


In RBaseX I first calculate an MD5 hash that is sent to the server and 
then I check the status byte that is returned by the server.


writeBin(auth, private$conn)
socketSelect(list(conn))
Accepted <- readBin(conn, what = "raw", n = 1) == 0

Without the second line, 'Accepted' is always FALSE. With this line it 
is TRUE.


BaseX provides example APIs in several languages. I've looked at 
several but indeed none uses any form of delay.
All APIs follow the same pattern: calculate an MD5, send it to the 
server and check the status byte. So the server is not likely to 
enforce a delay. So there is nothing left but to look for that race 
condition ;-(


Without knowing more details, this looks ok. If you have a non-blocking 
connection, and the server produces a response based on the client 
request, the client has to take into account that it takes the server 
some time to produce the response. Right, the sockets are full duplex 
and so could be the communication protocol, but in this case it 
apparently isn't, it is request/response.


Without the second line, there would be a race condition between the 
server sending a response and the client receiving it. With the second 
line, the client waits for the server before it starts receiving. In 
theory, one could be waiting for the response actively in a loop 
(polling), but socketSelect() is better. Both ways would resolve the 
race condition. Adding a single fixed-time wait, instead, would not 
remove the race condition, because one can never be sure that the server 
wouldn't take longer (apart from waiting too long most of the time).
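
To illustrate the race in C++ terms, here is a minimal sketch assuming POSIX sockets. A read on a non-blocking socket returns immediately, and "no data yet" shows up as EAGAIN/EWOULDBLOCK, which has to be kept apart from a closed connection and from a real error. The names `ReadStatus`, `set_nonblocking` and `try_read_byte` are hypothetical helpers for this example only.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical status type for a single non-blocking read attempt.
enum class ReadStatus { Got, WouldBlock, Closed, Error };

// Switch a descriptor to non-blocking mode; returns false on failure.
bool set_nonblocking(int fd) {
    assert(fd >= 0);  // precondition: a valid descriptor
    int flags = fcntl(fd, F_GETFL, 0);
    return flags != -1 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Try to read one byte without waiting.
ReadStatus try_read_byte(int fd, unsigned char &out) {
    for (;;) {
        ssize_t n = recv(fd, &out, 1, 0);
        if (n == 1) return ReadStatus::Got;      // the status byte arrived
        if (n == 0) return ReadStatus::Closed;   // peer closed the connection
        if (errno == EINTR) continue;            // interrupted by a signal: retry
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return ReadStatus::WouldBlock;       // response not there yet
        return ReadStatus::Error;                // genuine socket error
    }
}
```

Reading right after sending will usually yield `WouldBlock`, because the server has not answered yet; the client should then wait for readability (select() or socketSelect()) instead of treating the empty read as a response.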


In the example you are waiting only for a single byte. But if the 
response may be longer, one needs to take into account in the client 
that not all bytes of the response may be available right away. One 
would keep receiving the data in a loop, as they become available (e.g. 
socketSelect() would tell), keep appending them to a buffer, and keep 
looking for when they are complete.
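
The receive loop described above could be sketched in C++ roughly as follows, assuming POSIX sockets and assuming the expected response length is known in advance (a real protocol might use a terminator or a length prefix instead). `read_exact` is a hypothetical helper name, and the timeout applies to each individual wait, not to the whole transfer.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Append bytes from `fd` to `buffer` until it holds `expected` bytes.
// Returns true when complete; false on timeout, peer close or socket error.
bool read_exact(int fd, std::string &buffer, std::size_t expected, long timeout_ms) {
    assert(fd >= 0 && fd < FD_SETSIZE);  // select() cannot watch larger descriptors
    char chunk[4096];
    while (buffer.size() < expected) {
        // Rebuild the set and the timeout each round: select() may modify both.
        fd_set read_set;
        FD_ZERO(&read_set);
        FD_SET(fd, &read_set);
        struct timeval timeout;
        timeout.tv_sec = timeout_ms / 1000;
        timeout.tv_usec = (timeout_ms % 1000) * 1000;

        int rc = select(fd + 1, &read_set, nullptr, nullptr, &timeout);
        if (rc < 0) {
            if (errno == EINTR) continue;  // interrupted: wait again
            return false;                  // genuine select() error
        }
        if (rc == 0) return false;         // nothing arrived within the timeout

        std::size_t want = expected - buffer.size();
        if (want > sizeof chunk) want = sizeof chunk;
        ssize_t n = recv(fd, chunk, want, 0);
        if (n > 0) {
            buffer.append(chunk, static_cast<std::size_t>(n));  // keep what arrived
            continue;
        }
        if (n == 0) return false;          // peer closed before the response was complete
        if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK) continue;
        return false;                      // genuine recv() error
    }
    return true;
}
```

Bytes that arrived before a failure stay in `buffer`, so the caller can inspect a partial response.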


Tomas



Ben




Re: [Rd] Question on non-blocking socket

2023-02-15 Thread Ben Engbers

Hi

On 15-02-2023 at 14:38, Tomas Kalibera wrote:

On 2/15/23 01:24, Ben Engbers wrote:

Hi,

December 27, 2021 I started a thread asking for help troubleshooting 
non-blocking sockets.

..


I have two questions.
The first is where I can find R documentation on proper use of 
non-blocking sockets and on the proper use of the socketSelect function?


In addition to the demos I sent to you in that 2021 thread on 
R-pkg-devel, you could also have a look at how it is used in R itself, 
in the parallel package, in snowSOCK.R, to set up the snow cluster in 
parallel. Some hints may be also found in the blog post 
https://blog.r-project.org/2020/03/17/socket-connections-update/. But, 
in principle, R API is just a thin layer on top of what the OS provides, 
so general literature and tutorials on sockets should help, there should 
be even textbooks used at CS universities in networking classes.

Thanks for the suggestions!

Basically select() can tell you when data is ready (on input), when the 
socket interface is able to accept more data (on output) or when there 
is an incoming connection. In practice, you should not need any delays 
to be inserted in your program to make it work - if that is needed, it 
means there is an error in it (a race condition). If the program is 
polling (checking in a loop whether something has already happened, and 
then sleeping for a short while), the duration of the sleep may indeed 
influence latency, but should not affect correctness - if it does, there 
is an error.


In RBaseX I first calculate an MD5 hash that is sent to the server and 
then I check the status byte that is returned by the server.


writeBin(auth, private$conn)
socketSelect(list(conn))
Accepted <- readBin(conn, what = "raw", n = 1) == 0

Without the second line, 'Accepted' is always FALSE. With this line it 
is TRUE.


BaseX provides example APIs in several languages. I've looked at 
several but indeed none uses any form of delay.
All APIs follow the same pattern: calculate an MD5, send it to the 
server and check the status byte. So the server is not likely to enforce 
a delay. So there is nothing left but to look for that race condition ;-(


Ben



Re: [Rd] Question on non-blocking socket

2023-02-15 Thread Ivan Krylov
On Wed, 15 Feb 2023 01:24:26 +0100
Ben Engbers wrote:

> where I can find R documentation on proper use of non-blocking
> sockets and on the proper use of the socketSelect function?

A useful guide to the Berkeley sockets API can be found at
. You'll have to translate between the C
idioms and the R idioms, but it's better than having no guide at all.
In particular, R spares you from having to figure out differently-sized
struct sockaddr objects by converting them to string representations of
the addresses (currently limited to IPv4).

-- 
Best regards,
Ivan



Re: [Rd] Question on non-blocking socket

2023-02-15 Thread Tomas Kalibera



On 2/15/23 01:24, Ben Engbers wrote:

Hi,

December 27, 2021 I started a thread asking for help troubleshooting 
non-blocking sockets.
While developing the RBaseX client, I had issues with the 
authentication process. It eventually turned out that a short break 
had to be inserted in this process between sending the credentials to 
the server and requesting the status. Tomas Kalibera put me on the 
right track by drawing my attention to the 'socketSelect' function. I 
don't know exactly what the purpose of this function is (the function 
itself is documented, but I can't find any information on the 
situations in which it should be called), but it sufficed to call 
this function once between sending and requesting.


I have two questions.
The first is where I can find R documentation on proper use of 
non-blocking sockets and on the proper use of the socketSelect function?


In addition to the demos I sent to you in that 2021 thread on 
R-pkg-devel, you could also have a look at how it is used in R itself, 
in the parallel package, in snowSOCK.R, to set up the snow cluster in 
parallel. Some hints may be also found in the blog post 
https://blog.r-project.org/2020/03/17/socket-connections-update/. But, 
in principle, R API is just a thin layer on top of what the OS provides, 
so general literature and tutorials on sockets should help, there should 
be even textbooks used at CS universities in networking classes.


Basically select() can tell you when data is ready (on input), when the 
socket interface is able to accept more data (on output) or when there 
is an incoming connection. In practice, you should not need any delays 
to be inserted in your program to make it work - if that is needed, it 
means there is an error in it (a race condition). If the program is 
polling (checking in a loop whether something has already happened, and 
then sleeping for a short while), the duration of the sleep may indeed 
influence latency, but should not affect correctness - if it does, there 
is an error.
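
As a rough C++ illustration of what select() reports, assuming POSIX sockets: the sketch below watches one descriptor in both the read set and the write set. `Readiness` and `check_readiness` are hypothetical names for this example. On a listening socket, a pending incoming connection is reported as "readable".

```cpp
#include <cassert>
#include <cerrno>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical result type for one readiness check.
struct Readiness {
    bool readable = false;  // data, EOF or a pending connection can be read
    bool writable = false;  // the socket can accept more outgoing data
};

// Wait up to `timeout_ms` for `fd` to become readable and/or writable.
// Returns false only on a select() error other than EINTR.
bool check_readiness(int fd, long timeout_ms, Readiness &out) {
    assert(fd >= 0 && fd < FD_SETSIZE);  // select() cannot watch larger descriptors
    for (;;) {
        // Rebuild the sets and the timeout each round: select() may modify them.
        fd_set read_set, write_set;
        FD_ZERO(&read_set);
        FD_ZERO(&write_set);
        FD_SET(fd, &read_set);
        FD_SET(fd, &write_set);
        struct timeval timeout;
        timeout.tv_sec = timeout_ms / 1000;
        timeout.tv_usec = (timeout_ms % 1000) * 1000;

        int rc = select(fd + 1, &read_set, &write_set, nullptr, &timeout);
        if (rc < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: try again
            return false;                  // genuine error
        }
        // On timeout (rc == 0) both sets are empty, so both flags are false.
        out.readable = FD_ISSET(fd, &read_set) != 0;
        out.writable = FD_ISSET(fd, &write_set) != 0;
        return true;
    }
}
```

A freshly connected socket is normally writable at once and not readable, so this call returns immediately; it becomes readable only after the peer has sent something.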


The second question is more focused on using non-blocking sockets in 
general. Is it allowed to execute a read and a receive command 
immediately after each other, or must a short waiting loop be built in?
I'm asking this because I'm running into the same problems in a C++ 
project as I did with RBaseX.


No, in general there is no need to insert any delays between reads and 
writes, they can actually happen concurrently. But these are general 
networking questions, not the topic of this list.


Best
Tomas



Ben Engbers
