Re: Asynchronous http poll

2018-05-15 Thread Didier
I think I want to simplify some things.

Normally, client/server async is implemented by the client/server framework. 
What happens is that the exchange of messages between the client and server 
over the HTTP connection's socket is made non-blocking, but the entire 
request/response still happens within the context of a single HTTP connection.

This often allows the server to take in many more requests, and the client to 
overlap other processing while it waits. That's because open IO operations are 
cheaper than open threads: modern hardware supports many more concurrent open 
socket connections than open threads.

If that's what you want, you need to move to a different client/server 
framework that supports non-blocking exchanges, such as Netty in the Java world.

If you want to avoid switching frameworks, if your operations are going to be 
really long, or if you want to survive network dropouts, then you can go for 
something more like what you were originally trying to do.

In that case, you need to choose between push and pull.

If pull, you want a distributed map, as I said, which is more commonly known as 
a database. A SQL table can do it; a NoSQL key/value store also works. I'm a 
fan of DynamoDB for this. Ideally, you want your distributed map to scale as 
well as your server APIs will; otherwise it becomes a bottleneck.

You can also go with a distributed queue, like RabbitMQ or AWS SQS. This allows 
the client to use reactor-style, event-driven response handling. Instead of 
having the client poll your get API to learn whether a response is available, 
you put a message on the queue saying GUID-X is now done, and your client works 
through the queue, polling you for the result of each message it finds.
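
A rough shape of that flow (queue-send! and queue-receive! are hypothetical 
stand-ins for your queue client, not a real SQS or RabbitMQ API):

;; Server side: announce on the queue that GUID-X is done.
;; queue-send! is a hypothetical stand-in for your queue client.
(defn announce-done! [queue-send! guid]
  (queue-send! {:event :done, :guid guid}))

;; Client side: instead of blind polling, work through the queue and
;; fetch only the results that are known to be ready.
(defn drain-queue! [queue-receive! fetch-result!]
  (doseq [{:keys [guid]} (queue-receive!)]
    (fetch-result! guid)))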

If you want push, you need the client to expose an endpoint to be contacted 
when the work is done. You can do this easily with AWS SNS, for example. This 
could mean the client exposes a call-when-done(guid, result) API: it tells your 
server about it, and when you are done, you send a request to that API to 
notify the client. This lets the client know right away that the work is done, 
and saves it the CPU cost of polling. But it gets complicated when you fail to 
reach the client's endpoint: what happens then? With push alone, you can miss a 
response. To avoid that, people often offer both pull and push.
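
A rough shape of the push model (http-post! stands in for whatever HTTP client 
you use, and the endpoint shape is illustrative):

;; Server side: when the work finishes, notify the client's endpoint.
;; http-post! is a hypothetical stand-in for your HTTP client.
(defn notify-client! [http-post! callback-url guid result]
  (http-post! callback-url {:guid guid, :result result}))

;; Client side: a Ring-style call-when-done handler. Assumes the body
;; was already parsed into a map (e.g. by JSON middleware).
(defn call-when-done-handler [results {:keys [body]}]
  (let [{:keys [guid result]} body]
    (swap! results assoc guid result)   ; record the pushed result
    {:status 200, :body "ok"}))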

So my practical recommendation to you would be to first look into a 
non-blocking client/server framework like Netty. Maybe that's all you need.

If not, then look into using DynamoDB or SQS or SNS or similar products.



Re: Asynchronous http poll

2018-05-15 Thread Brjánn Ljótsson
Thank you so much, Didier, for your detailed response! I will need some time
to digest it, but a lot of what you write sounds very reasonable.

Thanks!

Brjánn

On 15 May 2018 at 02:57, Didier wrote:

> [...]

Re: Asynchronous http poll

2018-05-14 Thread Didier
Oh, I forgot something important.

If you're hoping to have multiple hosts and run this application in a 
distributed way, you really should not do it that way. Things get a lot 
more complicated. The problem is that your request queue is local to a host. 
So if the client creates the Future through S1 on host A, then calls 
get-s1-result and is routed to host B, that Future will be missing.

So what you need is to turn that atom map of Futures into a distributed 
one. You can still keep the Future atom map, but as the last step of each 
Future, you update the distributed map with the result or error. And if you 
want statuses, you should also update it with the current status inside your 
polling loop. Then get-s1-result just checks the value of that distributed 
map. Each host still processes its own share of requests, but the 
distributed map exposes their results and processing status to all other 
hosts.
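
A minimal sketch of that, where kv-put! and kv-get are hypothetical stand-ins 
for your distributed map client (e.g. a DynamoDB table keyed by GUID):

;; Each host still runs its own Futures, but the status and outcome go
;; into the distributed map, so any host can answer get-s1-result.
(defn run-request! [kv-put! guid handle-request request]
  (future
    (kv-put! guid {:status :processing})
    (try
      (kv-put! guid {:status :done, :result (handle-request request)})
      (catch Exception e
        (kv-put! guid {:status :error, :error (.getMessage e)})))))

(defn get-s1-result [kv-get guid]
  (or (kv-get guid) {:status :unknown-guid}))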

There are many other ways to handle this issue. For example, you may be able 
to route the client to a direct connection to the particular host that 
handled S1, so that calls to get-s1-result go to that specific host. The 
downside is that it gets harder to evenly distribute the polls, and it takes 
more complex infrastructure: all hosts must have their IPs exposed to the 
clients, for example. Alternatively, the VIP might support smarter routing 
based on some indicator, or you can use a master host which delegates work 
and keeps that routing logic itself.

An alternative is to let go of polling and use a push model instead: your 
server calls the client to tell it the request has been handled. This also 
has its own complexities and trade-offs.

Anyway, in a distributed environment, async and non-blocking become quite 
a bit more complex.


On Monday, 14 May 2018 17:35:39 UTC-7, Didier wrote:

> [...]

Re: Asynchronous http poll

2018-05-14 Thread Didier
It's hard to answer without additional detail.

I'll make some assumptions, and answer assuming those are true:

1) I assume your S1 API is blocking, that each request to it is handled 
on its own thread, and that those threads come from a fixed-size thread 
pool of size 30.

2) I assume that S2 is also blocking, and that it returns a promise when 
you call it; and that you need to keep polling another API, which I'll call 
get-s2-result, that takes the promise, is also blocking, and returns the 
result, an error, or an indication that the result is still not available.

3) I assume you want to give your blocking S1 API pseudo non-blocking 
behavior.

4) Thus, you would have S1 return a promise. When called, you do not 
process the request; you put it in a "to be processed" queue, and you 
return a promise that eventually the request will be processed and will 
have a value or an error.

5) Similarly, you need a way for the client to check the promise, so you 
also expose a blocking API, which I'll call get-s1-result, that takes the 
promise and returns either the result, an error, or an indication that it's 
not available yet.

6) Your promise will take the form of a GUID that uniquely identifies the 
queued request.

7) That is your API design. Your clients can now start their work and 
integration against your APIs while you implement the functionality.

8) Now you need to implement the queuing of requests. This is where you 
have options, and core.async is one of them. I agree with the advice of 
not using core.async unless simpler tools don't work, so I will start with 
a simpler tool: Future, plus a global atom holding a map from promise GUID 
to request.

9) So you create a global atom, which contains a map from GUID -> Future.

10) On every request to S1, you create a new GUID and Future, and you swap! 
assoc the GUID with the Future.

11) The Future is your request handler. In it, you synchronously handle 
the request, whatever that means for you. Maybe you do some processing, 
then you call S2, and then you loop, calling get-s2-result every 100ms 
until it returns an error or a result. On every iteration, you also check 
that the time elapsed since you started looping is not more than some 
timeout X, so that you don't loop forever. If you eventually get a result 
or an error, you handle it however you need to, and eventually your Future 
itself returns a result or an error. It's important that you design the 
Future's task to time out eventually, so that you don't leak Futures stuck 
in infinite loops. You must be able to deterministically know that the 
Future will finish.

12) Now you implement get-s1-result. Whenever it is called, you get the 
Future from the global atom map of Futures and call future-done? on it. 
If false, you return that the result is not available yet. If it is done, 
you deref the Future, swap! dissoc its map entry from your global atom, 
and return the result or error.
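
Pulling steps 8 through 12 together into a minimal sketch. The three stubs at 
the top are hypothetical stand-ins for your real processing and S2 
integration; the 100ms interval comes from step 11, and the 5-second timeout 
matches the worst-case example that follows:

;; Hypothetical stand-ins; replace with your real integrations.
;; get-s2-result returns the outcome, or nil while S2 is still pending.
(defn process-request [request] request)
(defn call-s2 [payload] {:s2-id payload})
(defn get-s2-result [s2-promise] {:result :done})  ; stub: always done

(defonce requests (atom {}))  ; step 9: GUID -> Future

(defn handle-s1
  "Step 10: queue the work, immediately return the promise (a GUID)."
  [request]
  (let [guid (str (java.util.UUID/randomUUID))
        fut  (future  ; step 11: handle synchronously, poll S2, time out
               (let [s2       (call-s2 (process-request request))
                     deadline (+ (System/currentTimeMillis) 5000)]
                 (loop []
                   (if (> (System/currentTimeMillis) deadline)
                     {:error :timeout}
                     (if-some [r (get-s2-result s2)]
                       r
                       (do (Thread/sleep 100) (recur)))))))]
    (swap! requests assoc guid fut)
    guid))

(defn get-s1-result
  "Step 12: return :pending, or the outcome (and forget the Future)."
  [guid]
  (if-let [fut (get @requests guid)]
    (if (future-done? fut)
      (let [outcome @fut]
        (swap! requests dissoc guid)
        outcome)
      :pending)
    {:error :unknown-guid}))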

The only danger of this approach is that the Future queue is unbounded. What 
saves you is that clients can call S1 and get-s1-result with at most 30 
concurrent requests, because I assumed your APIs are blocking and bounded 
by a shared fixed-size thread pool of 30.

Now say it takes 1 second on average to process an S1 request, so your 
Future finishes in 1 second on average, and you time Futures out at 5 
seconds. Take the worst-case scenario: S2 is down, so every request takes 
the maximum of 5 seconds to be handled. Your clients are also maxing out 
your concurrency on S1, so you get around 30 concurrent requests constantly, 
and S1 takes 100ms to return the promise. What you get is this:

* Every second, you are creating 300 Futures, because every 100ms you 
accept 30 new S1 requests.

So say we start at 0 Futures: one second later you have 300, and 5 seconds 
later you have 1500, but your first 300 time out, leaving you with 1200. At 
the 6th second you are back to 1200, since 300 more were queued but another 
300 timed out. From that point on, you hover around 1200 open Futures every 
second, with a peak of 1500.

Thus you need to make sure you can handle 1500 open threads on your host.

Indirectly, this stabilizes because you made sure your Future tasks time 
out at 5 seconds, and because your S1 API is itself bounded to 30 concurrent 
requests at most.

If you'd prefer not to rely on that bound on S1 requests, or you have a 
hard time predicting S1's timing, you can keep track of the count of queued 
Futures, and on a request to S1 where the count is above your bound, return 
an error instead of a promise, asking the client to wait a bit and retry 
the call later, when you have more resources available.
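
A sketch of that guard, reusing the requests atom from the sketch above (the 
bound and the retry hint are illustrative):

(def max-pending 1500)  ; illustrative bound

(defn handle-s1-bounded
  "Refuse new work when too many Futures are already queued."
  [request]
  (if (< (count @requests) max-pending)
    (handle-s1 request)
    {:error :busy, :retry-in-ms 1000}))  ; ask the client to retry shortly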

I hope this helps.

On Tuesday, 8 May 2018 13:45:00 UTC-7, Brjánn Ljótsson wrote:

> [...]

Re: Asynchronous http poll

2018-05-13 Thread Brjánn Ljótsson
Hi Oliy,

I really appreciate your input since I'm totally new to writing
asynchronous tasks. If I understand you correctly, promises are a better
way to go if only one value is collected from S2 by S1. However, S1 will
keep polling S2 (once every 1.5 secs, by S2's API specification) for
updates on the status of the task that S2 is running. The task can be in
several different states and S1 needs to know the current state of the task
and pass it on to the client. So, actually, multiple and different values
will be sent down the channel. The polling will quit once the task has been
completed or timed out (ca 5 minutes). That's why I went for core.async as
it seems to be suitable for launching a separate "thread" that takes care
of the polling.

Does this make any sense as a rationale for using core.async?

Thanks!

Brjánn

On 13 May 2018 at 20:58, Oliver Hine wrote:

> [...]



Re: Asynchronous http poll

2018-05-13 Thread Oliver Hine
Hi,

Not a direct answer, but something that may help you simplify your problem:

I have a general rule to avoid core.async when only one value would ever be 
sent down the channel. For these use cases, promises are an order of 
magnitude simpler, giving you control of the thread of operation, simple 
testing for completion (future-done?), simple timeouts, and no cleanup 
required afterwards.
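
For example, a plain promise gives you delivery, a completion test, and a 
timeout in a few lines (the names here are illustrative):

(def result-promise (promise))

;; Producer side, e.g. whatever thread sees S2 finish:
(future (deliver result-promise {:status :completed}))

(realized? result-promise)               ; true once delivered
(deref result-promise 5000 ::timed-out)  ; wait up to 5s, else ::timed-out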

From what I understand, a promise would fit your requirements, and I think 
it would be much easier to reason about.

Hope this helps,
Oliy

On Tuesday, 8 May 2018 21:45:00 UTC+1, Brjánn Ljótsson wrote:

> [...]



Asynchronous http poll

2018-05-08 Thread Brjánn Ljótsson
Hi!

I'm writing a server-side (S1) function that initiates an action on another
server (S2) and regularly checks if the action has finished
(failed/completed). It should also be possible for a client to ask S1 for
the status of the action performed by S2.

My idea is to create a uid on S1 that represents the action; the uid is
returned to the client. S1 then asynchronously polls S2 for the action status
and updates an atom with the uid as key and status as value. The client can
then request the status of the uid from S1. Below is a link to my
proof-of-concept code (without any code for the client requests or timeout
guards); it is my first try at writing code using core.async.

https://gist.github.com/brjann/d80f1709b3c17ef10a4fc89ae693927f

The code can be tested with, for example, (start-poll) or (repeatedly 10
start-poll) to test the behavior when multiple requests are made.

The code seems to work, but is it a correct use of core.async? One thing
I'm wondering is whether the init-request and poll functions should use
threads instead of go blocks, since the http requests may take a few hundred
milliseconds and many different requests (with different uids) could be
made simultaneously. I've read that "long-running" tasks should not be put
in go blocks. I haven't figured out how to use threads, though.
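
A minimal sketch of the thread-based variant: core.async's thread macro runs 
its body on a real thread (unlike go), so blocking HTTP calls are safe there. 
poll-s2! below is a hypothetical stand-in for the blocking status request 
to S2.

(require '[clojure.core.async :as async])

;; Hypothetical stand-in for the blocking HTTP status call to S2.
(defn poll-s2! [uid] :completed)

(defn start-poll [statuses uid]
  (async/thread
    (loop []
      (let [status (poll-s2! uid)]        ; blocking call, off the go pool
        (swap! statuses assoc uid status)
        (when-not (#{:completed :failed} status)
          (Thread/sleep 1500)             ; S2's API: poll every 1.5 s
          (recur))))))

Each call gets its own real thread, so many uids can poll concurrently 
without starving the go-block thread pool.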

I would be thankful for any input!

Best wishes,
Brjánn Ljótsson
