Re: [go-nuts] Existing production-ready libraries for parallelizing HTTP requests?

2019-08-21 Thread roger peppe
For just the exponential backoff part, you might want to take a look at
https://godoc.org/gopkg.in/retry.v1, which provides easily pluggable retry
strategies, including exponential backoff with jitter, and lets you write
your code as a normal for loop - no awkward callbacks.

  cheers,
rog.

On Mon, 19 Aug 2019, 17:20 tom via golang-nuts, <
golang-nuts@googlegroups.com> wrote:

> tl;dr Do you know of any libraries for parallelizing HTTP requests with
> per-server concurrency control and handling of retries?
>
> I'm writing a service that fetches many independent small binary blobs
> (map tiles) over HTTP from several upstream servers and packages them
> together into a single archive. I want to parallelize the fetching of the
> small binary blobs. Currently there are O(10) upstream servers and O(1000)
> small binary blobs fetched from each.
>
> Making parallel HTTP requests in Go is trivially easy and is demonstrated
> in many Go tutorials and blog posts. However, I'm looking for a "production
> ready" library that supports:
> * Per upstream server concurrency limits.
> * Overall (across all upstream servers) concurrency limits.
> * Controllable retries with exponential backoff in the case of upstream
> server errors.
> * Timeouts for upstream requests.
> * context.Context support.
>
> This would seem to be a common enough task that I would expect to find an
> existing library that does all of the above. Existing Go web scrapers, e.g.
> colly, likely have this functionality internally
> but do not expose it in their API and are instead focused on crawling web
> pages.
>
> Do you know of any such library?
>
> Many thanks,
> Tom
>
> Confidentiality Notice:
> This electronic message and any attached documents contain confidential
> and privileged information and is for the sole use of the individual or
> entity to whom it is addressed. If you are not the addressee of this email,
> or the employee or agent responsible for delivering it to the addressee,
> you are hereby notified that any dissemination, distribution or copying of
> this transmission is strictly prohibited. If you receive this message in
> error, please notify the sender immediately by return e-mail or telephone
> and destroy the attached message (and all attached documents) immediately.
> Thank you for your cooperation.
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAJhgacj4OGP-xOYUimUZtQawJtQ2BDcKEvQOOaU2VDuAngCh3w%40mail.gmail.com.


Re: [go-nuts] Existing production-ready libraries for parallelizing HTTP requests?

2019-08-20 Thread Sam Fourman Jr.
I'm in the same boat as Tom; there is certainly demand for this type of
library.

-- Sam Fourman

On Tue, Aug 20, 2019 at 3:33 PM 'Thomas Bushnell, BSG' via golang-nuts <
golang-nuts@googlegroups.com> wrote:

> I am of the opinion that a case like this is best handled by simply
> writing the thing you want.
>
> Concurrency limits are easily managed by using tokens to gate fetches. One
> simple technique is to make a channel of struct{} with capacity equal to
> the maximum number of concurrent connections you are allowed. You can
> either fill it with tokens at startup, then receive from the channel
> before each request and send the token back when done, or start with it
> empty, send before each request, and receive after. The two are equivalent.
>
> I'm not sure of the point of overall concurrency limits in general, but
> the same technique works. It's unlikely to be a problem, IMO, for the size
> of job you describe.
>
> Retries are best done inside each fetch; wrap http.Get with the logic you
> want. There is no one-size-fits-all here. There is a public backoff library
> available, but it's a bit complex, and the code could easily be simpler if
> you address exactly what you want directly.
>
> For contexts, just use the http package's (*Request).WithContext method.
> That accomplishes timeouts too.
>
> On Mon, Aug 19, 2019 at 12:20 PM tom via golang-nuts <
> golang-nuts@googlegroups.com> wrote:
>
>> tl;dr Do you know of any libraries for parallelizing HTTP requests with
>> per-server concurrency control and handling of retries?
>>
>> I'm writing a service that fetches many independent small binary blobs
>> (map tiles) over HTTP from several upstream servers and packages them
>> together into a single archive. I want to parallelize the fetching of the
>> small binary blobs. Currently there are O(10) upstream servers and O(1000)
>> small binary blobs fetched from each.
>>
>> Making parallel HTTP requests in Go is trivially easy and is demonstrated
>> in many Go tutorials and blog posts. However, I'm looking for a "production
>> ready" library that supports:
>> * Per upstream server concurrency limits.
>> * Overall (across all upstream servers) concurrency limits.
>> * Controllable retries with exponential backoff in the case of upstream
>> server errors.
>> * Timeouts for upstream requests.
>> * context.Context support.
>>
>> This would seem to be a common enough task that I would expect to find an
>> existing library that does all of the above. Existing Go web scrapers, e.g.
>> colly, likely have this functionality internally
>> but do not expose it in their API and are instead focused on crawling web
>> pages.
>>
>> Do you know of any such library?
>>
>> Many thanks,
>> Tom
>>


-- 

Sam Fourman Jr.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAOFF%2BZ0hpyrJ8aHV0QWVC4j8BiYQKZ2RFwbkBhrBobMLESYgxg%40mail.gmail.com.


Re: [go-nuts] Existing production-ready libraries for parallelizing HTTP requests?

2019-08-20 Thread 'Thomas Bushnell, BSG' via golang-nuts
I am of the opinion that a case like this is best handled by simply writing
the thing you want.

Concurrency limits are easily managed by using tokens to gate fetches. One
simple technique is to make a channel of struct{} with capacity equal to
the maximum number of concurrent connections you are allowed. You can
either fill it with tokens at startup, then receive from the channel before
each request and send the token back when done, or start with it empty,
send before each request, and receive after. The two are equivalent.
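A minimal sketch of that token-channel approach (the "start empty, send to acquire" variant; the job count, limit, and sleep are made-up stand-ins for real fetches):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// fetchAll starts one goroutine per job but uses a buffered channel of
// struct{} as the token bucket, so at most limit jobs run at once.
// It returns the peak concurrency observed, purely for demonstration.
func fetchAll(jobs, limit int) int64 {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	var inFlight, peak int64
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a token (blocks at the limit)
			defer func() { <-sem }() // release the token when done

			n := atomic.AddInt64(&inFlight, 1)
			for { // record the highest concurrency seen
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			time.Sleep(10 * time.Millisecond) // stands in for the HTTP request
			atomic.AddInt64(&inFlight, -1)
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak concurrency:", fetchAll(50, 4)) // never exceeds 4
}
```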

I'm not sure of the point of overall concurrency limits in general, but
the same technique works. It's unlikely to be a problem, IMO, for the size
of job you describe.

Retries are best done inside each fetch; wrap http.Get with the logic you
want. There is no one-size-fits-all here. There is a public backoff library
available, but it's a bit complex, and the code could easily be simpler if
you address exactly what you want directly.

For contexts, just use the http package's (*Request).WithContext method.
That accomplishes timeouts too.

On Mon, Aug 19, 2019 at 12:20 PM tom via golang-nuts <
golang-nuts@googlegroups.com> wrote:

> tl;dr Do you know of any libraries for parallelizing HTTP requests with
> per-server concurrency control and handling of retries?
>
> I'm writing a service that fetches many independent small binary blobs
> (map tiles) over HTTP from several upstream servers and packages them
> together into a single archive. I want to parallelize the fetching of the
> small binary blobs. Currently there are O(10) upstream servers and O(1000)
> small binary blobs fetched from each.
>
> Making parallel HTTP requests in Go is trivially easy and is demonstrated
> in many Go tutorials and blog posts. However, I'm looking for a "production
> ready" library that supports:
> * Per upstream server concurrency limits.
> * Overall (across all upstream servers) concurrency limits.
> * Controllable retries with exponential backoff in the case of upstream
> server errors.
> * Timeouts for upstream requests.
> * context.Context support.
>
> This would seem to be a common enough task that I would expect to find an
> existing library that does all of the above. Existing Go web scrapers, e.g.
> colly, likely have this functionality internally
> but do not expose it in their API and are instead focused on crawling web
> pages.
>
> Do you know of any such library?
>
> Many thanks,
> Tom
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CA%2BYjuxvjvAQU-R7PAeocEJ%3Dp9-k0MSqRL%2BtcL4XhXaXjV%3DUepw%40mail.gmail.com.


[go-nuts] Existing production-ready libraries for parallelizing HTTP requests?

2019-08-19 Thread tom via golang-nuts
tl;dr Do you know of any libraries for parallelizing HTTP requests with
per-server concurrency control and handling of retries?

I'm writing a service that fetches many independent small binary blobs (map
tiles) over HTTP from several upstream servers and packages them together
into a single archive. I want to parallelize the fetching of the small
binary blobs. Currently there are O(10) upstream servers and O(1000) small
binary blobs fetched from each.

Making parallel HTTP requests in Go is trivially easy and is demonstrated 
in many Go tutorials and blog posts. However, I'm looking for a "production 
ready" library that supports:
* Per upstream server concurrency limits.
* Overall (across all upstream servers) concurrency limits.
* Controllable retries with exponential backoff in the case of upstream 
server errors.
* Timeouts for upstream requests.
* context.Context support.
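For concreteness, the first two bullets can be sketched with plain channels as nested token buckets: a global bucket for the overall limit plus one bucket per upstream host. The hosts, limits, and counts below are illustrative, and the fetch itself is a placeholder:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// fetchTiles bounds overall concurrency with a global token bucket and
// per-server concurrency with one bucket per upstream host. It returns
// the number of tiles "fetched".
func fetchTiles(hosts []string, tilesPerHost, overallLimit, perServerLimit int) int64 {
	global := make(chan struct{}, overallLimit)
	perHost := make(map[string]chan struct{}, len(hosts))
	for _, h := range hosts {
		perHost[h] = make(chan struct{}, perServerLimit)
	}

	var fetched int64
	var wg sync.WaitGroup
	for _, host := range hosts {
		for tile := 0; tile < tilesPerHost; tile++ {
			wg.Add(1)
			go func(host string, tile int) {
				defer wg.Done()
				global <- struct{}{}        // acquire overall token
				perHost[host] <- struct{}{} // acquire per-server token
				defer func() { <-perHost[host]; <-global }()
				// placeholder for fetching one tile over HTTP
				atomic.AddInt64(&fetched, 1)
			}(host, tile)
		}
	}
	wg.Wait()
	return fetched
}

func main() {
	n := fetchTiles([]string{"tiles-a.example", "tiles-b.example"}, 10, 16, 4)
	fmt.Println("fetched", n, "tiles") // fetched 20 tiles
}
```

Releases never block, so acquiring the global token before the per-server one cannot deadlock; retries, timeouts, and context plumbing would still need to be layered on top.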

This would seem to be a common enough task that I would expect to find an 
existing library that does all of the above. Existing Go web scrapers, e.g. 
colly, likely have this functionality internally but
do not expose it in their API and are instead focused on crawling web pages.

Do you know of any such library?

Many thanks,
Tom


-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/cfb138b2-88d4-46ee-9315-996389718bad%40googlegroups.com.