Hello!
I think you could model it with an ATOMIC cache that maps each host to the time of its next allowed fetch:
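while (true) {
    long now = System.currentTimeMillis();
    // Earliest time (epoch millis) the host may be fetched again; null if never fetched.
    Long time = cache.get(host);
    // putIfAbsent/replace are atomic, so only one caller claims each slot.
    // The next slot is measured from now, so a long-idle host is not fetched in a burst.
    boolean claimed = time == null
        ? cache.putIfAbsent(host, now + hostDelay)
        : time < now && cache.replace(host, time, now + hostDelay);
    if (claimed) {
        // do request to host
        break;
    }
    else {
        // sleep or do other requests in the meantime
    }
}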
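To try the claim logic without a cluster, here is a minimal sketch that uses a plain ConcurrentHashMap as a stand-in for the Ignite cache. The class name CrawlDelayGate and the method tryClaim are made up for illustration. Note one difference: ConcurrentMap.putIfAbsent returns the previous value (null when the key was absent), whereas IgniteCache.putIfAbsent returns a boolean.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical per-host crawl-delay gate built on atomic putIfAbsent/replace. */
class CrawlDelayGate {
    // Stand-in for the Ignite cache: host -> earliest next fetch time (epoch millis).
    private final ConcurrentMap<String, Long> nextFetch = new ConcurrentHashMap<>();

    /** Returns true if the caller won the right to fetch from host at time now. */
    boolean tryClaim(String host, long now, long hostDelay) {
        Long time = nextFetch.get(host);
        if (time == null) {
            // First request for this host; only one concurrent caller succeeds.
            return nextFetch.putIfAbsent(host, now + hostDelay) == null;
        }
        // Compare-and-set: fails if another caller moved the slot since we read it.
        return time < now && nextFetch.replace(host, time, now + hostDelay);
    }

    public static void main(String[] args) {
        CrawlDelayGate gate = new CrawlDelayGate();
        System.out.println(gate.tryClaim("example.org", 1_000L, 500L)); // true
        System.out.println(gate.tryClaim("example.org", 1_200L, 500L)); // false
        System.out.println(gate.tryClaim("example.org", 1_600L, 500L)); // true
    }
}
```

Passing the clock value in as a parameter is only there to make the behaviour easy to check; a real fetcher would call System.currentTimeMillis() as in the loop above.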
Regards,
--
Ilya Kasnacheev
Tue, Oct 9, 2018 at 16:36, matt <[email protected]>:
> Hi,
>
> I'm working on prototyping a web crawler using Ignite as the crawl-db. I'd
> like to ensure the crawler obeys the appropriate Crawl-delay time as set in
> a site's robots.txt file. The way I have this set up now is by submitting
> "candidates" to an Ignite cache. A local listener is set up to receive
> successfully persisted items, which then submits the items to a queue for a
> fetcher to pull from.
>
> Goal: Support a delay time + maximum fetch concurrency, per-host, per-item.
>
> Put another way: "for each fetch item, ensure that requests made to the
> associated host are delayed as required, and no more than n-requests are
> made during each delayed run".
>
> This could be modeled as a Map<Host,DelayQueue> or maybe even by using a
> ScheduledExecutorService where each task represents a host, and is repeated
> according to the delay time.
>
> I'd like to prevent items from being put into the Java work queue if they
> are not yet ready to be fetched, and I'm slightly worried about the
> potential number of hosts (in reference to the Java Map<Host,...>
> data structure).
>
> So my question is: is there something that Ignite can provide for making
> this all work?
>
> - Matt
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>