Hey folks,

I am currently evaluating Trident as a replacement for our website analysis 
tool.
It consists of several components that handle crawling, analysis, aggregation, 
and reporting. They talk to each other via message queues.

I think that most of our current infrastructure code can be replaced by Storm's 
Trident, but there is one point where I am unsure whether this is possible:
When we crawl a website, we don't know in advance how many pages there are to 
crawl. Once our Crawler no longer detects any new pages, it fires an aggregation 
event, and we check, for example, whether all subpages have Google Analytics 
installed. We include several more metrics and send a report.
A simple flowchart: 1 Crawler produces X pages, Analyzer consumes 1 page and 
produces 1 result, Aggregator consumes X results and produces 1 report, 
Reporting consumes 1 report and produces 1 enriched report in Y formats.
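
To make the cardinalities concrete, here is a rough sketch of that flow in plain Java. It is not our real code and has nothing to do with the Storm API: the class name PipelineSketch and the method run are made up for this mail, the message queues are left out, and the stages are just functions called one after another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only: 1 site -> X pages -> X results -> 1 report -> Y formats.
final class PipelineSketch {
    static <P, R, A> List<String> run(
            String site,
            Function<String, List<P>> crawler,      // 1 site in, X pages out
            Function<P, R> analyzer,                // 1 page in, 1 result out
            Function<List<R>, A> aggregator,        // X results in, 1 report out
            Function<A, List<String>> reporting) {  // 1 report in, Y formats out
        List<R> results = new ArrayList<>();
        for (P page : crawler.apply(site)) {
            results.add(analyzer.apply(page));
        }
        // The aggregator needs the complete list of results, which is exactly
        // the step that is hard when X is not known in advance.
        A report = aggregator.apply(results);
        return reporting.apply(report);
    }
}
```

In this simplified form the aggregator only runs after the crawler has returned everything. In our real system the pages arrive one by one, so the aggregator has to decide on its own when the set is complete.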

The critical part is the migration of our aggregation system because, as far 
as I understand, aggregation in Trident is only possible in real time and not 
batch-wise. What I would like to know is whether there is a way to say: "Do the 
aggregation once there has not been any new data for 5 minutes or so".
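
To show what I mean by that, here is a minimal sketch of the behaviour in plain Java. It deliberately does not use any Storm or Trident classes. IdleAggregator, onResult and onTick are names I made up, and I am assuming that something periodic (a timer or similar) calls onTick with the current time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: buffers results per site and releases them
// once no new result has arrived for idleTimeoutMillis.
final class IdleAggregator {
    private static final class Pending {
        final List<String> results = new ArrayList<>();
        long lastSeenMillis;
    }

    private final long idleTimeoutMillis;
    private final Map<String, Pending> pendingBySite = new HashMap<>();

    IdleAggregator(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    // Called for every analyzer result; resets the idle timer of that site.
    void onResult(String site, String result, long nowMillis) {
        Pending pending = pendingBySite.computeIfAbsent(site, key -> new Pending());
        pending.results.add(result);
        pending.lastSeenMillis = nowMillis;
    }

    // Called periodically; returns and forgets all sites that have been idle long enough.
    Map<String, List<String>> onTick(long nowMillis) {
        Map<String, List<String>> ready = new HashMap<>();
        Iterator<Map.Entry<String, Pending>> it = pendingBySite.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Pending> entry = it.next();
            if (nowMillis - entry.getValue().lastSeenMillis >= idleTimeoutMillis) {
                ready.put(entry.getKey(), entry.getValue().results);
                it.remove();
            }
        }
        return ready;
    }
}
```

With a timeout of 300000 ms, a site whose last result arrived at t = 100000 would be handed to the aggregation on the first tick at or after t = 400000. The state lives in memory only, so this sketch says nothing about fault tolerance or replays, which is part of what I am unsure about.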

Is this somehow achievable? Or do you see any other approach I could use? Or is 
this the wrong use case for Trident?

Best regards,

Daniel

