Hi Daniel,

Trident supports batch aggregations. If the spout emits X pages for a batch, 
the aggregation can run over exactly that batch. 
In your example, the spout would keep emitting all X pages of a website as 
tuples of a single batch. Once there are no more pages to emit, the spout 
signals the completion of the batch. 
You can then aggregate over that batch using a State and persist the values 
in any storage system. After that, the report can be generated.
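
To make the per-batch flow concrete, here is a minimal sketch in plain Java. 
It deliberately does not use the Trident API: the names 
(BatchAggregationSketch, analyze, aggregate) and the substring check for 
"google-analytics.com" are assumptions made up for this illustration. In a 
real topology, the list would correspond to the tuples of one completed 
batch, and the aggregation step would sit in a Trident aggregator backed by 
a State.

```java
import java.util.Arrays;
import java.util.List;

public class BatchAggregationSketch {

    // Result of analyzing one page (Analyzer: 1 page in, 1 result out).
    public static final class PageResult {
        public final String url;
        public final boolean hasAnalytics;

        public PageResult(String url, boolean hasAnalytics) {
            this.url = url;
            this.hasAnalytics = hasAnalytics;
        }
    }

    // The single report produced for one completed batch.
    public static final class Report {
        public final int pages;
        public final int pagesWithAnalytics;

        public Report(int pages, int pagesWithAnalytics) {
            this.pages = pages;
            this.pagesWithAnalytics = pagesWithAnalytics;
        }

        public boolean allPagesHaveAnalytics() {
            return pages == pagesWithAnalytics;
        }
    }

    // Hypothetical analyzer: a naive substring check stands in for the real metric.
    public static PageResult analyze(String url, String html) {
        return new PageResult(url, html.contains("google-analytics.com"));
    }

    // Aggregator: called once, after the batch (all X pages of a site) is complete.
    public static Report aggregate(List<PageResult> batch) {
        int withAnalytics = 0;
        for (PageResult result : batch) {
            if (result.hasAnalytics) {
                withAnalytics++;
            }
        }
        return new Report(batch.size(), withAnalytics);
    }

    public static void main(String[] args) {
        // One batch = all pages the crawler found for one website.
        List<PageResult> batch = Arrays.asList(
            analyze("/", "<script src=\"https://www.google-analytics.com/analytics.js\"></script>"),
            analyze("/about", "<p>no tracking here</p>"));

        Report report = aggregate(batch);
        System.out.println(report.pages + " pages, "
            + report.pagesWithAnalytics + " with Google Analytics");
    }
}
```

The point of the sketch is that the aggregate step never needs to know X in 
advance; it only runs once the batch has been closed.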
-Nikhil 


On Wednesday, June 3, 2015 6:04 AM, Daniel Sachse <[email protected]> wrote:

Hey folks,

I am currently evaluating Trident as a replacement for our website analysis 
tool. We currently have several components that do crawling, analyzing, 
aggregation and reporting. They talk to each other via message queues.

I think that most of our current infrastructure code can be replaced by 
Storm's Trident, but at one point I am unsure whether this is possible: when 
we crawl a website, we don't know in advance how many pages are to be crawled. 
Once our Crawler does not detect any new pages, it fires an aggregation event 
and we check, for example, whether all subpages have Google Analytics 
installed. We include several more metrics and send a report.

A simple flowchart: 1 Crawler produces X pages, the Analyzer consumes 1 page 
and produces 1 result, the Aggregator consumes X results and produces 1 
report, and Reporting consumes 1 report and produces 1 enriched report in Y 
formats.

The critical part is the migration of our aggregation system because, as far 
as I understand, aggregation is only possible in real time and not 
batch-wise. What I would like to know is whether there is a way to say: "Do 
the aggregation once there has not been any new data for 5 minutes or so".

Is this somehow achievable? Or do you see any other methods I could use? Or 
is this the wrong use case for Trident?

Best regards,
Daniel


  
