Dear Matthias,

In this case, do we refetch everything monthly? Why isn't it enough to refetch only the pages that changed (check the last-modified date and whether the page now returns a 404 error)? As things stand, fetching needs a lot of bandwidth.
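To make the question concrete, here is a minimal sketch of the per-page decision I have in mind. It is not Nutch code: the function name `needs_refetch` and its arguments are hypothetical. It assumes we cheaply learn the HTTP status first, either from a HEAD request (which returns the Last-Modified header) or from a conditional GET with If-Modified-Since (where an unchanged page answers 304 Not Modified), and that the previous fetch stored the page's last-modified time.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def needs_refetch(status, last_modified_header, stored_last_modified):
    """Hypothetical helper: decide what to do with a page before downloading it.

    status: HTTP status from a cheap check (HEAD or conditional GET).
    last_modified_header: the Last-Modified header value, or None.
    stored_last_modified: timezone-aware datetime saved at the last fetch, or None.
    """
    if status == 404:
        return "drop"   # page is gone: remove it instead of refetching
    if status == 304:
        return "skip"   # server confirmed the page is unchanged
    if last_modified_header is None or stored_last_modified is None:
        return "fetch"  # not enough information, so download to be safe
    current = parsedate_to_datetime(last_modified_header)
    return "fetch" if current > stored_last_modified else "skip"


last_seen = datetime(2005, 4, 1, tzinfo=timezone.utc)
print(needs_refetch(200, "Fri, 01 Apr 2005 00:00:00 GMT", last_seen))  # skip
```

Only pages answering "fetch" would cost a full download; many servers do not send a reliable Last-Modified header, which is why the sketch falls back to fetching.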

If I fetch topN 500,000 pages daily, then 500,000 * 30 = 15 million. Does that mean my db can hold only 15 million pages?
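The arithmetic behind that question, as a small sketch: if every page must be refetched once per 30-day interval, the daily fetch budget times the interval bounds how many pages can be kept fresh. This assumes the whole daily topN is spent on refetches, with no slots left for new pages; `max_db_pages` is a hypothetical name.

```python
def max_db_pages(top_n_per_day, refetch_interval_days):
    """Upper bound on pages that can be kept fresh.

    Assumes each page is refetched exactly once per interval and the whole
    daily topN budget goes to refetches (no newly discovered pages).
    """
    return top_n_per_day * refetch_interval_days


print(max_db_pages(500_000, 30))  # 15000000
```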

Does dedup only remove duplicates from the segment index, or from the segments too?

Matthias Jaekle wrote:

I think a better approach is Matthias's idea: dedup segments. In segments older than 30 days you can find unchanged pages that do not exist in the new segments.

No, pages are refetched after 30 days, so they will be in the new segments. You could remove segments once they are older than 30 + x days.
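A minimal sketch of that retention rule, under stated assumptions: segment directory names are taken to be creation timestamps in the form yyyyMMddHHmmss, the refetch interval is 30 days, and the extra margin x is a hypothetical `grace_days` parameter (5 here only for illustration). `segments_to_remove` is not a Nutch command; it only lists candidates, and deleting them is left to you.

```python
from datetime import datetime, timedelta

# Assumption: segment directories are named by creation time, e.g. 20050301120000.
SEGMENT_NAME_FORMAT = "%Y%m%d%H%M%S"


def segments_to_remove(segment_names, now, fetch_interval_days=30, grace_days=5):
    """Return segments older than fetch_interval_days + grace_days.

    By then every page they hold should have been refetched into a newer
    segment, so the old copy is redundant.
    """
    cutoff = now - timedelta(days=fetch_interval_days + grace_days)
    old = []
    for name in segment_names:
        created = datetime.strptime(name, SEGMENT_NAME_FORMAT)
        if created < cutoff:
            old.append(name)
    return old


segs = ["20050301120000", "20050320080000", "20050410093000"]
print(segments_to_remove(segs, now=datetime(2005, 4, 15)))  # ['20050301120000']
```

The grace period covers pages whose refetch was delayed, for example because the daily topN budget ran out before they came due.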


Matthias


