Hello,

I've posted this as bug #548021 on sourceforge, but since I'm not very
familiar with the project conventions, I'll post it on the list as well.
Please Cc me in the replies, as I'm not subscribed to the list
currently. Thanks.

I've scheduled sitescooper in my crontab to scoop up a number of sites
into Plucker files, and my script to do so runs sitescooper as follows:

sitescooper -mplucker -outputtemplate -prctitle Site

This works fine for sites that update once a day or less often, and
that I have time to check in Plucker every day. Scoops that contain new
articles are updated and show up as such in Plucker (when I sort the
pdb's by date).

However, I don't necessarily have the time to read everything every day,
and on the other hand, I'd like to run sitescooper more often for some
news sites so that no matter what time I synchronize my Palm, the news
scoops would be more up to date than they now are (worst case, they're
still from yesterday if the sitescooper schedule hasn't yet run).

However, if I schedule sitescooper more often, the Plucker pdb will not
include articles that were picked up by an earlier scoop, since they are
already in the cache. I could bypass this by using
the -refresh option, but then some sites would create gigantic pdb's
(for example, The Register has a week's worth of links on the page I
scoop, and I'd like to get two days at most into the scoop).

I could use -refresh -maxstories n, but I don't know how many stories a
site produces beforehand, and I'd have to run sitescooper separately for
each site to set it on a site-by-site basis. I could set SizeLimit in
the site file, but that's not much better.

What I'd like to suggest is a -maxage parameter, which, perhaps when
combined with -refresh, would pull in pages from the cache into the
scoop as long as they're not older than specified by the parameter. This
would be perfect - I could run sitescooper every hour, limit the scoop
to stories less than a day or two old, and get fresh stuff onto the Palm
any time I sync.
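
To make the idea concrete, here is a minimal sketch of the age filter in
Python. It is not sitescooper code (sitescooper is written in Perl), and
the function name select_recent, the (url, fetched_at) cache layout and
the example URLs are all assumptions made up for illustration; the real
cache format may look quite different.

```python
from datetime import datetime, timedelta


def select_recent(cache_entries, max_age, now=None):
    """Return the cached entries that were fetched within max_age of now.

    cache_entries: iterable of (url, fetched_at) pairs. This layout is
                   hypothetical, not sitescooper's actual cache format.
    max_age:       a timedelta, e.g. timedelta(days=2).
    now:           reference time; defaults to the current local time.
    """
    if now is None:
        now = datetime.now()
    cutoff = now - max_age
    # Keep an entry only if it was fetched at or after the cutoff.
    return [(url, fetched_at) for url, fetched_at in cache_entries
            if fetched_at >= cutoff]


# Example: an hourly run with a two-day limit (arbitrary reference time).
now = datetime(2002, 1, 10, 12, 0)
cache = [
    ("http://example.com/story-a", now - timedelta(hours=3)),
    ("http://example.com/story-b", now - timedelta(days=1, hours=12)),
    ("http://example.com/story-c", now - timedelta(days=5)),
]
recent = select_recent(cache, timedelta(days=2), now=now)
recent_urls = [url for url, _ in recent]
```

With a limit of two days, story-a and story-b would go into the scoop
and story-c would be left out, no matter how often the schedule runs.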

Thanks,

 Osma

-- 
Osma Ahvenlampi <[EMAIL PROTECTED]>


_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/sitescooper-talk
