On Tue, Sep 20, 2005, Lennon Day-Reynolds wrote:
> Yeah, the feeder is a pretty bad hack -- rather than muck about
> with anything complicated to determine whether the entries in a feed
> were new or old, it was easier to just trash everything and refresh it
> on each fetch.
> 
> I'll probably spend a little time working on a better algorithm, if
> for no other reason than to avoid churning through row ids so quickly.
> In the meantime, it'd probably be safe to increase the loop delay to a
> much longer value, like 30-60 minutes.

IIRC, in any compliant feed each entry should have a GUID.  You can MD5
the entry text, compare it to what's stored in the database for that
GUID, and only insert or update the row when the checksum differs.
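
Something like this is all the per-entry check would take -- just a
rough sketch, with a plain Hash standing in for whatever table the
feeder actually writes to (the field names here are made up):

  require 'digest/md5'

  # Returns only the entries that are new or changed since the last
  # fetch.  'stored' maps guid -> MD5 of the entry text we saw last
  # time; in the real feeder that lookup/update would hit the database
  # row for the guid instead.
  def changed_entries(entries, stored)
    entries.select do |entry|
      checksum = Digest::MD5.hexdigest(entry[:content])
      if stored[entry[:guid]] == checksum
        false                          # unchanged -- leave the row alone
      else
        stored[entry[:guid]] = checksum
        true                           # new or updated -- worth writing
      end
    end
  end

  stored = {}
  feed = [{ :guid => 'tag:example.org,2005:1', :content => 'hello' }]
  p changed_entries(feed, stored).size   #=> 1 on the first fetch
  p changed_entries(feed, stored).size   #=> 0 on the second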

You can also MD5 the entire feed file to see if anything's changed since
the last fetch.
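
The whole-file check is even simpler -- again just a sketch, with an
in-memory hash in place of wherever the feeder keeps its state:

  require 'digest/md5'

  # Cheap early-out: skip parsing entirely if the fetched body is
  # byte-for-byte identical to what we got last time for this URL.
  # 'last_seen' maps feed URL -> MD5 of the previous response body.
  def feed_changed?(url, body, last_seen)
    digest = Digest::MD5.hexdigest(body)
    return false if last_seen[url] == digest
    last_seen[url] = digest
    true
  end

  last_seen = {}
  p feed_changed?('http://example.org/feed.xml', '<rss/>', last_seen)  #=> true
  p feed_changed?('http://example.org/feed.xml', '<rss/>', last_seen)  #=> false

One caveat: some feeds regenerate things like lastBuildDate on every
request, so the whole-file hash can say "changed" when no entry
actually did.  It never misses a real change, though, so it works fine
as a cheap filter in front of the per-entry check rather than a
replacement for it.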

You might also want to take a look at how LiveJournal's aggregator
works.  I believe it uses a combination of those two (with some other
tactics), and I seem to remember it being pretty cool.

Ben
_______________________________________________
PdxRuby-dev mailing list
[email protected]
http://lists.pdxruby.org/mailman/listinfo/pdxruby-dev