On Tue, Sep 20, 2005, Lennon Day-Reynolds wrote:
> Yeah, the feeder is a pretty bad hack -- rather than muck about
> with anything complicated to determine whether the entries in a feed
> were new or old, it was easier to just trash everything and refresh it
> on each fetch.
>
> I'll probably spend a little time working on a better algorithm, if
> for no other reason than to avoid churning through row ids so quickly.
> In the meantime, it'd probably be safe to increase the loop delay to a
> much longer value, like 30-60 minutes.
IIRC, in any compliant feed each entry should have a GUID. You can MD5 the entry text, compare that digest to what's in the database for that GUID, and do the appropriate thing. You can also MD5 the entire feed file to see whether anything has changed at all since the last fetch.

You might also want to take a look at how LiveJournal's aggregator works. I believe it uses a combination of those two approaches (along with some other tactics), and I remember it being pretty cool.

Ben

_______________________________________________
PdxRuby-dev mailing list
[email protected]
http://lists.pdxruby.org/mailman/listinfo/pdxruby-dev
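[Editor's note: the GUID/MD5 comparison described above could be sketched roughly as follows. This is not code from the feeder itself; the entry shape (a hash with :guid and :content keys) and the in-memory store standing in for the database are assumptions for illustration.]

```ruby
require 'digest/md5'

# Sketch of per-entry change detection keyed on GUID.
# `store` maps GUID => last-seen MD5 digest; in the real feeder
# this would be a database table rather than a Hash (assumption).
def classify_entry(store, entry)
  digest = Digest::MD5.hexdigest(entry[:content])
  old = store[entry[:guid]]
  store[entry[:guid]] = digest
  if old.nil?
    :new            # never seen this GUID before -- insert a row
  elsif old == digest
    :unchanged      # same content as last fetch -- leave the row alone
  else
    :updated        # content changed -- update in place, keep the row id
  end
end

# The whole-feed shortcut works the same way one level up: hash the
# raw feed body and skip entry processing entirely when it matches.
def feed_changed?(last_digest, feed_body)
  Digest::MD5.hexdigest(feed_body) != last_digest
end
```

Because unchanged entries are left alone instead of being trashed and re-inserted, this avoids burning through row ids on every fetch.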
