Melvin Carvalho wrote:
On Tue, Apr 28, 2009 at 3:39 PM, Yves Raimond <[email protected]> wrote:
Hello!
I know this issue has been raised during the LOD BOF at WWW 2009, but
I don't know if any possible solutions emerged from there.
The problem we are facing is that data on BBC Programmes changes
approximately 50 000 times a day (new/updated
broadcasts/versions/programmes/segments etc.). As we'd like to keep a
set of RDF crawlers up-to-date with our information we were wondering
how best to ping these. pingthesemanticweb seems like a nice option,
but it needs the crawlers to ping it often enough to make sure they
didn't miss a change. Another solution we were thinking of would be to
stick either Talis changesets [1] or SPARQL/Update statements in a
message queue, which would then be consumed by the crawlers.
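The message-queue idea above can be sketched in a few lines. This is a minimal in-process mock-up, not the BBC Programmes implementation: the producer serialises each change as a SPARQL/Update statement and pushes it onto a queue, and a crawler drains the queue and applies the statements to its own store. A real deployment would use a broker (AMQP, etc.) rather than `queue.Queue`, and the graph URI and triple below are purely illustrative.

```python
# Sketch: publish each data change as a SPARQL/Update statement on a
# queue; crawlers consume the statements in order. All URIs and names
# here are illustrative placeholders, not a real BBC Programmes API.
import queue

change_queue = queue.Queue()

def publish_change(graph_uri, triple):
    """Serialise a single added triple as a SPARQL/Update INSERT DATA."""
    s, p, o = triple
    update = f"INSERT DATA {{ GRAPH <{graph_uri}> {{ <{s}> <{p}> {o} }} }}"
    change_queue.put(update)

def consume_changes():
    """Drain pending statements; a crawler would run each against its store."""
    statements = []
    while not change_queue.empty():
        statements.append(change_queue.get())
    return statements

# Example: a new broadcast version appears in the programmes graph.
publish_change(
    "http://www.bbc.co.uk/programmes",
    ("http://www.bbc.co.uk/programmes/b00abc12#programme",
     "http://purl.org/ontology/po/version",
     '"2"'),
)
updates = consume_changes()
```

The same shape works with Talis changesets instead of SPARQL/Update: only the payload format on the queue changes, not the producer/consumer protocol.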
That's a lot of data, I wonder if there is a smart way of filtering it down.
Perhaps an RDF version of "twitter" would be interesting, where you
"follow" the changes you're interested in? You could even follow by
user, or by SPARQL query, and maybe across multiple domains.
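The "follow" idea could look something like this. A simple subject-prefix match stands in for a full SPARQL query here, and all subscriber names and URIs are hypothetical; the point is only that the publisher routes each change to the crawlers whose registered filter matches it, instead of broadcasting all 50 000 daily changes to everyone.

```python
# Sketch: crawlers "follow" a slice of the data by registering a filter;
# notify() returns only the followers whose filter matches a change.
# A subject-prefix test stands in for a real SPARQL ASK query.
subscribers = {}

def follow(name, subject_prefix):
    """Register a crawler's interest as a subject-URI prefix."""
    subscribers[name] = subject_prefix

def notify(triple):
    """Return the followers that should receive this change."""
    s, p, o = triple
    return [name for name, prefix in subscribers.items()
            if s.startswith(prefix)]

follow("crawler-a", "http://www.bbc.co.uk/programmes/")
follow("crawler-b", "http://dbpedia.org/resource/")
hit = notify(("http://www.bbc.co.uk/programmes/b00abc12",
              "http://purl.org/dc/terms/title", '"Today"'))
```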
How about: http://dev.live.com/feedsync/intro.aspx
Nothing stops RDF info. resources being shuttled about using RSS/Atom :-)
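As a rough sketch of shuttling change notifications over Atom: each changed resource becomes a feed entry whose link points at the RDF document to re-fetch. The element names follow the Atom format; the feed title, URIs, and timestamp below are illustrative.

```python
# Sketch: expose recent changes as an Atom feed so ordinary feed
# consumers can poll for what to re-crawl. Feed contents are made up.
from xml.etree import ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

def change_feed(changes):
    """Build an Atom feed with one entry per (resource URI, timestamp)."""
    ET.register_namespace("", ATOM)
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = "Programme changes"
    for uri, updated in changes:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = uri
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
        ET.SubElement(entry, f"{{{ATOM}}}link",
                      {"rel": "alternate", "type": "application/rdf+xml",
                       "href": uri + ".rdf"})
    return ET.tostring(feed, encoding="unicode")

xml = change_feed([("http://www.bbc.co.uk/programmes/b00abc12",
                    "2009-04-28T15:39:00Z")])
```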
Kingsley
Has anyone tried to tackle this problem already?
Cheers!
y
[1] http://n2.talis.com/wiki/Changeset
--
Regards,
Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com