Possibly relevant:
http://www.ietf.org/rfc/rfc5005.txt

Feed paging and archiving for Atom feeds. Paging is a nice solution to the "small window" problem with syndication feeds. The concept might be translatable to RSS 1.0.
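
As a rough illustration of what paging buys a crawler, here is a minimal sketch that walks a paged Atom feed by following the rel="next" links RFC 5005 defines. The example.org URLs, the entry ids and the function names are made up for the example, and the two pages are held in memory rather than fetched over HTTP.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def link_with_rel(feed_root, rel):
    """Return the href of the first feed-level <link> with this rel, or None."""
    for link in feed_root.findall(ATOM + "link"):
        if link.get("rel") == rel:
            return link.get("href")
    return None


def collect_entry_ids(start_url, fetch, rel="next", max_pages=50):
    """Follow rel links from start_url, gathering entry ids page by page."""
    ids, url, seen = [], start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        root = ET.fromstring(fetch(url))
        ids.extend(e.findtext(ATOM + "id") for e in root.findall(ATOM + "entry"))
        url = link_with_rel(root, rel)
    return ids


# Two hypothetical pages standing in for real HTTP responses.
PAGES = {
    "http://example.org/feed": """<feed xmlns="http://www.w3.org/2005/Atom">
      <link rel="next" href="http://example.org/feed?page=2"/>
      <entry><id>urn:example:3</id></entry>
      <entry><id>urn:example:2</id></entry>
    </feed>""",
    "http://example.org/feed?page=2": """<feed xmlns="http://www.w3.org/2005/Atom">
      <entry><id>urn:example:1</id></entry>
    </feed>""",
}

all_ids = collect_entry_ids("http://example.org/feed", PAGES.__getitem__)
```

A crawler that polls late can keep following the links until it reaches an entry it has already seen, instead of losing whatever fell out of the first page.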

Although I have to say that I find the idea of pushing RDF updates via Atom quite appealing.

Richard


On 28 Apr 2009, at 17:01, Yves Raimond wrote:

Hello!

I think the two main options are either to publish a feed containing
pointers to changes, or to use a messaging system to push out notifications.

Despite the recent discussion around benefits of, say, Jabber or other
mechanisms for pushing out notifications, I think that a more RESTful
approach using RSS or Atom feeds might be nicer. Then we can focus on the
resource design, i.e. what kinds of changes do we need to publish.

So for example, for /programmes it may be sufficient to publish a set of feeds for new items, e.g. brands, episodes, versions, etc. These could be RSS 1.0
and then include additional RDF data as appropriate.

My only concern about this is that you need to limit the number of
items in the feed. If you have a sudden burst of activity and the
crawler just pings the feed at regular intervals, it may miss some
updates. However, even with 1M updates in a day, a feed capped at
100 items would just need the crawlers to ping the feed about every
hour and a half. So that's not too bad.
(Just noticed that Soren's proposal includes pagination of feeds,
which might solve that problem).
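
A quick way to sanity-check polling estimates like the one above is to work out how long a capped feed takes to turn over completely. This sketch assumes updates are spread evenly through the day (real traffic will be burstier, which only shortens the safe interval), and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_poll_interval_seconds(updates_per_day, feed_cap):
    """Longest gap between polls before a capped feed rolls over completely."""
    if updates_per_day <= 0 or feed_cap <= 0:
        raise ValueError("both arguments must be positive")
    return SECONDS_PER_DAY * feed_cap / updates_per_day


# The two daily rates mentioned in this thread, with a 100-item cap.
for rate in (50_000, 1_000_000):
    print(rate, max_poll_interval_seconds(rate, 100))
```

On these assumptions 50,000 updates a day gives about 173 seconds between polls and 1M a day gives under 9 seconds, which is a good deal tighter than an hour and a half. A larger cap or paged feeds would loosen that.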

So yes, I guess it could be done, using RDF feeds e.g.
http://www.bbc.co.uk/programmes/updates/2009/04/28/brands.rdf etc.
We'd need to carefully think about the feeds we offer though.
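
To make the dated-feed idea concrete, here is a small sketch of how a crawler might work out which per-day feeds to fetch since its last visit. Only the brands URL pattern above comes from this thread; the other feed names ("episodes", "versions") and the function names are assumptions for illustration.

```python
from datetime import date, timedelta

BASE = "http://www.bbc.co.uk/programmes/updates"


def feed_url(day, kind):
    """Build the per-day feed URL for one kind of resource."""
    return f"{BASE}/{day:%Y/%m/%d}/{kind}.rdf"


def feeds_to_check(kinds, last_seen, today):
    """List every per-day feed URL from last_seen through today, inclusive."""
    days = (today - last_seen).days
    return [
        feed_url(last_seen + timedelta(days=n), kind)
        for n in range(days + 1)
        for kind in kinds
    ]


urls = feeds_to_check(["brands", "episodes"], date(2009, 4, 27), date(2009, 4, 28))
```

Because each day's feed has its own URI, a crawler that was offline for a while can catch up by walking the dates rather than hoping nothing dropped off a single rolling feed.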

Cheers!
y


This has the added advantage that a crawler that only wanted to collect certain information, e.g. about brands, could monitor just the resource(s) it was interested in. Similarly with careful resource design, the timing of updates could also be under the control of the crawler, e.g. new versions in last 12 hours, 24 hours, 7 days (avoiding a massive firehose of updates). This could be easily done with URIs and avoids having to build that into the
messaging system.

Interested to know what you think.

Cheers,

L.

2009/4/28 Yves Raimond <[email protected]>

Hello!

I know this issue has been raised during the LOD BOF at WWW 2009, but
I don't know if any possible solutions emerged from there.

The problem we are facing is that data on BBC Programmes changes
approximately 50 000 times a day (new/updated
broadcasts/versions/programmes/segments etc.). As we'd like to keep a set of RDF crawlers up to date with our information, we were wondering
how best to ping these. pingthesemanticweb seems like a nice option,
but it needs the crawlers to ping it often enough to make sure they
don't miss a change. Another solution we were thinking of would be to
stick either Talis changesets [1] or SPARQL/Update statements in a
message queue, which would then be consumed by the crawlers.
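
For the message-queue option, a minimal sketch of what a queued change might look like, written here in SPARQL 1.1 Update syntax. The example.org URIs and titles are invented, the function name is hypothetical, and Python's in-process queue.Queue only stands in for whatever real messaging system would be used.

```python
import queue


def sparql_update(removed, added):
    """Build one SPARQL Update request from N-Triples-style statement strings."""
    parts = []
    if removed:
        parts.append("DELETE DATA {\n  " + "\n  ".join(removed) + "\n}")
    if added:
        parts.append("INSERT DATA {\n  " + "\n  ".join(added) + "\n}")
    return " ;\n".join(parts)


TITLE = "<http://purl.org/dc/elements/1.1/title>"
PROGRAMME = "<http://example.org/programmes/p1#programme>"  # made-up URI

changes = queue.Queue()  # stand-in for a real message queue

# Publisher side: one message per change.
changes.put(sparql_update(
    removed=[f'{PROGRAMME} {TITLE} "Old title" .'],
    added=[f'{PROGRAMME} {TITLE} "New title" .'],
))

# Crawler side: consume the message and apply it to a local store.
message = changes.get_nowait()
```

Each message carries the full delta, so a crawler never needs to re-fetch the resource, but it does need to consume every message in order.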

Has anyone tried to tackle this problem already?

Cheers!
y


[1] http://n2.talis.com/wiki/Changeset




--
Leigh Dodds
Programme Manager, Talis Platform
Talis
[email protected]
http://www.talis.com



