On 15/6/09 08:41, Giovanni Tummarello wrote:
We are currently processing from 500 to 50000 pings per day in
Sindice. (This excludes semantic sitemaps and crawler
findings.)

We are now scaling using Hadoop, so that each one is completely
reasoned upon (closure of all the imported or simply "mentioned"
ontologies) before being indexed. This is performed on Hadoop so we
can scale to 500000 or whatever, just add servers. (This was described
a few days ago on http://blog.sindice.com )

If the community wants them for some reason, I'll be happy to implement a
ping "giveout" API.

I am not sure, however, how having such a mass of pings with no filter
or some capability on top can help. What about RSS and pings on
specific URIs or searches (like long-standing queries)? Wouldn't that
be better?

As a vocabulary maintainer, I'd be very happy to have a custom feed of information relating to use of terms from any namespace I maintain. Plus general stats of course. We talked about this briefly and I'd like to make a proposal for the stats aspects (will blog something). Re pings, the point here would be to identify novel vocabulary patterns early on in their deployment lifecycle, so that in the case that they were innovative/creative, we'd know about it; and if they were simply in error, a conversation about alternative idioms could begin before the publisher was too committed to the quirky pattern...
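
A minimal sketch of what such a per-namespace feed and stats pass might look like, in Python. Everything here is an assumption for illustration: the ping record shape (`doc`, `terms`), the sample data, and the function names are hypothetical and do not describe Sindice's actual ping format or any existing API. The idea is only to filter pings down to those using terms from a maintained namespace, count term usage, and flag terms the vocabulary does not define.

```python
from collections import Counter

# Hypothetical ping records: each names a pinged document and the term
# URIs found in it. Field names and data are illustrative only.
PINGS = [
    {"doc": "http://example.org/a.rdf",
     "terms": ["http://xmlns.com/foaf/0.1/name",
               "http://xmlns.com/foaf/0.1/knows"]},
    {"doc": "http://example.org/b.rdf",
     "terms": ["http://purl.org/dc/terms/title"]},
    {"doc": "http://example.org/c.rdf",
     "terms": ["http://xmlns.com/foaf/0.1/name",
               "http://xmlns.com/foaf/0.1/nmae"]},
]


def feed_for_namespace(pings, namespace):
    """Yield (doc, terms) for pings using at least one term in namespace."""
    for ping in pings:
        used = [t for t in ping["terms"] if t.startswith(namespace)]
        if used:
            yield ping["doc"], used


def term_stats(pings, namespace):
    """Count how many pinged documents use each term in namespace."""
    counts = Counter()
    for _doc, used in feed_for_namespace(pings, namespace):
        counts.update(set(used))  # count each term once per document
    return counts


def unknown_terms(pings, namespace, known_terms):
    """Return namespace terms seen in pings that the vocabulary lacks."""
    counts = term_stats(pings, namespace)
    return sorted(t for t in counts if t not in known_terms)
```

Terms flagged by `unknown_terms` could be either simple errors or novel patterns; telling the two apart would still need a human conversation with the publisher.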

Handwaving a bit on the detail,

Dan
