Miguel A Paraz wrote:

Hi,
I would like to add ping and one-time aggregation functionality to
Planet Roller. A blog or other source will send in a ping with a feed.
The source can opt to send a special feed containing only one item.

This is tied up with my hacks to support GeoInfo and Google Maps. I
can send the code for this if anyone's interested.

The ping is to notify Planet of new information for "Recovery 2.0":
http://www.socialtext.net/recovery2/index.cgi

My plan:
1. Subclass (with a joined table) PlanetSubscriptionData, to support a
one-time subscription that will be in the db but not pulled from PlanetManager. This should have the time of last pull.
2. Write a ping servlet that will do the pull. Check the last pull
time and do rate limiting since this can be abused - a malicious site
can ask us to ping its victim, DoSing the victim and us.
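
As a rough illustration of the rate limiting in step 2, here is a minimal sketch in modern Java. The class name PullRateLimiter and its methods are hypothetical and not part of Roller; it keeps the last pull time in memory, whereas the plan above stores it in the db with the subscription.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper (not a Roller class): allows one pull per feed URL per interval. */
class PullRateLimiter {
    private final Duration minInterval;
    // Assumption: last pull times are kept in memory; the plan above keeps them in the db.
    private final ConcurrentMap<String, Instant> lastPull = new ConcurrentHashMap<>();

    PullRateLimiter(Duration minInterval) {
        this.minInterval = minInterval;
    }

    /** Returns true and records 'now' if the feed may be pulled; false if it was pulled too recently. */
    boolean tryAcquire(String feedUrl, Instant now) {
        boolean[] allowed = {false};
        // compute() runs atomically per key, so two concurrent pings cannot both pass.
        lastPull.compute(feedUrl, (url, previous) -> {
            if (previous == null || !now.isBefore(previous.plus(minInterval))) {
                allowed[0] = true;
                return now;
            }
            return previous;
        });
        return allowed[0];
    }
}
```

The ping servlet would call tryAcquire with the feed URL from the ping before doing the pull, and reject the ping when it returns false. Note that this limits by the pulled feed, which protects the victim site, but a limit per pinging client would also be needed to protect us.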

What's a good ping protocol to adopt?
For "outbound" pings we are using the weblogUpdates.ping xml-rpc protocol.

http://www.xmlrpc.com/discuss/msgReader$2014

You can also look at the WeblogUpdatePinger class in the tree (which is the client side).
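
For reference, a weblogUpdates.ping call is an XML-RPC methodCall with two string parameters, the weblog name and the weblog URL. The sketch below only builds that request body; the class WeblogPingRequest is a hypothetical illustration and is not how WeblogUpdatePinger is implemented.

```java
/** Hypothetical illustration (not a Roller class): builds a weblogUpdates.ping XML-RPC request body. */
class WeblogPingRequest {

    /** Escapes the characters that are significant inside XML text content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Returns the methodCall document for the given weblog name and URL. */
    static String build(String weblogName, String weblogUrl) {
        return "<?xml version=\"1.0\"?>\n"
            + "<methodCall>\n"
            + "  <methodName>weblogUpdates.ping</methodName>\n"
            + "  <params>\n"
            + "    <param><value><string>" + escape(weblogName) + "</string></value></param>\n"
            + "    <param><value><string>" + escape(weblogUrl) + "</string></value></param>\n"
            + "  </params>\n"
            + "</methodCall>\n";
    }
}
```

An inbound ping servlet would receive a body of this shape via HTTP POST and read the two parameters back out, most likely with an XML-RPC library rather than by hand.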

For reliability, we use a queue to process outbound pings, and I believe the larger aggregators use queues on inbound pings as well, primarily to better manage the amount of ping processing that is done. (Pulling synchronously on each ping ties up the request thread while the "pull" is processed; throttling by last pull time is only partly effective in limiting this if you expect to receive a lot of pings.) This may or may not be a problem for your application. A servlet pulling directly would certainly work for an initial prototype to see how the whole thing ties together.
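
To make the queueing idea concrete, here is a minimal sketch of a bounded inbound ping queue in modern Java. InboundPingQueue and its method names are hypothetical, not existing Roller classes, and the "puller" callback stands in for whatever code actually fetches the feed.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical inbound ping queue (not a Roller class). */
class InboundPingQueue {
    private final BlockingQueue<String> pending;

    InboundPingQueue(int capacity) {
        // Bounded, so a flood of pings cannot grow memory without limit.
        this.pending = new ArrayBlockingQueue<>(capacity);
    }

    /** Called from the request thread: never blocks; returns false if the queue is full. */
    boolean enqueue(String feedUrl) {
        return pending.offer(feedUrl);
    }

    /** Processes one queued ping if present; returns false when the queue is empty. */
    boolean processNext(Consumer<String> puller) {
        String feedUrl = pending.poll();
        if (feedUrl == null) {
            return false;
        }
        puller.accept(feedUrl);
        return true;
    }

    /** Loop for a background worker thread: blocks until pings arrive, stops on interrupt. */
    void runWorker(Consumer<String> puller) {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                puller.accept(pending.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }
}
```

With this shape the servlet only calls enqueue and returns immediately, and the number of worker threads running runWorker caps how much pull processing happens at once.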

Thanks for any suggestions and feedback.
Do you think this is a useful application in general?

I don't quite understand the intended functionality. Extending Planet to allow it to aggregate loosely affiliated sets of blogs rather than only specifically configured communities seems like a valuable addition. Spam becomes a concern immediately in this model though, so you may need to consider mechanisms to filter/edit/manage pulled content.
