Miguel A Paraz wrote:
Hi,
I would like to add ping and one-time aggregation functionality to
Planet Roller. A blog or other source will send in a ping with a feed.
The source can opt to send a special feed containing only one item.
This is tied up with my hacks to support GeoInfo and Google Maps. I
can send the code for this if anyone's interested.
The ping is to notify Planet of new information for "Recovery 2.0":
http://www.socialtext.net/recovery2/index.cgi
My plan:
1. Subclass (with a joined table) PlanetSubscriptionData, to support a
one-time subscription that will be in the db but not pulled from
PlanetManager. This should have the time of last pull.
2. Write a ping servlet that will do the pull. Check the last pull
time and do rate limiting since this can be abused - a malicious site
can ask us to pull from its victim, DoSing both the victim and us.
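
The rate limiting in step 2 could be sketched roughly as below. This is only an illustration under assumptions: PingRateLimiter and tryAcquire are hypothetical names, not existing Roller classes, and the last-pull time is kept in an in-memory map here, whereas the plan above stores it with the subscription in the db.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical helper: allows at most one pull per feed URL per interval. */
final class PingRateLimiter {
    private final Duration minInterval;
    // Assumption: in-memory store; the plan above keeps the last pull time in the db.
    private final ConcurrentMap<String, Instant> lastPull = new ConcurrentHashMap<>();

    PingRateLimiter(Duration minInterval) {
        this.minInterval = minInterval;
    }

    /**
     * Returns true and records 'now' as the last pull time if the feed may be
     * pulled; returns false if it was pulled less than minInterval ago.
     */
    boolean tryAcquire(String feedUrl, Instant now) {
        boolean[] allowed = {false};
        // compute() runs atomically per key, so concurrent pings for the
        // same feed cannot both be admitted.
        lastPull.compute(feedUrl, (url, previous) -> {
            if (previous == null || !now.isBefore(previous.plus(minInterval))) {
                allowed[0] = true;
                return now;
            }
            return previous;
        });
        return allowed[0];
    }
}
```

A ping servlet would call tryAcquire with the pinged feed URL before pulling and reject the ping when it returns false. Limiting per feed URL protects the victim site; protecting the aggregator itself would probably also need a limit per pinging client.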
What's a good ping protocol to adopt?
For "outbound" pings we are using the weblogUpdates.ping XML-RPC protocol.
http://www.xmlrpc.com/discuss/msgReader$2014
You can also look at the WeblogUpdatePinger class in the tree (which is
the client side).
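
For reference, a weblogUpdates.ping call is an XML-RPC methodCall with two string parameters, the weblog name and the weblog URL. The sketch below only builds that request body; WeblogUpdatesPingRequest is a hypothetical name used for illustration and is not the WeblogUpdatePinger class in the tree.

```java
/** Hypothetical illustration of the XML-RPC body for weblogUpdates.ping. */
final class WeblogUpdatesPingRequest {

    /** Escapes the characters that are unsafe inside XML element content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds the methodCall document: parameter 1 is the name, parameter 2 the URL. */
    static String build(String weblogName, String weblogUrl) {
        return "<?xml version=\"1.0\"?>\n"
             + "<methodCall>\n"
             + "  <methodName>weblogUpdates.ping</methodName>\n"
             + "  <params>\n"
             + "    <param><value><string>" + escapeXml(weblogName) + "</string></value></param>\n"
             + "    <param><value><string>" + escapeXml(weblogUrl) + "</string></value></param>\n"
             + "  </params>\n"
             + "</methodCall>\n";
    }
}
```

An inbound ping servlet would receive a body of this shape via HTTP POST and would read the URL from the second parameter.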
For reliability, we use a queue to process outbound pings, and I believe
the larger aggregators use queues on inbound pings as well, primarily to
be able to better manage the amount of ping processing that is done.
(Pulling synchronously on pings ties up the request thread while the
"pull" is processed; the throttling by last pull is only partly
effective in limiting this if you expect to receive a lot of pings.) This may or
may not be problematic for your application. A servlet pulling directly
certainly would work for an initial prototype to see how the whole thing
ties together.
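
If you do move to a queue later, the idea could look roughly like the sketch below: the request thread only enqueues the feed URL and returns, and a separate worker does the pulls. InboundPingQueue and its methods are hypothetical names, and the bounded capacity is an assumption chosen so that a flood of pings is dropped rather than accumulated.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical bounded queue that decouples ping receipt from feed pulling. */
final class InboundPingQueue {
    private final BlockingQueue<String> pending;
    private final Consumer<String> puller; // performs the actual feed pull

    InboundPingQueue(int capacity, Consumer<String> puller) {
        this.pending = new LinkedBlockingQueue<>(capacity);
        this.puller = puller;
    }

    /** Called from the request thread; never blocks. Returns false if the queue is full. */
    boolean enqueue(String feedUrl) {
        return pending.offer(feedUrl);
    }

    /** Processes everything currently queued and returns the number of pulls done. */
    int drainPending() {
        int processed = 0;
        String feedUrl;
        while ((feedUrl = pending.poll()) != null) {
            puller.accept(feedUrl);
            processed++;
        }
        return processed;
    }

    /** Worker loop for a background thread; blocks until a ping arrives. */
    void runWorker() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            puller.accept(pending.take());
        }
    }
}
```

The servlet would call enqueue and respond immediately, so the request thread is never tied up by a slow remote feed.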
Thanks for any suggestions and feedback.
Do you think this is a useful application in general?
I don't quite understand the intended functionality. Extending Planet
to allow it to aggregate loosely affiliated sets of blogs rather than
only specifically configured communities seems like a valuable
addition. Spam becomes a concern immediately in this model though, so
you may need to consider mechanisms to filter/edit/manage pulled content.