Thanks for your answers. Understanding better now, still grappling
with some questions.

How do you guys see the advantages/disadvantages of POSTing feed data
in these scenarios:

1. Hub does not serve delta feed. In my mind, this can be
interesting for 3 reasons: a) building simple hubs that don't inspect
feeds at all, b) building hubs that are completely agnostic to their
feed formats, c) hubs convert the feed to a standard format, and
subscribers pull the first feed data from the hub, not from the original
publisher (heck, how do the superfeedr guys do that?)
2. Large data sets (e.g. DC's 2009 crime feed is 1.2 MB):
http://data.octo.dc.gov/
3. Many and frequently changing subscribers - wouldn't this lead to
large POST requests being sent unnecessarily to subscribers that
don't actually exist anymore?
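
To make scenario 3 concrete, here is a rough sketch of a hub that POSTs the
full payload to every callback and drops a subscriber after repeated failed
deliveries. Everything in it is an assumption for illustration: the `Hub`
class, the `MAX_FAILURES` threshold, the `http_post` helper and the
Content-Type are made up here and are not taken from the PubSubHubbub spec
or from any existing hub.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, Dict

MAX_FAILURES = 3  # assumed policy: drop a callback after 3 failures in a row


def http_post(url: str, body: bytes, timeout: float = 10.0) -> int:
    """POST body to url and return the HTTP status code (hypothetical helper)."""
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/atom+xml"},  # assumed feed type
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx answers still carry a status code


@dataclass
class Hub:
    # post(callback_url, body) -> status code; may raise OSError on network errors
    post: Callable[[str, bytes], int] = http_post
    failures: Dict[str, int] = field(default_factory=dict)

    def subscribe(self, callback_url: str) -> None:
        self.failures[callback_url] = 0

    def publish(self, body: bytes) -> None:
        """Send the same payload to every subscriber, pruning dead ones."""
        for url in list(self.failures):
            try:
                ok = 200 <= self.post(url, body) < 300
            except OSError:  # covers URLError, timeouts, refused connections
                ok = False
            if ok:
                self.failures[url] = 0
            else:
                self.failures[url] += 1
                if self.failures[url] >= MAX_FAILURES:
                    del self.failures[url]  # stop sending to this callback
```

With a policy like this, a vanished subscriber costs at most `MAX_FAILURES`
wasted POSTs of the large payload, though each of those can still tie up a
connection for the full timeout.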

Alex

On Oct 26, 4:49 pm, Jeff Lindsay <[email protected]> wrote:
> Luckily, probably 80% of subscribers will be doing something so simple that
> it's not even an issue. I do agree Hubs should have a timeout to encourage
> good practice. I don't think keeping the connection open is a big deal if
> you're doing it right. Hell, some people let you keep a connection open
> indefinitely, even "at scale" (see Twitter Stream API). You'll see in my
> upcoming hub implementation...
>
> On Mon, Oct 26, 2009 at 1:41 PM, Pádraic Brady <[email protected]> wrote:
>
> > It's sort of an expectations game with the question being what does the Hub
> > expect. Ideally, it's expecting to POST a delta, get a 2xx response, and
> > move on to the next Subscriber. If, however, the Subscriber acts
> > synchronously then the Hub is carrying the cost of maintaining a connection
> > while the Subscriber does all the update processing work before sending a
> > 2xx response.
>
> > Should the Hub be stuck waiting for a response because the Subscriber is
> > doing work of absolutely no impact to the expected 2xx response? Personally,
> > I don't think so. That clashes with a web developer's instinct to treat all
> > work within a single request as being essential to the response which is why
> > most will (I agree absolutely) not use a separate queue. Doesn't make it
> > correct or efficient though. Subscribers should give the Hub the expected
> > response once its request needs are met - i.e. the Subscriber received the
> > update and verified it as being valid. Anything outside that is not
> > essential to the Hub response.
>
> > I think it's important because synchronous processing will land Hubs with
> > the impact of a Subscriber's ill-advised practices - clumped requests taking
> > forever since the server is bogged down in swap, inefficient database ops,
> > slow processing, etc. If I were running a Hub, I'd paint a 10 second max
> > timeout on my connections and make it abundantly clear to Subscribers that
> > not meeting that timeout is their problem to solve.
>
> > Maybe I'm being harsh though ;). I just don't like building it into
> > practices that poor implementations can get away with bogging down other
> > parties for no good reason. It's practically begging for people to do the
> > wrong thing because it's actively tolerated. As a wise man once said,
> > programming really is the one discipline where we seem unfathomably obsessed
> > with making life easier for the less skilled of its members.
>
> > Paddy
>
> > Pádraic Brady
>
> > http://blog.astrumfutura.com
> > http://www.survivethedeepend.com
> > OpenID Europe Foundation Irish Representative<http://www.openideurope.eu/>
>
> > ------------------------------
> > *From:* Jeff Lindsay <[email protected]>
> > *To:* [email protected]
> > *Sent:* Mon, October 26, 2009 6:14:40 PM
>
> > *Subject:* [pubsubhubbub] Re: Are fat pings efficient?
>
> >> To your second point, Subscribers should never synchronously process
> >> updates. They should be dumped immediately to a job queue for asynchronous
> >> processing. This will help spread the processing load more evenly over time
> >> instead of being clumped together which I gather is what you're against. So
> >> it's receive update, verify it is an update (input validation), dump update
> >> to queue, and respond with a 200 code.
>
> > Actually I don't see anything wrong with handling the event synchronously.
> > While it's courteous to the hub, the hubs will HAVE to be able to handle
> > this because that's just how most people will do it. From the subscriber
> > perspective, a job queue is unnecessary because their web server should
> > already be handling the request asynchronously. Apache is generally already
> > a big worker pool using incoming HTTP requests as the job queue.
>
> > --
> > Jeff Lindsay
> > http://webhooks.org -- Make the web more programmable
> > http://shdh.org -- A party for hackers and thinkers
> > http://tigdb.com -- Discover indie games
> > http://progrium.com -- More interesting things
>
> --
> Jeff Lindsay
> http://webhooks.org -- Make the web more programmable
> http://shdh.org -- A party for hackers and thinkers
> http://tigdb.com -- Discover indie games
> http://progrium.com -- More interesting things
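
For reference, the subscriber flow described in the quoted thread (receive
the update, validate it, dump it to a queue, answer with a 200) could look
roughly like the sketch below. It is only an illustration under assumptions:
the in-process `queue.Queue`, the `accept_update` and `process` names, the
well-formed-XML check and port 8080 are all made up here, and a real
subscriber would more likely use a persistent job queue.

```python
import queue
import threading
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer

updates = queue.Queue()  # assumed in-memory job queue, lost on restart


def accept_update(body: bytes, jobs: queue.Queue) -> int:
    """Validate cheaply, enqueue, and return the HTTP status to send back."""
    try:
        ET.fromstring(body)  # input validation only: is it well-formed XML?
    except ET.ParseError:
        return 400
    jobs.put(body)  # the slow work happens later, outside the request
    return 200


def process(body: bytes) -> None:
    """Placeholder for the slow part (database writes, notifications, ...)."""


def worker(jobs: queue.Queue) -> None:
    """Drain the queue in the background, one update at a time."""
    while True:
        body = jobs.get()
        try:
            process(body)
        finally:
            jobs.task_done()


class Callback(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status = accept_update(self.rfile.read(length), updates)
        self.send_response(status)  # the hub gets its answer right away
        self.end_headers()


def main() -> None:
    threading.Thread(target=worker, args=(updates,), daemon=True).start()
    HTTPServer(("", 8080), Callback).serve_forever()
```

Calling `main()` starts the listener. The hub only waits for the read and
the parse, so a timeout like the 10 seconds suggested above should be easy
to meet regardless of how slow `process` is.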
