[pubsubhubbub] Re: PSHB Firehose

igrigorik Sat, 24 Oct 2009 18:08:48 -0700

This doesn't answer roster management, but an OPML export from the hub
of all the available feeds would certainly work for me.
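As a rough illustration of the OPML export mentioned above, here is a minimal sketch. It assumes the hub can hand us its roster as a plain list of feed URLs (a hypothetical input; no existing hub API is implied). It turns that list into an OPML 2.0 subscription list using only the Python standard library.

```python
import xml.etree.ElementTree as ET


def roster_to_opml(feed_urls, title="Hub roster"):
    """Build an OPML 2.0 subscription list from a hub's roster of feed URLs."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    # De-duplicate and sort so repeated exports of the same roster are identical.
    for url in sorted(set(feed_urls)):
        # type="rss" is the value OPML subscription lists use for feed outlines.
        ET.SubElement(body, "outline", type="rss", text=url, xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")
```

A subscriber could feed the resulting document to any OPML-aware reader, or parse the `xmlUrl` attributes itself and subscribe to each topic at the hub.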


@Nick: Does the hub really need to have a subscriber? If, as a
publisher, I register my feed with the hub, it could still keep it in
the roster -- the hub just doesn't have to do anything if there are no
subscribers; it can simply discard the pings. Although, if there were a
firehose delivery mechanism, you could implicitly call that a
subscriber to every feed.

ig
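The roster idea above can be sketched as a toy class. Feeds stay registered when they have no subscribers, pings for them are simply dropped, and a firehose consumer behaves like an implicit subscriber to every registered feed. The `Roster` class and its method names are hypothetical; this is not how any existing hub is implemented.

```python
class Roster:
    """Toy hub roster: a feed stays registered even with zero subscribers."""

    def __init__(self):
        self.subscribers = {}   # topic URL -> set of subscriber callback URLs
        self.firehose = set()   # callbacks treated as subscribed to every topic

    def register(self, topic):
        # A publisher registering its feed just creates an empty entry.
        self.subscribers.setdefault(topic, set())

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, set()).add(callback)

    def subscribe_firehose(self, callback):
        self.firehose.add(callback)

    def on_ping(self, topic):
        """Return the callbacks to notify for a publisher ping.

        An empty list means the hub has nothing to do and discards the ping.
        """
        if topic not in self.subscribers:
            return []  # unknown topic: not in the roster at all
        return sorted(self.subscribers[topic] | self.firehose)
```

With no regular subscribers and no firehose consumer, `on_ping` returns an empty list. As soon as a firehose callback exists, every registered feed has at least one target.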

On Oct 22, 9:37 am, Alexis Richardson <[email protected]> wrote:
> I don't think that's crazy at all.  I suppose the question would then
> become how to make this 'standard' for roster discovery and
> management.
>
> On Thu, Oct 22, 2009 at 2:22 PM, igrigorik <[email protected]> wrote:
>
> > Ok, so let me try rephrasing the problem. The major problem is not the
> > Hub, or the spec, but the need to crawl thousands of sites to find the
> > feeds in the first place. Of course, the hub already knows about them,
> > hence this discussion.
>
> > Would it be crazy to then expose a mechanism to enumerate all of the
> > feeds that a hub tracks, and let the client then subscribe to them?
>
> > An obvious obstacle is private feeds, but I wonder if that can be
> > handled as a special case?
>
> > ig
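One way a hub might expose the enumeration asked about above is cursor-based paging over its public feeds, with private feeds handled as the special case. The `Feed` record, its `private` flag and the `list_public_feeds` function are illustrative assumptions, not part of the PubSubHubbub spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feed:
    url: str
    private: bool = False  # hypothetical flag: private feeds are never listed


def list_public_feeds(feeds, after=None, limit=100):
    """Return one page of public feed URLs plus a cursor for the next page.

    The cursor is simply the last URL returned, so paging is stable as long
    as URLs keep sorting the same way. None means there are no more pages.
    """
    public = sorted(f.url for f in feeds if not f.private)
    if after is not None:
        public = [u for u in public if u > after]
    page = public[:limit]
    next_cursor = page[-1] if len(public) > limit else None
    return page, next_cursor
```

A client would keep calling with `after=next_cursor` until the cursor comes back as `None`, then subscribe to each URL it collected.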
>
> > On Oct 21, 1:55 pm, Marcus Herou <[email protected]> wrote:
> >> Well... We could also publish a firehose, but our business model
> >> currently isn't aimed at that.
>
> >> I was not talking about a service but rather a technology that could
> >> take the crawling business to another level by aggregating hundreds of
> >> hubs and creating something that could effectively deliver Tb/s of
> >> bandwidth through decentralized servers and data. We are still limited
> >> by our infrastructure, you know, even though it has Gb/s to the internet.
>
> >> Since the real-time web is still very small, all of us need to poll
> >> something (even you, I presume) to be able to build a pub/sub
> >> architecture. In that respect our companies are quite similar: you
> >> chose to aggregate and publish your data and make a business of it; we
> >> aggregate and refine the data and make a business out of that.
>
> >> Don't get me wrong, I really like what you do, but I am not looking for
> >> a data supplier at this time (that might change, though). But if I were
> >> to look in the data-supplier direction, you would currently be on
> >> my/our top-ten list :)
>
> >> Sent from my iPhone
>
> >> On Oct 21, 5:03 pm, Julien Genestoux <[email protected]> wrote:
>
> >> > Hum... http://superfeedr.com?
>
> >> > "Putting resources in common" is definitely one of the key reasons why
> >> > we built superfeedr. More about that here:
> >> > http://blog.superfeedr.com/gospel/something-stupid/
>
> >> > And yes, we have a firehose available.
>
> >> > Julien
>
> >> > --
> >> > Julien Genestoux,
>
> >> > http://twitter.com/julien51 http://superfeedr.com
>
> >> > +1 (415) 254 7340
> >> > +33 (0)9 70 44 76 29
>
> >> > On Wed, Oct 21, 2009 at 5:26 AM, Marcus Herou <[email protected]> wrote:
>
> >> > > Feedtree looks cool... but it was last updated in 2006?
>
> >> > > On Wed, Oct 21, 2009 at 2:20 PM, Nick Johnson (Google) <[email protected]> wrote:
>
> >> > >> On Wed, Oct 21, 2009 at 1:14 PM, Alexis Richardson <[email protected]> wrote:
>
> >> > >>> Hmmm ... gossiptorrent?
>
> >> > >> Feedtree.
>
> >> > >>> On Wed, Oct 21, 2009 at 7:23 AM, Marcus Herou <[email protected]> wrote:
>
> >> > >>> > Hi.
>
> >> > >>> > We host a search app based on feeds from blogs/Twitter/forums/news,
> >> > >>> > etc. As you mention, we are polling everything like crazy, and it
> >> > >>> > seems like a total waste of everyone's resources.
>
> >> > >>> > So subscribing to something that could deliver the material to us
> >> > >>> > would be great, not just for us but also for all the sites we are
> >> > >>> > crawling.
>
> >> > >>> > However, who would want to open up a firehose for free for everyone
> >> > >>> > to consume? It would certainly use a lot of bandwidth, and with
> >> > >>> > this model a few subscribers would consume most of it.
> >> > >>> > I thought of something that might solve this issue. Consider the
> >> > >>> > following:
>
> >> > >>> > 1)
> >> > >>> > * Charge for the bandwidth (wordpress.com does this with a flat fee)
>
> >> > >>> > 2)
> >> > >>> > * Everyone who needs to consume a firehose should also run a hub,
> >> > >>> > to show good faith and good morals.
> >> > >>> > * Add support in firehose-enabled hubs for sharing state (with a
> >> > >>> > master?)
> >> > >>> > * A firehose-enabled hub can subscribe to a master hub, which makes
> >> > >>> > sure that the subscriber also fulfils some form of contract (i.e.
> >> > >>> > actually updating/delivering feeds)
> >> > >>> > * Each firehose-enabled hub must be public, and everyone can
> >> > >>> > subscribe to its feeds as they can today.
> >> > >>> > * To share the load equally (the moral part), subscribers should
> >> > >>> > subscribe to a load-balanced DNS name or some form of delegate:
> >> > >>> >  lb.pshb.com = master hub
> >> > >>> >  Example 1: lb.pshb.com resolves to pshb.tailsweep.com and
> >> > >>> > pshb.google.com, effectively DNS round-robin
> >> > >>> >  Example 2: lb.pshb.com delegates in some way to any active hub
> >> > >>> > connected to the master.
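Example 2 above (a delegate that hands subscribers to any active hub connected to the master) could look roughly like the sketch below. The `MasterHub` class is an assumption made for illustration, as are the idle-time check standing in for the "contract" and the round-robin policy. Real DNS round-robin (Example 1) happens in the DNS server and needs no code in the hub at all.

```python
import itertools
import time


class MasterHub:
    """Hypothetical master that hands out firehose-enabled hubs in turn.

    A hub counts as active (i.e. fulfilling its contract) only if it has
    reported a delivery within max_idle seconds.
    """

    def __init__(self, max_idle=300.0, clock=time.monotonic):
        self.max_idle = max_idle
        self.clock = clock                # injectable so it can be tested
        self.last_delivery = {}           # hub hostname -> last report time
        self._counter = itertools.count()

    def report_delivery(self, hub):
        self.last_delivery[hub] = self.clock()

    def active_hubs(self):
        now = self.clock()
        return sorted(h for h, t in self.last_delivery.items()
                      if now - t <= self.max_idle)

    def delegate(self):
        """Pick the next active hub, round-robin."""
        hubs = self.active_hubs()
        if not hubs:
            raise LookupError("no active firehose hubs")
        return hubs[next(self._counter) % len(hubs)]
```

Hubs that stop reporting deliveries simply drop out of the rotation, which is one cheap way to enforce the "actually updating/delivering feeds" condition.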
>
> >> > >>> > This might be too complex to implement, and the master would be a
> >> > >>> > bottleneck, but systems like Hadoop have a bottleneck in the
> >> > >>> > NameNode (master) and still seem to perform just fine, so it can be
> >> > >>> > done. However, each firehose hub would probably need to persist each
> >> > >>> > feed for a certain amount of time before purging it.
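The point about persisting each feed for a while before purging it amounts to a time-bounded buffer. Here is a minimal sketch, assuming an in-memory store and a fixed retention window (`ttl`, in seconds). Both are assumptions; a real hub would presumably persist to disk.

```python
import time
from collections import deque


class RetentionBuffer:
    """Keep recent feed updates for ttl seconds, then purge them.

    Entries are appended in arrival order, so expired ones always sit at the
    left end of the deque and can be dropped without scanning everything.
    """

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.entries = deque()  # (arrival_time, topic, payload)

    def append(self, topic, payload):
        self.entries.append((self.clock(), topic, payload))

    def purge(self):
        cutoff = self.clock() - self.ttl
        while self.entries and self.entries[0][0] < cutoff:
            self.entries.popleft()

    def since(self, t):
        """Entries that arrived at or after time t, e.g. for a reconnecting subscriber."""
        self.purge()
        return [(topic, payload) for ts, topic, payload in self.entries if ts >= t]
```

The retention window bounds both memory use and how far back a subscriber that drops its connection can catch up.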
>
> >> > >>> > Anyway, this was just a thought. We at Tailsweep could probably help
> >> > >>> > make this happen if there is some interest.
>
> >> > >>> > Cheers
>
> >> > >>> > //Marcus
>
> >> > >>> > On Oct 20, 8:41 pm, Bob Wyman <[email protected]> wrote:
> >> > >>> >> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik <[email protected]> wrote:
> >> > >>> >> > Specifically, if we treat 'firehose' as any bundle of
> >> > >>> >> > feeds (all, or some), then a hub could define
> >> > >>> >> > multiple firehose streams.
>
> >> > >>> >> There should be no question that there is tremendous utility in being
> >> > >>> >> able to compose all sorts of "bundles" of topics into distinct feeds.
> >> > >>> >> It is probably also the case that we can identify some number of such
> >> > >>> >> bundles that would be useful to a large number of subscribers. On the
> >> > >>> >> other hand, many bundles will be very specific and only useful to one
> >> > >>> >> or a small number of subscribers. In fact, I think what we'll see is
> >> > >>> >> that once we have the core PSHB defined, we'll then see innovation in
> >> > >>> >> the definition of "down stream" services whose function is precisely
> >> > >>> >> to build and deliver such bundles. Some of these services will
> >> > >>> >> aggregate groups of topics while others will focus instead on
> >> > >>> >> creating content-based streams -- they will bundle together
> >> > >>> >> individual entries based on the content of those entries rather than
> >> > >>> >> simply combining all entries from some set of topics.
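The distinction above, between aggregating groups of topics and building content-based streams, can be made concrete with two small filters over a list of entries. The entry dictionaries (with `feed`, `title` and `summary` keys) and the whole-word keyword rule are illustrative assumptions; real downstream services would presumably use far richer classification.

```python
import re


def topic_bundle(entries, feeds):
    """Aggregate every entry whose source feed belongs to the given set."""
    return [e for e in entries if e["feed"] in feeds]


def content_stream(entries, keywords):
    """Bundle entries by what they say, regardless of which feed they came from."""
    if not keywords:
        return []
    # Whole-word, case-insensitive match on any of the keywords.
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE
    )
    return [e for e in entries if pattern.search(e["title"] + " " + e["summary"])]
```

The first filter only needs the roster. The second has to look inside every entry, which is why it needs an "all topics" feed as its input.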
>
> >> > >>> >> I think we should be careful not to force too much of the burden of
> >> > >>> >> bundling or aggregating into the core PSHB hub specification. If we
> >> > >>> >> want to address the challenges of building bundles or aggregations,
> >> > >>> >> I think it best to do so in secondary or companion specifications.
> >> > >>> >> This will keep the core cleaner and easy to understand while also
> >> > >>> >> allowing the core to be deployed without being delayed by discussions
> >> > >>> >> over non-core issues.
>
> >> > >>> >> Having argued against making the core more complicated by extending
> >> > >>> >> it to include creating aggregate topics, I still suggest that it
> >> > >>> >> would be useful to have the core system define a common means to
> >> > >>> >> obtain a pure "firehose" feed of all topics. The current hub spec
> >> > >>> >> works for people who only want "none or some" of the topics served
> >> > >>> >> by the hub. I suggest that we expand this to have hubs know how to
> >> > >>> >> provide "none, some or all" of the topics. The reason for adding
> >> > >>> >> support of "all topics" is that we know, without much question, that
> >> > >>> >> such an "all topics" feed will be required by many of the downstream
> >> > >>> >> services that we will one day be relying on to create more finely
> >> > >>> >> defined aggregations. Given that this specific feed will be commonly
> >> > >>> >> required, it would be best if we had a common mechanism for a
> >> > >>> >> downstream service/subscriber to request that feed and that we set
> >> > >>> >> some expectations for how that feed will be formatted and delivered
> >> > >>> >> (i.e. Atom entries, persistent connections, chunked content
> >> > >>> >> model, ...). It would be very cumbersome for a downstream
> >> > >>> >> filtering/aggregating service to need to puzzle through service
> >> > >>> >> specific mechanisms for discovering how to obtain a firehose feed of
> >> > >>> >> "all topics" from many different hubs.
>
> >> > >>> >> bob wyman
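"Atom entries, persistent connections, chunked content model" are listed above as things a firehose convention might pin down. Purely to illustrate the chunked option, the sketch below frames one Atom `<entry>` per chunk; that per-entry framing is our own assumption, not something the thread specifies. It uses HTTP/1.1 chunked transfer coding, where each chunk is its byte length in hex, CRLF, the data, CRLF.

```python
def http_chunk(data: bytes) -> bytes:
    """Frame one payload as an HTTP/1.1 chunk: hex length, CRLF, data, CRLF."""
    return b"%x\r\n%s\r\n" % (len(data), data)


# A zero-length chunk followed by an empty line ends a chunked body.
LAST_CHUNK = b"0\r\n\r\n"


def firehose_body(atom_entries):
    """Yield a chunked HTTP body carrying one serialized Atom <entry> per chunk."""
    for entry in atom_entries:
        # Chunk sizes count bytes, so encode before measuring.
        yield http_chunk(entry.encode("utf-8"))
    yield LAST_CHUNK
```

On a truly persistent firehose connection, `atom_entries` would be an endless iterator and the last chunk would only be sent when the hub closes the stream.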
>
> >> > >>> >> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik <[email protected]> wrote:
>
> >> > >>> >> > Right, so how does the smart hub aggregate the feeds? Does it then
> >> > >>> >> > have to crawl to find the list? That wouldn't be very useful.
> >> > >>> >> > Having said that...
>
> >> > >>> >> > +1 For 'smart, aggregating hub generating a synthetic feed'
> >> > >>> >> > +1 For XRD discovery of the firehose endpoint.
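To make the XRD discovery idea above a little more concrete, here is a minimal sketch of a client picking a firehose endpoint out of an XRD 1.0 document. The XRD namespace and the `Link` element with `rel`/`href` attributes come from the XRD 1.0 spec. The `FIREHOSE_REL` value is purely hypothetical, since no firehose link relation was ever agreed on in this thread.

```python
import xml.etree.ElementTree as ET

XRD_NS = "http://docs.oasis-open.org/ns/xri/xrd-1.0"
# Hypothetical link relation; no firehose rel value has been standardised.
FIREHOSE_REL = "http://example.com/rel/firehose"


def find_firehose(xrd_text, rel=FIREHOSE_REL):
    """Return the href of the first XRD <Link> with the given rel, or None."""
    root = ET.fromstring(xrd_text)
    for link in root.findall(f"{{{XRD_NS}}}Link"):
        if link.get("rel") == rel:
            return link.get("href")
    return None
```

A common rel value is exactly the kind of convention that would spare downstream services from learning each hub's own discovery mechanism.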
>
> >> > >>> >> > Thinking a bit more about the firehose, what about making it more
> >> > >>> >> > flexible? Specifically, if we treat 'firehose' as any bundle of
> >> > >>> >> > feeds (all, or some), then a hub could define multiple firehose
> >> > >>> >> > streams. For example, at PostRank we classify feeds by topic, so if
> >> > >>> >> > someone wanted to subscribe to "Technology", we could expose that as
> >> > >>> >> > a firehose so the user doesn't have to subscribe to every feed in
> >> > >>> >> > that topic. In essence, a firehose stream is then any bundle of
> >> > >>> >> > feeds.
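The generalisation above treats a firehose as any named bundle of feeds, with "all feeds" as one special bundle. On the hub side, that could be modelled roughly as follows. The `BundleRegistry` name, the reserved `all` stream and the callback fan-out are assumptions for illustration; PostRank's actual topic classification is not shown here.

```python
class BundleRegistry:
    """Hypothetical registry of named firehose streams (bundles of feeds)."""

    ALL = "all"  # reserved name: the stream that contains every feed

    def __init__(self):
        self.bundles = {}      # stream name -> set of feed URLs
        self.subscribers = {}  # stream name -> set of callback URLs

    def define(self, name, feeds):
        if name == self.ALL:
            raise ValueError("'all' is reserved")
        self.bundles[name] = set(feeds)

    def subscribe(self, name, callback):
        if name != self.ALL and name not in self.bundles:
            raise KeyError(name)
        self.subscribers.setdefault(name, set()).add(callback)

    def streams_for(self, feed):
        """Every stream that contains the feed; 'all' always does."""
        names = {n for n, feeds in self.bundles.items() if feed in feeds}
        names.add(self.ALL)
        return names

    def callbacks_for(self, feed):
        """Callbacks that should receive an update to this feed, once each."""
        out = set()
        for name in self.streams_for(feed):
            out |= self.subscribers.get(name, set())
        return sorted(out)
```

A subscriber to "Technology" then gets updates from every feed in that bundle without subscribing to each one. A subscriber to `all` is the plain firehose.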
>
> >> > >>> >> > This may be overloading the hub spec but the overall mechanics
> >> > >>> >> > would be:
> >> > >>> >> >  - A (super)user can declare a
>
> ...
>