[pubsubhubbub] Re: PSHB Firehose

Alexis Richardson Thu, 22 Oct 2009 15:36:12 -0700

+1


On Thu, Oct 22, 2009 at 7:38 PM, Julien Genestoux
<[email protected]> wrote:
> Ha! Thanks Jeff for pointing to that! :D
> I think subscribers should always select their favorite endpoint, and hubs
> should make sure they get the content from the publisher's hub :)
> Ju
>
>
> --
> Julien Genestoux,
>
> http://twitter.com/julien51
> http://superfeedr.com
>
> +1 (415) 254 7340
> +33 (0)9 70 44 76 29
> Sent from San Francisco, CA, United States
>
> On Thu, Oct 22, 2009 at 11:29 AM, Jeff Lindsay <[email protected]> wrote:
>>
>> All these ideas for additions to the Hub are really starting to suggest
>> multiple endpoints. I still don't buy why the publisher endpoint is the same
>> as the subscriber endpoint.
>>
>> On Thu, Oct 22, 2009 at 11:22 AM, Jeremy Hylton <[email protected]>
>> wrote:
>>>
>>> On Thu, Oct 22, 2009 at 9:22 AM, igrigorik <[email protected]> wrote:
>>> >
>>> > Ok, so let me try rephrasing the problem. The major problem is not the
>>> > Hub, or the spec, but the need to crawl thousands of sites to find the
>>> > feeds in the first place. Of course, the hub already knows about them,
>>> > hence this discussion.
>>> >
>>> > Would it be crazy to then expose a mechanism to enumerate all of the
>>> > feeds that a hub tracks, and let the client then subscribe to them?
>>>
>>> I agree it would be helpful if we had a standard way for hubs to
>>> support clients that want either a firehose or the ability to discover
>>> all the feeds available for subscription.  Are there existing
>>> solutions to the problem that are similar in complexity (or
>>> simplicity) to PSHB?
>>>
>>> Jeremy
>>>
>>> >
>>> > An obvious obstacle is private feeds, but I wonder if that can be
>>> > handled as a special case?
>>> >
>>> > ig
>>> >
>>> > On Oct 21, 1:55 pm, Marcus Herou <[email protected]> wrote:
>>> >> Well... We can as well publish a firehose but currently our business
>>> >> model is not aimed at that.
>>> >>
>>> >> I was not talking about a service but rather a technology which could
>>> >> take the crawling business to another level by aggregating hundreds of
>>> >> hubs and creating something which effectively can deliver tb/s
>>> >> bandwidth by having decentralized servers and data. We are still
>>> >> limited, you know, by our infrastructure even though it has gb/s to the
>>> >> internet.
>>> >>
>>> >> Since the realtime web is currently still very small, all of us need to
>>> >> poll something (even you, I presume) to be able to create a pub/sub
>>> >> arch. In that respect our companies are quite similar: you chose to
>>> >> aggregate and publish your data and make a business of it. We aggregate
>>> >> and refine the data and make a business out of that.
>>> >>
>>> >> Don't get me wrong, I really like what you do, but I am not looking for
>>> >> a data supplier at this time (that might change, though). But if I were
>>> >> to look in the data supplier direction, you are currently in my/our top
>>> >> ten list :)
>>> >>
>>> >> Sent from my iPhone
>>> >>
>>> >> On Oct 21, 5:03 pm, Julien Genestoux <[email protected]>
>>> >> wrote:
>>> >>
>>> >> > Hum...http://superfeedr.com?
>>> >>
>>> >> > "Putting resources in common" is definitely one of the key reasons
>>> >> > why we built superfeedr. More about that here:
>>> >> > http://blog.superfeedr.com/gospel/something-stupid/
>>> >>
>>> >> > And yes, we have a firehose available.
>>> >>
>>> >> > Julien
>>> >>
>>> >> > --
>>> >> > Julien Genestoux,
>>> >>
>>> >> > http://twitter.com/julien51
>>> >> > http://superfeedr.com
>>> >>
>>> >> > +1 (415) 254 7340
>>> >> > +33 (0)9 70 44 76 29
>>> >>
>>> >> > On Wed, Oct 21, 2009 at 5:26 AM, Marcus Herou
>>> >> > <[email protected]> wrote:
>>> >>
>>> >> > > Feedtree looks cool... but last updated in 2006?
>>> >>
>>> >> > > On Wed, Oct 21, 2009 at 2:20 PM, Nick Johnson (Google) <
>>> >> > > [email protected]> wrote:
>>> >>
>>> >> > >> On Wed, Oct 21, 2009 at 1:14 PM, Alexis Richardson <
>>> >> > >> [email protected]> wrote:
>>> >>
>>> >> > >>> Hmmm ... gossiptorrent?
>>> >>
>>> >> > >> Feedtree.
>>> >>
>>> >> > >>> On Wed, Oct 21, 2009 at 7:23 AM, Marcus Herou
>>> >> > >>> <[email protected]> wrote:
>>> >>
>>> >> > >>> > Hi.
>>> >>
>>> >> > >>> > We host a search app which is based on feeds of
>>> >> > >>> > blogs/twitter/forums/news etc. We are, as you mention, polling
>>> >> > >>> > everything like crazy, and it seems like a total waste of
>>> >> > >>> > everyone's resources.
>>> >>
>>> >> > >>> > So this means that subscribing to something which would
>>> >> > >>> > potentially deliver the material to us would be great, not just
>>> >> > >>> > for us but also for all the sites we are crawling.
>>> >>
>>> >> > >>> > However, who would like to open up a firehose for free for
>>> >> > >>> > everyone to consume? It will for sure consume a lot of bandwidth,
>>> >> > >>> > and a few subscribers will consume most of the bandwidth with
>>> >> > >>> > this model.
>>> >> > >>> > I thought of something that might solve this issue. Consider the
>>> >> > >>> > following:
>>> >>
>>> >> > >>> > 1)
>>> >> > >>> > * Charge for the bandwidth (wordpress.com does this with a flat
>>> >> > >>> >   fee)
>>> >>
>>> >> > >>> > 2)
>>> >> > >>> > * Everyone that has firehose-consuming needs should also start a
>>> >> > >>> >   hub to show good faith and morale.
>>> >> > >>> > * Add support in firehose-enabled hubs to share state (with a
>>> >> > >>> >   master?)
>>> >> > >>> > * A firehose-enabled hub can subscribe to a master hub, which
>>> >> > >>> >   makes sure that the subscriber also fulfils some form of
>>> >> > >>> >   contract (i.e. actually updating/delivering feeds)
>>> >> > >>> > * Each firehose-enabled hub must be public, and everyone can
>>> >> > >>> >   subscribe to the feeds as they can today.
>>> >> > >>> > * To share load equally (morale part), subscribers should
>>> >> > >>> >   subscribe to a load-balanced DNS name or some form of delegate
>>> >> > >>> >   lb.pshb.com = master hub
>>> >> > >>> >   Example 1: lb.pshb.com resolves to pshb.tailsweep.com and
>>> >> > >>> >   pshb.google.com, effectively DNS round-robin
>>> >> > >>> >   Example 2: lb.pshb.com delegates to any active
>>> >> > >>> >   master-connected hub in some way.
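
The round-robin idea in Example 1 can be sketched as follows. This is only
an illustration under stated assumptions: it rotates through a fixed list
in memory instead of doing real DNS resolution, the class name
RoundRobinDelegate is hypothetical, and the hostnames are the illustrative
ones from the message above.

```python
from itertools import cycle


class RoundRobinDelegate:
    """Hypothetical stand-in for a load-balanced name such as lb.pshb.com.

    It hands out member hubs in strict rotation, which approximates what
    DNS round-robin does. No network access is performed.
    """

    def __init__(self, hubs):
        hubs = list(hubs)
        if not hubs:
            raise ValueError("at least one hub is required")
        self._rotation = cycle(hubs)

    def next_hub(self):
        # Each call returns the next hub in the rotation.
        return next(self._rotation)


if __name__ == "__main__":
    lb = RoundRobinDelegate(["pshb.tailsweep.com", "pshb.google.com"])
    for _ in range(4):
        print(lb.next_hub())
```

Strict rotation spreads subscribers evenly by count, not by the bandwidth
each one consumes, so it only partly addresses the fairness concern raised
above.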
>>> >>
>>> >> > >>> > This might be too complex to implement, and bottlenecks occur at
>>> >> > >>> > the master, but systems like Hadoop have bottlenecks in terms of
>>> >> > >>> > the NameNode (master) and it seems to perform perfectly well, so
>>> >> > >>> > it can be done. However, each firehose hub probably needs to
>>> >> > >>> > persist each feed for a certain amount of time before purging it.
>>> >>
>>> >> > >>> > Anyway, this was just a thought. We at Tailsweep could probably
>>> >> > >>> > help in making this happen if there is some interest.
>>> >>
>>> >> > >>> > Cheers
>>> >>
>>> >> > >>> > //Marcus
>>> >>
>>> >> > >>> > On Oct 20, 8:41 pm, Bob Wyman <[email protected]> wrote:
>>> >> > >>> >> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik
>>> >> > >>> >> <[email protected]> wrote:
>>> >> > >>> >> > Specifically, if we treat 'firehose' as any bundle of
>>> >> > >>> >> > feeds (all, or some), then a hub could define
>>> >> > >>> >> > multiple firehose streams.
>>> >>
>>> >> > >>> >> There should be no question that there is tremendous utility
>>> >> > >>> >> in being able to compose all sorts of "bundles" of topics into
>>> >> > >>> >> distinct feeds. It is probably also the case that we can
>>> >> > >>> >> identify some number of such bundles that would be useful to a
>>> >> > >>> >> large number of subscribers. On the other hand, many bundles
>>> >> > >>> >> will be very specific and only useful to one or a small number
>>> >> > >>> >> of subscribers. In fact, I think what we'll see is that once
>>> >> > >>> >> we have the core PSHB defined, we'll then see innovation in
>>> >> > >>> >> the definition of "down stream" services whose function is
>>> >> > >>> >> precisely to build and deliver such bundles. Some of these
>>> >> > >>> >> services will aggregate groups of topics while others will
>>> >> > >>> >> focus instead on creating content-based streams -- they will
>>> >> > >>> >> bundle together individual entries based on the content of
>>> >> > >>> >> those entries rather than simply combining all entries from
>>> >> > >>> >> some set of topics.
>>> >>
>>> >> > >>> >> I think we should be careful not to force too much of the
>>> >> > >>> >> burden of bundling or aggregating into the core PSHB hub
>>> >> > >>> >> specification. If we want to address the challenges of
>>> >> > >>> >> building bundles or aggregations, I think it best to do so in
>>> >> > >>> >> secondary or companion specifications. This will keep the core
>>> >> > >>> >> cleaner and easy to understand while also allowing the core to
>>> >> > >>> >> be deployed without being delayed by discussions over non-core
>>> >> > >>> >> issues.
>>> >>
>>> >> > >>> >> Having argued against making the core more complicated by
>>> >> > >>> >> extending it to include creating aggregate topics, I still
>>> >> > >>> >> suggest that it would be useful to have the core system define
>>> >> > >>> >> a common means to obtain a pure "firehose" feed of all topics.
>>> >> > >>> >> The current hub spec works for people who only want "none or
>>> >> > >>> >> some" of the topics served by the hub. I suggest that we
>>> >> > >>> >> expand this to have hubs know how to provide "none, some or
>>> >> > >>> >> all" of the topics. The reason for adding support of "all
>>> >> > >>> >> topics" is that we know, without much question, that such an
>>> >> > >>> >> "all topics" feed will be required by many of the downstream
>>> >> > >>> >> services that we will one day be relying on to create more
>>> >> > >>> >> finely defined aggregations. Given that this specific feed
>>> >> > >>> >> will be commonly required, it would be best if we had a common
>>> >> > >>> >> mechanism for a downstream service/subscriber to request that
>>> >> > >>> >> feed and that we set some expectations for how that feed will
>>> >> > >>> >> be formatted and delivered (i.e. Atom entries, persistent
>>> >> > >>> >> connections, chunked content model, ...). It would be very
>>> >> > >>> >> cumbersome for a downstream filtering/aggregating service to
>>> >> > >>> >> need to puzzle through service specific mechanisms for
>>> >> > >>> >> discovering how to obtain a firehose feed of "all topics" from
>>> >> > >>> >> many different hubs.
>>> >>
>>> >> > >>> >> bob wyman
>>> >>
>>> >> > >>> >> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik
>>> >> > >>> >> <[email protected]> wrote:
>>> >>
>>> >> > >>> >> > Right, so how does the smart hub aggregate the feeds? Does
>>> >> > >>> >> > it then have to crawl to find the list? That wouldn't be
>>> >> > >>> >> > very useful. Having said that...
>>> >>
>>> >> > >>> >> > +1 For 'smart, aggregating hub generating a synthetic feed'
>>> >> > >>> >> > +1 For XRD discovery of the firehose endpoint.
>>> >>
>>> >> > >>> >> > Thinking a bit more about the firehose, what about making
>>> >> > >>> >> > it more flexible. Specifically, if we treat 'firehose' as
>>> >> > >>> >> > any bundle of feeds (all, or some), then a hub could define
>>> >> > >>> >> > multiple firehose streams. For example, at PostRank we
>>> >> > >>> >> > classify feeds by topic, so if someone wanted to subscribe
>>> >> > >>> >> > to "Technology", we could expose that as a firehose so the
>>> >> > >>> >> > user doesn't have to subscribe to every feed in that topic.
>>> >> > >>> >> > In essence, a firehose stream is then any bundle of feeds.
>>> >>
>>> >> > >>> >> > This may be overloading the hub spec but the overall
>>> >> > >>> >> > mechanics would be:
>>> >> > >>> >> >  - A (super)user can declare a firehose endpoint
>>> >> > >>> >> >  - A (super)user is then able to add or remove
>>> >> > >>> >> >    subscriptions from the firehose to create arbitrary
>>> >> > >>> >> >    aggregation streams
>>> >> > >>> >> >  - A subscriber uses XRD to discover the available
>>> >> > >>> >> >    aggregation streams
>>> >> > >>> >> >  - Firehose with 'all' feeds is a special case of the
>>> >> > >>> >> >    above, where all feeds are present
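
The mechanics listed above can be sketched as a toy in-memory model. This
is only an illustration under stated assumptions, not a PubSubHubbub
implementation: the class Hub, its method names, and the reserved stream
name "all" are all hypothetical, and XRD discovery is replaced by a plain
method that lists stream names.

```python
class Hub:
    """Toy model of a hub that exposes named bundles of feeds."""

    ALL = "all"  # hypothetical reserved name for the every-feed stream

    def __init__(self):
        self.feeds = set()   # every topic URL the hub tracks
        self.bundles = {}    # stream name -> set of topic URLs

    def track(self, feed_url):
        self.feeds.add(feed_url)

    def declare_firehose(self, name):
        # A (super)user declares a new, initially empty, stream.
        if name == self.ALL:
            raise ValueError("'all' is reserved for the full firehose")
        self.bundles.setdefault(name, set())

    def add_to_firehose(self, name, feed_url):
        if feed_url not in self.feeds:
            raise KeyError(f"hub does not track {feed_url}")
        self.bundles[name].add(feed_url)

    def remove_from_firehose(self, name, feed_url):
        self.bundles[name].discard(feed_url)

    def streams(self):
        # Stand-in for discovery: names of every available stream.
        return sorted([*self.bundles, self.ALL])

    def members(self, name):
        # The 'all' stream is the special case containing every feed.
        if name == self.ALL:
            return set(self.feeds)
        return set(self.bundles[name])


if __name__ == "__main__":
    hub = Hub()
    hub.track("http://example.com/tech.atom")
    hub.track("http://example.com/cooking.atom")
    hub.declare_firehose("technology")
    hub.add_to_firehose("technology", "http://example.com/tech.atom")
    print(hub.streams())
    print(hub.members("technology"))
```

Treating "all" as a computed view over the tracked feeds, not as a
stored bundle, keeps it in sync automatically as feeds are added. It does
not address the private-feed obstacle raised elsewhere in the thread.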
>>> >>
>>> >> > >>> >> > This definitely adds more complexity into the hub... The
>>> >> > >>> >> > alternative is of course for the publisher to create a
>>> >> > >>> >> > syndicated feed and publish that directly as a standalone
>>> >> > >>> >> > feed. Still trying to weigh the up/downsides in my head,
>>> >> > >>> >> > but want to put it out there as an idea.
>>> >>
>>> >> > >>> >> > --------
>>> >> > >>> >> > Ilya Grigorik
>>> >> > >>> >> > postrank.com
>>> >>
>>> >> > >> --
>>> >> > >> Nick Johnson, Developer Programs Engineer, App Engine
>>> >> > >> Google Ireland Ltd. :: Registered in Dublin, Ireland,
>>> >> > >> Registration Number: 368047
>>> >>
>>> >> > > --
>>> >> > > Marcus Herou CTO and co-founder Tailsweep AB
>>> >> > > +46702561312
>>> >> > > [email protected]
>>> >> > > http://www.tailsweep.com/
>>> >
>>
>>
>>
>> --
>> Jeff Lindsay
>> http://webhooks.org -- Make the web more programmable
>> http://shdh.org -- A party for hackers and thinkers
>> http://tigdb.com -- Discover indie games
>> http://progrium.com -- More interesting things
>
>
