[pubsubhubbub] Re: PSHB Firehose

Marcus Herou Wed, 21 Oct 2009 10:55:31 -0700

Well... we could publish a firehose as well, but our business model is
currently not aimed at that.


I was not talking about a service but rather about a technology that
could take the crawling business to another level by aggregating
hundreds of hubs into something that can effectively deliver tb/s of
bandwidth through decentralized servers and data. We are still limited
by our infrastructure, you know, even though it has gb/s to the internet.

Since the realtime web is still very small, all of us need to poll
something, even you I presume, to be able to create a pub/sub
architecture. In that respect our companies are quite similar: you chose
to aggregate and publish your data and make a business of it. We
aggregate and refine the data and make a business out of that.

Don't get me wrong, I really like what you do, but I am not looking for
a data supplier at this time (that might change, though). If I were to
look in the data-supplier direction, you are currently on my/our top-ten
list :)


Sent from my iPhone

On Oct 21, 5:03 pm, Julien Genestoux <[email protected]>
wrote:
> Hum... http://superfeedr.com?
>
> "Putting resources in common" is definitely one of the key reasons why we
> built superfeedr. More about that here:
> http://blog.superfeedr.com/gospel/something-stupid/
>
> And yes, we have a firehose available.
>
> Julien
>
> --
> Julien Genestoux,
>
> http://twitter.com/julien51
> http://superfeedr.com
>
> +1 (415) 254 7340
> +33 (0)9 70 44 76 29
>
> On Wed, Oct 21, 2009 at 5:26 AM, Marcus Herou
> <[email protected]> wrote:
>
> > Feedtree looks cool... but last updated in 2006?
>
> > On Wed, Oct 21, 2009 at 2:20 PM, Nick Johnson (Google) <
> > [email protected]> wrote:
>
> >> On Wed, Oct 21, 2009 at 1:14 PM, Alexis Richardson <
> >> [email protected]> wrote:
>
> >>> Hmmm ... gossiptorrent?
>
> >> Feedtree.
>
> >>> On Wed, Oct 21, 2009 at 7:23 AM, Marcus Herou
> >>> <[email protected]> wrote:
>
> >>> > Hi.
>
> >>> > We host a search app which is based on feeds of blogs/twitter/forums/
> >>> > news etc. As you mention, we are polling everything like crazy,
> >>> > and it seems like a total waste of everyone's resources.
>
> >>> > So subscribing to something which would potentially deliver the
> >>> > material to us would be great, not just for us but also for all
> >>> > the sites we are crawling.
>
> >>> > However, who would want to open up a firehose for free for everyone
> >>> > to consume? It will surely consume a lot of bandwidth, and a few
> >>> > subscribers will account for most of it under this model.
> >>> > I thought of something that might solve this issue. Consider the
> >>> > following:
>
> >>> > 1)
> >>> > * Charge for the bandwidth (wordpress.com does this with a flat fee)
>
> >>> > 2)
> >>> > * Everyone who needs to consume a firehose should also run a
> >>> > hub, to show good faith and morale.
> >>> > * Add support in firehose-enabled hubs for sharing state (with a
> >>> > master?)
> >>> > * A firehose-enabled hub can subscribe to a master hub, which makes
> >>> > sure that the subscriber also fulfils some form of contract (i.e.
> >>> > actually updating/delivering feeds)
> >>> > * Each firehose-enabled hub must be public, and everyone can subscribe
> >>> > to its feeds as they can today.
> >>> > * To share load equally (the morale part), subscribers should
> >>> > subscribe to a load-balanced DNS name or some form of delegate
> >>> >  lb.pshb.com = master hub
> >>> >  Example 1: lb.pshb.com resolves to pshb.tailsweep.com and
> >>> > pshb.google.com, effectively DNS round-robin
> >>> >  Example 2: lb.pshb.com delegates to any active master-connected hub
> >>> > in some way.
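
The delegate idea in Examples 1 and 2 could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name RoundRobinDelegate and its methods are hypothetical, the hostnames are the example names from the proposal rather than real services, and real DNS round-robin would happen in the resolver, not in application code.

```python
from itertools import cycle

# Example hostnames from the proposal; they are illustrative, not real hubs.
HUB_POOL = ["pshb.tailsweep.com", "pshb.google.com"]


class RoundRobinDelegate:
    """Hypothetical stand-in for lb.pshb.com: rotates over active hubs."""

    def __init__(self, hubs):
        self._hubs = list(hubs)
        self._active = set(self._hubs)
        self._ring = cycle(self._hubs)

    def mark_inactive(self, hub):
        # A master could call this when a hub stops fulfilling its contract.
        self._active.discard(hub)

    def mark_active(self, hub):
        if hub in self._hubs:
            self._active.add(hub)

    def next_hub(self):
        """Return the next active hub in rotation."""
        if not self._active:
            raise RuntimeError("no active hubs")
        # One full lap of the ring is enough to reach an active hub.
        for _ in range(len(self._hubs)):
            hub = next(self._ring)
            if hub in self._active:
                return hub
        raise RuntimeError("no active hubs")


delegate = RoundRobinDelegate(HUB_POOL)
first_four = [delegate.next_hub() for _ in range(4)]
```

With both hubs active, successive calls alternate between the two names (Example 1); once a hub is marked inactive, it is skipped (Example 2).
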
>
> >>> > This might be too complex to implement, and bottlenecks would occur at
> >>> > the master, but systems like Hadoop have a bottleneck in the
> >>> > NameNode (master) and it seems to perform just fine, so it can be
> >>> > done. However, each firehose hub probably needs to persist each feed
> >>> > for a certain amount of time before purging it.
>
> >>> > Anyway this was just a thought. We at Tailsweep probably could help in
> >>> > making this happen if there exists some interest.
>
> >>> > Cheers
>
> >>> > //Marcus
>
> >>> > On Oct 20, 8:41 pm, Bob Wyman <[email protected]> wrote:
> >>> >> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik <[email protected]> wrote:
> >>> >> > Specifically, if we treat 'firehose' as any bundle of
> >>> >> > feeds (all, or some), then a hub could define
> >>> >> > multiple firehose streams.
>
> >>> >> There should be no question that there is tremendous utility in
> >>> >> being able to compose all sorts of "bundles" of topics into distinct
> >>> >> feeds. It is probably also the case that we can identify some number
> >>> >> of such bundles that would be useful to a large number of
> >>> >> subscribers. On the other hand, many bundles will be very specific
> >>> >> and only useful to one or a small number of subscribers. In fact, I
> >>> >> think what we'll see is that once we have the core PSHB defined,
> >>> >> we'll then see innovation in the definition of "down stream"
> >>> >> services whose function is precisely to build and deliver such
> >>> >> bundles. Some of these services will aggregate groups of topics
> >>> >> while others will focus instead on creating content-based streams --
> >>> >> they will bundle together individual entries based on the content of
> >>> >> those entries rather than simply combining all entries from some set
> >>> >> of topics.
>
> >>> >> I think we should be careful not to force too much of the burden of
> >>> >> bundling or aggregating into the core PSHB hub specification. If we
> >>> >> want to address the challenges of building bundles or aggregations,
> >>> >> I think it best to do so in secondary or companion specifications.
> >>> >> This will keep the core cleaner and easy to understand while also
> >>> >> allowing the core to be deployed without being delayed by
> >>> >> discussions over non-core issues.
>
> >>> >> Having argued against making the core more complicated by extending
> >>> >> it to include creating aggregate topics, I still suggest that it
> >>> >> would be useful to have the core system define a common means to
> >>> >> obtain a pure "firehose" feed of all topics. The current hub spec
> >>> >> works for people who only want "none or some" of the topics served
> >>> >> by the hub. I suggest that we expand this to have hubs know how to
> >>> >> provide "none, some or all" of the topics.
> >>> >> The reason for adding support of "all topics" is that we know,
> >>> >> without much question, that such an "all topics" feed will be
> >>> >> required by many of the downstream services that we will one day be
> >>> >> relying on to create more finely defined aggregations. Given that
> >>> >> this specific feed will be commonly required, it would be best if we
> >>> >> had a common mechanism for a downstream service/subscriber to
> >>> >> request that feed and that we set some expectations for how that
> >>> >> feed will be formatted and delivered (i.e. Atom entries, persistent
> >>> >> connections, chunked content model, ...). It would be very
> >>> >> cumbersome for a downstream filtering/aggregating service to need to
> >>> >> puzzle through service specific mechanisms for discovering how to
> >>> >> obtain a firehose feed of "all topics" from many different hubs.
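
To make the "none, some or all" idea concrete, here is a rough sketch of what a common request for the "all topics" feed might look like from the subscriber side. It is an illustration only: the hub.* parameter names follow the draft hub spec as best recalled, and the reserved topic value ALL_TOPICS is purely an assumption, since no such value is defined anywhere in this thread or in the spec.

```python
from urllib.parse import urlencode, parse_qs

# Assumption: a reserved topic value meaning "all topics". Nothing like this
# is defined by the hub spec; it only illustrates the "none, some or all" idea.
ALL_TOPICS = "urn:example:all-topics"


def build_subscribe_body(callback_url, topic_url=ALL_TOPICS):
    """Build the form-encoded body of a subscription request.

    With no topic given, the hypothetical ALL_TOPICS value is sent, so the
    same request shape covers both "some" and "all".
    """
    params = {
        "hub.mode": "subscribe",
        "hub.callback": callback_url,
        "hub.topic": topic_url,
        "hub.verify": "async",
    }
    return urlencode(params)


# Usage: one ordinary subscription and one firehose subscription.
some = build_subscribe_body("http://subscriber.example/callback",
                            "http://publisher.example/feed.atom")
everything = build_subscribe_body("http://subscriber.example/callback")
decoded = parse_qs(everything)
```

The point of the sketch is that a downstream service would use one request shape against every hub instead of a hub-specific mechanism.
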
>
> >>> >> bob wyman
>
> >>> >> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik <[email protected]> wrote:
>
> >>> >> > Right, so how does the smart hub aggregate the feeds? Does it then
> >>> >> > have to crawl to find the list? That wouldn't be very useful. Having
> >>> >> > said that...
>
> >>> >> > +1 For 'smart, aggregating hub generating a synthetic feed'
> >>> >> > +1 For XRD discovery of the firehose endpoint.
>
> >>> >> > Thinking a bit more about the firehose, what about making it more
> >>> >> > flexible. Specifically, if we treat 'firehose' as any bundle of
> >>> >> > feeds (all, or some), then a hub could define multiple firehose
> >>> >> > streams. For example, at PostRank we classify feeds by topic, so
> >>> >> > if someone wanted to subscribe to "Technology", we could expose
> >>> >> > that as a firehose so the user doesn't have to subscribe to every
> >>> >> > feed in that topic. In essence, a firehose stream is then any
> >>> >> > bundle of feeds.
>
> >>> >> > This may be overloading the hub spec but the overall mechanics would
> >>> >> > be:
> >>> >> >  - A (super)user can declare a firehose endpoint
> >>> >> >  - A (super)user is then able to add or remove subscriptions from
> >>> >> >    the firehose to create arbitrary aggregation streams
> >>> >> >  - A subscriber uses XRD to discover the available aggregation
> >>> >> >    streams
> >>> >> >  - Firehose with 'all' feeds is a special case of the above, where
> >>> >> >    all feeds are present
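
The mechanics listed above can be sketched as a small in-memory registry. This is only a sketch under assumptions: the FirehoseRegistry class and its method names are hypothetical, XRD discovery is reduced to listing stream names, and a real hub would persist this state.

```python
class FirehoseRegistry:
    """Hypothetical registry of firehose streams, each a bundle of feeds."""

    ALL = "all"  # special-case stream containing every known feed

    def __init__(self):
        self._feeds = set()    # every feed the hub knows about
        self._streams = {}     # stream name -> set of feed URLs

    def register_feed(self, feed_url):
        self._feeds.add(feed_url)

    def declare_stream(self, name):
        """A (super)user declares a firehose endpoint."""
        if name == self.ALL:
            raise ValueError("'all' is reserved for the full firehose")
        self._streams.setdefault(name, set())

    def add(self, name, feed_url):
        """Add a feed to a declared stream (and to the hub's known feeds)."""
        self._feeds.add(feed_url)
        self._streams[name].add(feed_url)

    def remove(self, name, feed_url):
        self._streams[name].discard(feed_url)

    def stream_names(self):
        """What a subscriber would learn through discovery."""
        return sorted(self._streams) + [self.ALL]

    def feeds_in(self, name):
        if name == self.ALL:
            return set(self._feeds)
        return set(self._streams[name])


registry = FirehoseRegistry()
registry.register_feed("http://example.org/cooking.atom")
registry.declare_stream("Technology")
registry.add("Technology", "http://example.org/tech.atom")
```

Here the "Technology" stream holds one feed while the "all" stream holds both, which is the special case described in the last bullet.
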
>
> >>> >> > This definitely adds more complexity into the hub... The alternative
> >>> >> > is of course for the publisher to create a syndicated feed and
> >>> >> > publish that directly as a standalone feed. Still trying to weigh
> >>> >> > the up/downsides in my head, but want to put it out there as an idea.
>
> >>> >> > --------
> >>> >> > Ilya Grigorik
> >>> >> > postrank.com
>
> >> --
> >> Nick Johnson, Developer Programs Engineer, App Engine
> >> Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number:
> >> 368047
>
> > --
> > Marcus Herou CTO and co-founder Tailsweep AB
> > +46702561312
> > [email protected]
> > http://www.tailsweep.com/
