Hi.

We host a search app that is built on feeds from blogs, Twitter,
forums, news sites and so on. As you mention, we are polling everything
like crazy, and it seems like a total waste of everyone's resources.

So subscribing to something that could deliver the material to us
would be great, not just for us but also for all the sites we are
crawling.

However, who would want to open up a firehose for free for everyone to
consume? It will certainly use a lot of bandwidth, and with this model
a few subscribers will account for most of it.
I thought of something that might solve this issue. Consider the
following:

1)
* Charge for the bandwidth (wordpress.com does this with a flat fee)

2)
* Everyone who needs to consume a firehose should also run a hub, to
show good faith and good morals.
* Add support in firehose-enabled hubs for sharing state (with a
master?)
* A firehose-enabled hub can subscribe to a master hub, which makes
sure that the subscriber also fulfils some form of contract (i.e.
actually updating/delivering feeds)
* Each firehose-enabled hub must be public, and everyone can subscribe
to its feeds as they can today.
* To share the load equally (the moral part), subscribers should
subscribe to a load-balanced DNS name or some form of delegate
  lb.pshb.com = master hub
  Example 1: lb.pshb.com resolves to pshb.tailsweep.com and
pshb.google.com, effectively DNS round-robin
  Example 2: lb.pshb.com delegates to any active master-connected hub
in some way.
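
To make Example 1 a bit more concrete, here is a minimal sketch in
Python of what the rotation could look like. It does no real DNS
lookups; it only simulates the round-robin behaviour in memory. The
class name RoundRobinDelegate and its next_hub() method are made up for
illustration and are not part of any hub spec.

```python
from itertools import cycle


class RoundRobinDelegate:
    """Hypothetical stand-in for lb.pshb.com: hands out hubs in rotation."""

    def __init__(self, hubs):
        if not hubs:
            raise ValueError("at least one hub is required")
        self._hubs = list(hubs)
        # cycle() repeats the hub list forever, in order.
        self._rotation = cycle(self._hubs)

    def next_hub(self):
        """Return the hub the next subscriber should be sent to."""
        return next(self._rotation)


# The two hub names from Example 1.
lb = RoundRobinDelegate(["pshb.tailsweep.com", "pshb.google.com"])

# Four subscribers in a row are spread evenly over the two hubs.
assignments = [lb.next_hub() for _ in range(4)]
```

Example 2 would probably replace the fixed list with whatever set of
hubs the master currently considers active.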

This might be too complex to implement, and the master becomes a
bottleneck. But systems like Hadoop have the same kind of bottleneck in
the NameNode (master) and seem to perform just fine, so it can be
done. However, each firehose hub probably needs to persist each feed
for a certain amount of time before purging it.
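
The persist-then-purge part could be as simple as the sketch below. It
is only an in-memory illustration under my own assumptions: the
RetentionStore name, the ttl_seconds parameter and the explicit "now"
argument (there to make the behaviour easy to test) are all
hypothetical, and a real hub would presumably use durable storage.

```python
import time


class RetentionStore:
    """Hypothetical store that keeps items for ttl_seconds, then purges."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # entry_id -> (stored_at, payload)

    def put(self, entry_id, payload, now=None):
        """Persist a payload, remembering when it was stored."""
        stored_at = time.time() if now is None else now
        self._entries[entry_id] = (stored_at, payload)

    def purge(self, now=None):
        """Drop everything older than the TTL; return how many were dropped."""
        now = time.time() if now is None else now
        expired = [
            entry_id
            for entry_id, (stored_at, _) in self._entries.items()
            if now - stored_at >= self.ttl
        ]
        for entry_id in expired:
            del self._entries[entry_id]
        return len(expired)

    def __len__(self):
        return len(self._entries)


# Keep items for 60 seconds; timestamps are passed in explicitly here.
store = RetentionStore(ttl_seconds=60)
store.put("a", "first update", now=0)
store.put("b", "second update", now=50)
purged = store.purge(now=70)  # "a" is 70s old and goes, "b" is 20s old and stays
```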

Anyway, this was just a thought. We at Tailsweep could probably help
make this happen if there is some interest.

Cheers

//Marcus





On Oct 20, 8:41 pm, Bob Wyman <[email protected]> wrote:
> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik <[email protected]> wrote:
> > Specifically, if we treat 'firehose' as any bundle of
> > feeds (all, or some), then a hub could define
> > multiple firehose streams.
>
> There should be no question that there is tremendous utility in being able
> to compose all sorts of "bundles" of topics into distinct feeds. It is
> probably also the case that we can identify some number of such bundles that
> would be useful to a large number of subscribers. On the other hand, many
> bundles will be very specific and only useful to one or a small number of
> subscribers. In fact, I think what we'll see is that once we have the core
> PSHB defined, we'll then see innovation in the definition of "down stream"
> services whose function is precisely to build and deliver such bundles. Some
> of these services will aggregate groups of topics while others will focus
> instead on creating content-based streams -- they will bundle together
> individual entries based on the content of those entries rather than simply
> combining all entries from some set of topics.
>
> I think we should be careful not to force too much of the burden of bundling
> or aggregating into the core PSHB hub specification. If we want to address
> the challenges of building bundles or aggregations, I think it best to do so
> in secondary or companion specifications. This will keep the core cleaner
> and easy to understand while also allowing the core to be deployed without
> being delayed by discussions over non-core issues.
>
> Having argued against making the core more complicated by extending it to
> include creating aggregate topics, I still suggest that it would be useful
> to have the core system define a common means to obtain a pure "firehose"
> feed of all topics. The current hub spec works for people who only want
> "none or some" of the topics served by the hub. I suggest that we expand
> this to have hubs know how to provide "none, some or all" of the topics.
> The reason for adding support of "all topics" is that we know, without much
> question, that such an "all topics" feed will be required by many of the
> downstream services that we will one day be relying on to create more finely
> defined aggregations. Given that this specific feed will be commonly
> required, it would be best if we had a common mechanism for a downstream
> service/subscriber to request that feed and that we set some expectations
> for how that feed will be formatted and delivered (i.e. Atom entries,
> persistent connections, chunked content model, ...). It would be very
> cumbersome for a downstream filtering/aggregating service to need to puzzle
> through service specific mechanisms for discovering how to obtain a firehose
> feed of "all topics" from many different hubs.
>
> bob wyman
>
> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik <[email protected]> wrote:
>
> > Right, so how does the smart hub aggregate the feeds? Does it then
> > have to crawl to find the list? That wouldn't be very useful. Having
> > said that...
>
> > +1 For 'smart, aggregating hub generating a synthetic feed'
> > +1 For XRD discovery of the firehose endpoint.
>
> > Thinking a bit more about the firehose, what about making it more
> > flexible. Specifically, if we treat 'firehose' as any bundle of feeds
> > (all, or some), then a hub could define multiple firehose streams. For
> > example, at PostRank we classify feeds by topic, so if someone wanted
> > to subscribe to "Technology", we could expose that as a firehose so
> > the user doesn't have to subscribe to every feed in that topic. In
> > essence, a firehose stream is then any bundle of feeds.
>
> > This may be overloading the hub spec but the overall mechanics would
> > be:
> >  - A (super)user can declare a firehose endpoint
> >  - A (super)user is then able to add or remove subscriptions from the
> > firehose to create arbitrary aggregation streams
> >  - A subscriber uses XRD to discover the available aggregation streams
> >  - Firehose with 'all' feeds is a special case of the above, where all
> > feeds are present
>
> > This definitely adds more complexity into the hub... The alternative
> > is of course for the publisher to create a syndicated feed and publish
> > that directly as a standalone feed. Still trying to weigh the
> > upsides/downsides in my head, but want to put it out there as an idea.
>
> > --------
> > Ilya Grigorik
> > postrank.com
