Hmmm ... gossiptorrent?

On Wed, Oct 21, 2009 at 7:23 AM, Marcus Herou
<[email protected]> wrote:
>
> Hi.
>
> We host a search app based on feeds from blogs/Twitter/forums/news
> etc. As you mention, we are polling everything like crazy, and it
> seems like a total waste of everyone's resources.
>
> So subscribing to something that would deliver the material to us
> would be great, not just for us but also for all the sites we are
> crawling.
>
> However, who would want to open up a firehose for free for everyone
> to consume? It will certainly use a lot of bandwidth, and with this
> model a few subscribers will consume most of it.
> I thought of something that might solve this issue. Consider the
> following:
>
> 1)
> * Charge for the bandwidth (wordpress.com does this with a flat fee)
>
> 2)
> * Everyone who needs to consume a firehose should also run a hub, to
> show good faith and good morals.
> * Add support in firehose-enabled hubs for sharing state (with a
> master?).
> * A firehose-enabled hub can subscribe to a master hub, which makes
> sure that the subscriber also fulfils some form of contract (i.e.
> actually updating/delivering feeds).
> * Each firehose-enabled hub must be public, and anyone can subscribe
> to its feeds, as is the case today.
> * To share load equally (the moral part), subscribers should
> subscribe to a load-balanced DNS name or some form of delegate:
>  lb.pshb.com = master hub
>  Example 1: lb.pshb.com resolves to pshb.tailsweep.com and
> pshb.google.com, effectively DNS round-robin.
>  Example 2: lb.pshb.com delegates in some way to any active hub
> connected to the master.
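A minimal sketch of the master/delegate idea (Example 2) in Python, assuming the master records when each registered firehose hub was last seen delivering content and hands out the currently active hubs round-robin. The class and method names, the heartbeat-style contract check and the 300-second window are hypothetical; only the two hostnames come from the message above. The round-robin here happens in application code, not in DNS as in Example 1.

```python
import itertools
import time


class MasterHub:
    """Hypothetical master hub: tracks firehose-enabled hubs and hands them
    out round-robin, skipping any hub that has stopped delivering."""

    def __init__(self, max_silence_seconds=300.0):
        self.max_silence = max_silence_seconds
        self.last_delivery = {}  # hub hostname -> time of last observed delivery
        self._counter = itertools.count()

    def register(self, hub, now=None):
        # Registering counts as a delivery so a new hub starts out active.
        self.last_delivery[hub] = time.time() if now is None else now

    def report_delivery(self, hub, now=None):
        # Assumed contract check: the master records when it sees the hub deliver.
        if hub in self.last_delivery:
            self.last_delivery[hub] = time.time() if now is None else now

    def active_hubs(self, now=None):
        now = time.time() if now is None else now
        return sorted(hub for hub, last in self.last_delivery.items()
                      if now - last <= self.max_silence)

    def pick_hub(self, now=None):
        # Round-robin over the hubs currently honouring the contract.
        hubs = self.active_hubs(now)
        if not hubs:
            raise LookupError("no active firehose hubs")
        return hubs[next(self._counter) % len(hubs)]
```

With both hubs registered, successive `pick_hub()` calls alternate between pshb.google.com and pshb.tailsweep.com; a hub that stops delivering drops out of the rotation once its last observed delivery is older than the window.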
>
> This might be too complex to implement, and the master becomes a
> bottleneck, but systems like Hadoop have the same kind of bottleneck
> in the NameNode (master) and still perform just fine, so it can be
> done. However, each firehose hub would probably need to persist each
> feed for a certain amount of time before purging it.
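One way that per-hub retention could look, sketched in Python under the assumption that entries arrive in time order and a single fixed TTL applies to every feed. The class name, method names and the TTL parameter are hypothetical.

```python
import time
from collections import deque


class RetentionBuffer:
    """Hypothetical buffer: keeps firehose entries for `ttl_seconds` so late or
    reconnecting subscribers can catch up, then purges them."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = deque()  # (timestamp, entry), oldest on the left

    def append(self, entry, now=None):
        self._entries.append((time.time() if now is None else now, entry))

    def purge(self, now=None):
        now = time.time() if now is None else now
        # Entries arrive in time order, so expired ones are always at the front.
        while self._entries and now - self._entries[0][0] > self.ttl:
            self._entries.popleft()

    def since(self, timestamp, now=None):
        """Unexpired entries stored after `timestamp`, e.g. for a reconnecting subscriber."""
        self.purge(now)
        return [entry for t, entry in self._entries if t > timestamp]
```

Because purging only ever pops from the front of the deque, each entry is removed at most once, which keeps the cost proportional to the number of expired entries.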
>
> Anyway, this was just a thought. We at Tailsweep could probably help
> make this happen if there is some interest.
>
> Cheers
>
> //Marcus
>
> On Oct 20, 8:41 pm, Bob Wyman <[email protected]> wrote:
>> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik <[email protected]> wrote:
>> > Specifically, if we treat 'firehose' as any bundle of
>> > feeds (all, or some), then a hub could define
>> > multiple firehose streams.
>>
>> There should be no question that there is tremendous utility in being able
>> to compose all sorts of "bundles" of topics into distinct feeds. It is
>> probably also the case that we can identify some number of such bundles that
>> would be useful to a large number of subscribers. On the other hand, many
>> bundles will be very specific and only useful to one or a small number of
>> subscribers. In fact, I think what we'll see is that once we have the core
>> PSHB defined, we'll then see innovation in the definition of "down stream"
>> services whose function is precisely to build and deliver such bundles. Some
>> of these services will aggregate groups of topics while others will focus
>> instead on creating content-based streams -- they will bundle together
>> individual entries based on the content of those entries rather than simply
>> combining all entries from some set of topics.
>>
>> I think we should be careful not to force too much of the burden of bundling
>> or aggregating into the core PSHB hub specification. If we want to address
>> the challenges of building bundles or aggregations, I think it best to do so
>> in secondary or companion specifications. This will keep the core cleaner
>> and easier to understand while also allowing the core to be deployed without
>> being delayed by discussions over non-core issues.
>>
>> Having argued against making the core more complicated by extending it to
>> include creating aggregate topics, I still suggest that it would be useful
>> to have the core system define a common means to obtain a pure "firehose"
>> feed of all topics. The current hub spec works for people who only want
>> "none or some" of the topics served by the hub. I suggest that we expand
>> this to have hubs know how to provide "none, some or all" of the topics.
>> The reason for adding support for "all topics" is that we know, without much
>> question, that such an "all topics" feed will be required by many of the
>> downstream services that we will one day be relying on to create more finely
>> defined aggregations. Given that this specific feed will be commonly
>> required, it would be best if we had a common mechanism for a downstream
>> service/subscriber to request that feed and that we set some expectations
>> for how that feed will be formatted and delivered (i.e. Atom entries,
>> persistent connections, chunked content model, ...). It would be very
>> cumbersome for a downstream filtering/aggregating service to need to puzzle
>> through service specific mechanisms for discovering how to obtain a firehose
>> feed of "all topics" from many different hubs.
>>
>> bob wyman
>>
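A rough Python sketch of a hub that serves "none, some or all" topics: each subscriber callback holds a set of topic URLs, and a hypothetical `"*"` marker stands in for "all topics". The marker, class and method names are assumptions for illustration only; the hub spec does not define such a value, and actual delivery (Atom entries over persistent or chunked connections) is left out.

```python
from dataclasses import dataclass, field

ALL_TOPICS = "*"  # hypothetical marker meaning "every topic this hub serves"


@dataclass
class Hub:
    # callback URL -> set of topic URLs (or {ALL_TOPICS} for the firehose)
    subscriptions: dict = field(default_factory=dict)

    def subscribe(self, callback, topic):
        self.subscriptions.setdefault(callback, set()).add(topic)

    def unsubscribe(self, callback, topic):
        topics = self.subscriptions.get(callback)
        if topics is not None:
            topics.discard(topic)
            if not topics:  # "none": drop callbacks with no topics left
                del self.subscriptions[callback]

    def recipients(self, topic):
        """Callbacks that should receive a new entry published on `topic`."""
        return sorted(cb for cb, topics in self.subscriptions.items()
                      if topic in topics or ALL_TOPICS in topics)
```

A firehose subscriber then needs only one subscription, and the hub's fan-out treats it like any other callback.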
>> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik <[email protected]> wrote:
>>
>> > Right, so how does the smart hub aggregate the feeds? Does it then
>> > have to crawl to find the list? That wouldn't be very useful. Having
>> > said that...
>>
>> > +1 For 'smart, aggregating hub generating a synthetic feed'
>> > +1 For XRD discovery of the firehose endpoint.
>>
>> > Thinking a bit more about the firehose, what about making it more
>> > flexible? Specifically, if we treat 'firehose' as any bundle of feeds
>> > (all, or some), then a hub could define multiple firehose streams. For
>> > example, at PostRank we classify feeds by topic, so if someone wanted
>> > to subscribe to "Technology", we could expose that as a firehose so
>> > the user doesn't have to subscribe to every feed in that topic. In
>> > essence, a firehose stream is then any bundle of feeds.
>>
>> > This may be overloading the hub spec but the overall mechanics would
>> > be:
>> >  - A (super)user can declare a firehose endpoint
>> >  - A (super)user is then able to add or remove subscriptions from the
>> > firehose to create arbitrary aggregation streams
>> >  - A subscriber uses XRD to discover the available aggregation streams
>> >  - Firehose with 'all' feeds is a special case of the above, where all
>> > feeds are present
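The mechanics in the list above could be modelled roughly as follows in Python. This is a hypothetical in-memory registry: `discover()` only stands in for what an XRD document might advertise (no XRD is generated), and the stream name "all" is treated as the reserved special case that contains every feed the hub serves.

```python
class FirehoseRegistry:
    """Hypothetical registry of named firehose streams (bundles of feeds).
    The reserved stream "all" always contains every feed the hub serves."""

    def __init__(self):
        self.known_feeds = set()  # every feed the hub serves
        self.streams = {}         # stream name -> set of feed URLs

    def register_feed(self, feed):
        self.known_feeds.add(feed)

    def declare(self, name):
        # A (super)user declares a firehose endpoint.
        if name == "all":
            raise ValueError('"all" is reserved for the full firehose')
        self.streams.setdefault(name, set())

    def add_feed(self, name, feed):
        # A (super)user adds a subscription to a declared stream.
        self.register_feed(feed)
        self.streams[name].add(feed)

    def remove_feed(self, name, feed):
        self.streams[name].discard(feed)

    def discover(self):
        # Stand-in for the stream list an XRD document might advertise.
        return sorted(self.streams) + ["all"]

    def feeds_in(self, name):
        if name == "all":
            return set(self.known_feeds)
        return set(self.streams[name])

    def streams_for(self, feed):
        # Streams that a new entry on `feed` must be pushed to.
        names = sorted(n for n, feeds in self.streams.items() if feed in feeds)
        return names + ["all"] if feed in self.known_feeds else names
```

Computing "all" from `known_feeds` rather than storing it as a bundle keeps the full firehose a special case of the same mechanism, as the list suggests.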
>>
>> > This definitely adds more complexity to the hub... The alternative
>> > is of course for the publisher to create a syndicated feed and publish
>> > that directly as a standalone feed. Still trying to weigh the
>> > upsides/downsides in my head, but want to put it out there as an idea.
>>
>> > --------
>> > Ilya Grigorik
>> > postrank.com
>
