On Sun, Oct 25, 2009 at 1:08 AM, igrigorik <[email protected]> wrote:
>
> This doesn't answer roster management, but an OPML export from the hub
> of all the available feeds would certainly work for me.
>
> @Nick: Does the hub really need to have a subscriber? If as a
> publisher I register my feed with the hub it could still keep it in
> the roster -- the hub just doesn't have to do anything if there are no
> subscribers, just discard the pings. Although, if there was a firehose
> delivery mechanism, then you could implicitly call that a subscriber
> to every feed.
>
It could - but the reference implementation, at least, ignores publisher
pings for feeds with no subscribers. I don't have hard numbers, but I
suspect doing this eliminates a significant chunk of redundant work.

-Nick Johnson

>
> ig
>
> On Oct 22, 9:37 am, Alexis Richardson <[email protected]>
> wrote:
> > I don't think that's crazy at all. I suppose the question would then
> > become how to make this 'standard' for roster discovery and
> > management.
> >
> > On Thu, Oct 22, 2009 at 2:22 PM, igrigorik <[email protected]> wrote:
> >
> > > Ok, so let me try rephrasing the problem. The major problem is not the
> > > Hub, or the spec, but the need to crawl thousands of sites to find the
> > > feeds in the first place. Of course, the hub already knows about them,
> > > hence this discussion.
> >
> > > Would it be crazy to then expose a mechanism to enumerate all of the
> > > feeds that a hub tracks, and let the client then subscribe to them?
> >
> > > An obvious obstacle is private feeds, but I wonder if that can be
> > > handled as a special case?
> >
> > > ig
> >
> > > On Oct 21, 1:55 pm, Marcus Herou <[email protected]> wrote:
> > >> Well... We can as well publish a firehose but currently our business
> > >> model is not aimed at that.
> > >
> > >> I was not talking about a service but rather a technology which could take
> > >> the crawling business to another level by aggregating hundreds of hubs
> > >> and creating something which effectively can deliver tb/s bandwidth by
> > >> having decentralized servers and data. We are still limited, you know,
> > >> by our infrastructure even though it has gb/s to the internet.
> > >
> > >> Since the realtime web is currently still very small, all of us need to
> > >> poll something, even you I presume, to be able to create a pub/sub arch.
> > >> In that respect our companies are quite similar: you chose to aggregate
> > >> and publish your data and make a business of it.
> > >> We aggregate and refine the data and make business out of that.
> > >
> > >> Don't get me wrong, I really like what you do but I am not looking for
> > >> a data supplier at this time (might change though). But if I would
> > >> look in the data supplier direction you are currently in my/our top
> > >> ten list :)
> > >
> > >> Sent from my iPhone
> > >
> > >> On Oct 21, 5:03 pm, Julien Genestoux <[email protected]>
> > >> wrote:
> > >
> > >> > Hum... http://superfeedr.com?
> > >
> > >> > "Putting resources in common" is definitely one of the key reasons why we
> > >> > built superfeedr. More about that here:
> > >> > http://blog.superfeedr.com/gospel/something-stupid/
> > >
> > >> > And yes, we have a firehose available.
> > >
> > >> > Julien
> > >
> > >> > --
> > >> > Julien Genestoux,
> > >
> > >> > http://twitter.com/julien51
> > >> > http://superfeedr.com
> > >
> > >> > +1 (415) 254 7340
> > >> > +33 (0)9 70 44 76 29
> > >
> > >> > On Wed, Oct 21, 2009 at 5:26 AM, Marcus Herou <[email protected]> wrote:
> > >
> > >> > > Feedtree looks cool.... but updated 2006 ?
> > >
> > >> > > On Wed, Oct 21, 2009 at 2:20 PM, Nick Johnson (Google) <[email protected]> wrote:
> > >
> > >> > >> On Wed, Oct 21, 2009 at 1:14 PM, Alexis Richardson <[email protected]> wrote:
> > >
> > >> > >>> Hmmm ... gossiptorrent?
> > >
> > >> > >> Feedtree.
> > >
> > >> > >>> On Wed, Oct 21, 2009 at 7:23 AM, Marcus Herou
> > >> > >>> <[email protected]> wrote:
> > >
> > >> > >>> > Hi.
> > >
> > >> > >>> > We host a search app which is based on feeds of blogs/twitter/forums/
> > >> > >>> > news etc. We are, as you are mentioning, polling everything like crazy
> > >> > >>> > and it seems like a total waste of everyone's resources.
> > >
> > >> > >>> > So this means that subscribing to something which would potentially
> > >> > >>> > deliver the material to us would be great not just for us but as well
> > >> > >>> > all sites we are crawling.
> > >
> > >> > >>> > However who would like to open up a firehose for free for everyone to
> > >> > >>> > consume? It will for sure consume a lot of bandwidth and a few
> > >> > >>> > subscribers will consume most of the bandwidth with this model.
> > >> > >>> > I thought of something that might solve this issue. Consider the
> > >> > >>> > following:
> > >
> > >> > >>> > 1)
> > >> > >>> > * Charge for the bandwidth (wordpress.com does this with flat fee)
> > >
> > >> > >>> > 2)
> > >> > >>> > * Everyone that has firehose consuming needs should as well start a
> > >> > >>> >   hub to show good faith and morale.
> > >> > >>> > * Add support in firehose enabled hubs to share state (with a
> > >> > >>> >   master ?)
> > >> > >>> > * A firehose enabled hub can subscribe to a master hub which makes
> > >> > >>> >   sure that the subscriber as well fulfils some form of contract (i.e.
> > >> > >>> >   actually updating/delivering feeds)
> > >> > >>> > * Each firehose enabled hub must be public and everyone can subscribe
> > >> > >>> >   to the feeds like as of current.
> > >> > >>> > * To share load equally (morale part) then subscribers should
> > >> > >>> >   subscribe to a loadbalanced dns name or some form of delegate
> > >> > >>> >   lb.pshb.com = master hub
> > >> > >>> >   Example 1: lb.pshb.com resolves to pshb.tailsweep.com,
> > >> > >>> >   pshb.google.com, effectively DNS round-robin
> > >> > >>> >   Example 2: lb.pshb.com delegates to any active master connected hub
> > >> > >>> >   in some way.
> > >
> > >> > >>> > This might be too complex to implement and bottlenecks occur at the
> > >> > >>> > master but systems like Hadoop have bottlenecks in terms of the
> > >> > >>> > NameNode (master) and it seems to perform just perfect so it can be
> > >> > >>> > done. However each firehose hub probably needs to persist each feed for
> > >> > >>> > a certain amount of time before purging it.
> > >
> > >> > >>> > Anyway this was just a thought.
> > >> > >>> > We at Tailsweep probably could help in
> > >> > >>> > making this happen if there exists some interest.
> > >
> > >> > >>> > Cheers
> > >
> > >> > >>> > //Marcus
> > >
> > >> > >>> > On Oct 20, 8:41 pm, Bob Wyman <[email protected]> wrote:
> > >> > >>> >> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik <[email protected]> wrote:
> > >> > >>> >> > Specifically, if we treat 'firehose' as any bundle of
> > >> > >>> >> > feeds (all, or some), then a hub could define
> > >> > >>> >> > multiple firehose streams.
> > >
> > >> > >>> >> There should be no question that there is tremendous utility in being
> > >> > >>> >> able to compose all sorts of "bundles" of topics into distinct feeds. It is
> > >> > >>> >> probably also the case that we can identify some number of such bundles
> > >> > >>> >> that would be useful to a large number of subscribers. On the other hand,
> > >> > >>> >> many bundles will be very specific and only useful to one or a small
> > >> > >>> >> number of subscribers. In fact, I think what we'll see is that once we
> > >> > >>> >> have the core PSHB defined, we'll then see innovation in the definition
> > >> > >>> >> of "down stream" services whose function is precisely to build and
> > >> > >>> >> deliver such bundles. Some of these services will aggregate groups of
> > >> > >>> >> topics while others will focus instead on creating content-based
> > >> > >>> >> streams -- they will bundle together individual entries based on the
> > >> > >>> >> content of those entries rather than simply combining all entries from
> > >> > >>> >> some set of topics.
> > >
> > >> > >>> >> I think we should be careful not to force too much of the burden of
> > >> > >>> >> bundling or aggregating into the core PSHB hub specification.
> > >> > >>> >> If we want to address
> > >> > >>> >> the challenges of building bundles or aggregations, I think it best to
> > >> > >>> >> do so in secondary or companion specifications. This will keep the core
> > >> > >>> >> cleaner and easy to understand while also allowing the core to be
> > >> > >>> >> deployed without being delayed by discussions over non-core issues.
> > >
> > >> > >>> >> Having argued against making the core more complicated by extending it
> > >> > >>> >> to include creating aggregate topics, I still suggest that it would be
> > >> > >>> >> useful to have the core system define a common means to obtain a pure
> > >> > >>> >> "firehose" feed of all topics. The current hub spec works for people who
> > >> > >>> >> only want "none or some" of the topics served by the hub. I suggest that
> > >> > >>> >> we expand this to have hubs know how to provide "none, some or all" of
> > >> > >>> >> the topics.
> > >> > >>> >> The reason for adding support of "all topics" is that we know, without
> > >> > >>> >> much question, that such an "all topics" feed will be required by many
> > >> > >>> >> of the downstream services that we will one day be relying on to create
> > >> > >>> >> more finely defined aggregations. Given that this specific feed will be
> > >> > >>> >> commonly required, it would be best if we had a common mechanism for a
> > >> > >>> >> downstream service/subscriber to request that feed and that we set some
> > >> > >>> >> expectations for how that feed will be formatted and delivered (i.e. Atom
> > >> > >>> >> entries, persistent connections, chunked content model, ...).
> > >> > >>> >> It would be very
> > >> > >>> >> cumbersome for a downstream filtering/aggregating service to need to
> > >> > >>> >> puzzle through service specific mechanisms for discovering how to obtain
> > >> > >>> >> a firehose feed of "all topics" from many different hubs.
> > >
> > >> > >>> >> bob wyman
> > >
> > >> > >>> >> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik <[email protected]> wrote:
> > >
> > >> > >>> >> > Right, so how does the smart hub aggregate the feeds? Does it then
> > >> > >>> >> > have to crawl to find the list? That wouldn't be very useful. Having
> > >> > >>> >> > said that...
> > >
> > >> > >>> >> > +1 For 'smart, aggregating hub generating a synthetic feed'
> > >> > >>> >> > +1 For XRD discovery of the firehose endpoint.
> > >
> > >> > >>> >> > Thinking a bit more about the firehose, what about making it more
> > >> > >>> >> > flexible. Specifically, if we treat 'firehose' as any bundle of feeds
> > >> > >>> >> > (all, or some), then a hub could define multiple firehose streams. For
> > >> > >>> >> > example, at PostRank we classify feeds by topic, so if someone wanted
> > >> > >>> >> > to subscribe to "Technology", we could expose that as a firehose so
> > >> > >>> >> > the user doesn't have to subscribe to every feed in that topic. In
> > >> > >>> >> > essence, a firehose stream is then any bundle of feeds.
> > >
> > >> > >>> >> > This may be overloading the hub spec but the overall mechanics would
> > >> > >>> >> > be:
> > >> > >>> >> > - A (super)user can declare a
> >
> > ...
>
--
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047
