Ha! Thanks Jeff for pointing to that! :D I think subscribers should always select their favorite endpoint, and hubs should make sure they get the content from the publisher's hub :)
Ju
--
Julien Genestoux
http://twitter.com/julien51
http://superfeedr.com
+1 (415) 254 7340
+33 (0)9 70 44 76 29
Sent from San Francisco, CA, United States

On Thu, Oct 22, 2009 at 11:29 AM, Jeff Lindsay <[email protected]> wrote:
> All these ideas for additions to the Hub are really starting to suggest
> multiple endpoints. I still don't buy why the publisher endpoint is the
> same as the subscriber endpoint.
>
> On Thu, Oct 22, 2009 at 11:22 AM, Jeremy Hylton <[email protected]> wrote:
>> On Thu, Oct 22, 2009 at 9:22 AM, igrigorik <[email protected]> wrote:
>>> Ok, so let me try rephrasing the problem. The major problem is not
>>> the Hub, or the spec, but the need to crawl thousands of sites to
>>> find the feeds in the first place. Of course, the hub already knows
>>> about them, hence this discussion.
>>>
>>> Would it be crazy to then expose a mechanism to enumerate all of the
>>> feeds that a hub tracks, and let the client then subscribe to them?
>>
>> I agree it would be helpful if we had a standard way for hubs to
>> support clients that want either a firehose or the ability to
>> discover all the feeds available for subscription. Are there existing
>> solutions to the problem that are similar in complexity (or
>> simplicity) to PSHB?
>>
>> Jeremy
>>
>>> An obvious obstacle is private feeds, but I wonder if that can be
>>> handled as a special case?
>>>
>>> ig
>>>
>>> On Oct 21, 1:55 pm, Marcus Herou <[email protected]> wrote:
>>>> Well... We could publish a firehose as well, but currently our
>>>> business model is not aimed at that.
>>>>
>>>> I was not talking about a service, rather a technology which could
>>>> take the crawling business to another level: aggregating hundreds
>>>> of hubs and creating something which can effectively deliver Tb/s
>>>> of bandwidth by having decentralized servers and data. We are still
>>>> limited by our infrastructure, you know, even though it has Gb/s to
>>>> the internet.
>>>>
>>>> Since the realtime web is currently still very small, all of us
>>>> need to poll something -- even you, I presume -- to be able to
>>>> create a pub/sub architecture. In that regard our companies are
>>>> quite similar: you chose to aggregate and publish your data and
>>>> make a business of it; we aggregate and refine the data and make a
>>>> business out of that.
>>>>
>>>> Don't take me wrong, I really like what you do, but I am not
>>>> looking for a data supplier at this time (might change though). But
>>>> if I were to look in the data supplier direction, you are currently
>>>> in my/our top ten list :)
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Oct 21, 5:03 pm, Julien Genestoux <[email protected]> wrote:
>>>>> Hum... http://superfeedr.com?
>>>>>
>>>>> "Putting resources in common" is definitely one of the key reasons
>>>>> why we built Superfeedr. More about that here:
>>>>> http://blog.superfeedr.com/gospel/something-stupid/
>>>>>
>>>>> And yes, we have a firehose available.
>>>>>
>>>>> Julien
>>>>>
>>>>> On Wed, Oct 21, 2009 at 5:26 AM, Marcus Herou <[email protected]> wrote:
>>>>>> Feedtree looks cool... but updated 2006?
>>>>>>
>>>>>> On Wed, Oct 21, 2009 at 2:20 PM, Nick Johnson (Google) <[email protected]> wrote:
>>>>>>> On Wed, Oct 21, 2009 at 1:14 PM, Alexis Richardson <[email protected]> wrote:
>>>>>>>> Hmmm ... gossiptorrent?
>>>>>>>
>>>>>>> Feedtree.
>>>>>>>
>>>>>>>> On Wed, Oct 21, 2009 at 7:23 AM, Marcus Herou <[email protected]> wrote:
>>>>>>>>> Hi.
>>>>>>>>>
>>>>>>>>> We host a search app which is based on feeds of
>>>>>>>>> blogs/twitter/forums/news etc. As you mention, we are polling
>>>>>>>>> everything like crazy, and it seems like a total waste of
>>>>>>>>> everyone's resources.
>>>>>>>>>
>>>>>>>>> So subscribing to something which would deliver the material
>>>>>>>>> to us would be great, not just for us but for all the sites we
>>>>>>>>> are crawling as well.
>>>>>>>>>
>>>>>>>>> However, who would want to open up a firehose for free for
>>>>>>>>> everyone to consume? It will certainly consume a lot of
>>>>>>>>> bandwidth, and a few subscribers will consume most of it with
>>>>>>>>> this model. I thought of something that might solve this
>>>>>>>>> issue. Consider the following:
>>>>>>>>>
>>>>>>>>> 1)
>>>>>>>>> * Charge for the bandwidth (wordpress.com does this with a
>>>>>>>>>   flat fee)
>>>>>>>>>
>>>>>>>>> 2)
>>>>>>>>> * Everyone with firehose-consuming needs should also start a
>>>>>>>>>   hub, to show good faith.
>>>>>>>>> * Add support in firehose-enabled hubs to share state (with a
>>>>>>>>>   master?)
>>>>>>>>> * A firehose-enabled hub can subscribe to a master hub, which
>>>>>>>>>   makes sure that the subscriber also fulfils some form of
>>>>>>>>>   contract (i.e. actually updating/delivering feeds).
>>>>>>>>> * Each firehose-enabled hub must be public, and everyone can
>>>>>>>>>   subscribe to the feeds as they do today.
>>>>>>>>> * To share load equally, subscribers should subscribe to a
>>>>>>>>>   load-balanced DNS name or some form of delegate:
>>>>>>>>>   lb.pshb.com = master hub
>>>>>>>>>   Example 1: lb.pshb.com resolves to pshb.tailsweep.com and
>>>>>>>>>   pshb.google.com, effectively DNS round-robin.
>>>>>>>>>   Example 2: lb.pshb.com delegates to any active
>>>>>>>>>   master-connected hub in some way.
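Example 1 above amounts to rotating subscribers across a set of hub hosts. A minimal client-side sketch in Python, reusing the hypothetical hostnames from the proposal (since no lb.pshb.com DNS name exists, the rotation happens in the client instead of in DNS):

```python
from itertools import cycle

class HubBalancer:
    """Round-robin over a set of firehose-enabled hubs.

    Stands in for what a DNS name like 'lb.pshb.com' resolving to
    several hub hosts (Example 1 in the proposal) would give you.
    """

    def __init__(self, hub_hosts):
        if not hub_hosts:
            raise ValueError("need at least one hub host")
        self._hosts = cycle(hub_hosts)  # endless rotation over hosts

    def next_hub(self):
        """Return the next hub endpoint, spreading load across hubs."""
        return "http://%s/" % next(self._hosts)

# The hostnames below are the hypothetical ones from the proposal.
balancer = HubBalancer(["pshb.tailsweep.com", "pshb.google.com"])
first = balancer.next_hub()   # -> http://pshb.tailsweep.com/
second = balancer.next_hub()  # -> http://pshb.google.com/
third = balancer.next_hub()   # wraps back to pshb.tailsweep.com
```

Example 2 (delegation by a master hub) would replace the static host list with one fetched from the master.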
>>>>>>>>> This might be too complex to implement, and bottlenecks occur
>>>>>>>>> at the master, but systems like Hadoop have a bottleneck in
>>>>>>>>> the NameNode (master) and seem to perform just fine, so it can
>>>>>>>>> be done. However, each firehose hub probably needs to persist
>>>>>>>>> each feed for a certain amount of time before purging it.
>>>>>>>>>
>>>>>>>>> Anyway, this was just a thought. We at Tailsweep could
>>>>>>>>> probably help make this happen if there is some interest.
>>>>>>>>>
>>>>>>>>> Cheers
>>>>>>>>>
>>>>>>>>> //Marcus
>>>>>>>>>
>>>>>>>>> On Oct 20, 8:41 pm, Bob Wyman <[email protected]> wrote:
>>>>>>>>>> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik <[email protected]> wrote:
>>>>>>>>>>> Specifically, if we treat 'firehose' as any bundle of feeds
>>>>>>>>>>> (all, or some), then a hub could define multiple firehose
>>>>>>>>>>> streams.
>>>>>>>>>>
>>>>>>>>>> There should be no question that there is tremendous utility
>>>>>>>>>> in being able to compose all sorts of "bundles" of topics
>>>>>>>>>> into distinct feeds. It is probably also the case that we can
>>>>>>>>>> identify some number of such bundles that would be useful to
>>>>>>>>>> a large number of subscribers. On the other hand, many
>>>>>>>>>> bundles will be very specific and only useful to one or a
>>>>>>>>>> small number of subscribers. In fact, I think what we'll see
>>>>>>>>>> is that once we have the core PSHB defined, we'll then see
>>>>>>>>>> innovation in the definition of "downstream" services whose
>>>>>>>>>> function is precisely to build and deliver such bundles. Some
>>>>>>>>>> of these services will aggregate groups of topics while
>>>>>>>>>> others will focus instead on creating content-based streams
>>>>>>>>>> -- they will bundle together individual entries based on the
>>>>>>>>>> content of those entries rather than simply combining all
>>>>>>>>>> entries from some set of topics.
>>>>>>>>>>
>>>>>>>>>> I think we should be careful not to force too much of the
>>>>>>>>>> burden of bundling or aggregating into the core PSHB hub
>>>>>>>>>> specification. If we want to address the challenges of
>>>>>>>>>> building bundles or aggregations, I think it best to do so in
>>>>>>>>>> secondary or companion specifications. This will keep the
>>>>>>>>>> core cleaner and easier to understand while also allowing the
>>>>>>>>>> core to be deployed without being delayed by discussions over
>>>>>>>>>> non-core issues.
>>>>>>>>>>
>>>>>>>>>> Having argued against making the core more complicated by
>>>>>>>>>> extending it to include creating aggregate topics, I still
>>>>>>>>>> suggest that it would be useful to have the core system
>>>>>>>>>> define a common means to obtain a pure "firehose" feed of all
>>>>>>>>>> topics. The current hub spec works for people who only want
>>>>>>>>>> "none or some" of the topics served by the hub. I suggest
>>>>>>>>>> that we expand this to have hubs know how to provide "none,
>>>>>>>>>> some or all" of the topics. The reason for adding support of
>>>>>>>>>> "all topics" is that we know, without much question, that
>>>>>>>>>> such an "all topics" feed will be required by many of the
>>>>>>>>>> downstream services that we will one day be relying on to
>>>>>>>>>> create more finely defined aggregations. Given that this
>>>>>>>>>> specific feed will be commonly required, it would be best if
>>>>>>>>>> we had a common mechanism for a downstream service/subscriber
>>>>>>>>>> to request that feed, and if we set some expectations for how
>>>>>>>>>> that feed will be formatted and delivered (i.e. Atom entries,
>>>>>>>>>> persistent connections, chunked content model, ...). It would
>>>>>>>>>> be very cumbersome for a downstream filtering/aggregating
>>>>>>>>>> service to need to puzzle through service-specific mechanisms
>>>>>>>>>> for discovering how to obtain a firehose feed of "all topics"
>>>>>>>>>> from many different hubs.
>>>>>>>>>>
>>>>>>>>>> bob wyman
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik <[email protected]> wrote:
>>>>>>>>>>> Right, so how does the smart hub aggregate the feeds? Does
>>>>>>>>>>> it then have to crawl to find the list? That wouldn't be
>>>>>>>>>>> very useful. Having said that...
>>>>>>>>>>>
>>>>>>>>>>> +1 For 'smart, aggregating hub generating a synthetic feed'
>>>>>>>>>>> +1 For XRD discovery of the firehose endpoint.
>>>>>>>>>>>
>>>>>>>>>>> Thinking a bit more about the firehose, what about making it
>>>>>>>>>>> more flexible? Specifically, if we treat 'firehose' as any
>>>>>>>>>>> bundle of feeds (all, or some), then a hub could define
>>>>>>>>>>> multiple firehose streams. For example, at PostRank we
>>>>>>>>>>> classify feeds by topic, so if someone wanted to subscribe
>>>>>>>>>>> to "Technology", we could expose that as a firehose so the
>>>>>>>>>>> user doesn't have to subscribe to every feed in that topic.
>>>>>>>>>>> In essence, a firehose stream is then any bundle of feeds.
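The "bundle of feeds" idea is easy to picture as a filter over the full stream. A rough sketch, with made-up entry dicts and topic labels (none of this is from the hub spec):

```python
def bundle_stream(entries, bundle_topics=None):
    """Yield the entries that belong to a bundle of topics.

    bundle_topics=None models the special 'all feeds' firehose;
    any other set of topics is a partial bundle.
    """
    for entry in entries:
        if bundle_topics is None or entry["topic"] in bundle_topics:
            yield entry

# Made-up entries standing in for items flowing through a hub.
entries = [
    {"title": "New hub release", "topic": "Technology"},
    {"title": "Election news", "topic": "Politics"},
    {"title": "XRD discovery draft", "topic": "Technology"},
]

tech = list(bundle_stream(entries, {"Technology"}))  # 2 entries
everything = list(bundle_stream(entries))            # all 3 entries
```

A content-based stream of the kind Bob describes would test the entry's content instead of a topic label.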
>>>>>>>>>>> This may be overloading the hub spec, but the overall
>>>>>>>>>>> mechanics would be:
>>>>>>>>>>> - A (super)user can declare a firehose endpoint
>>>>>>>>>>> - A (super)user is then able to add or remove subscriptions
>>>>>>>>>>>   from the firehose to create arbitrary aggregation streams
>>>>>>>>>>> - A subscriber uses XRD to discover the available
>>>>>>>>>>>   aggregation streams
>>>>>>>>>>> - A firehose with 'all' feeds is a special case of the
>>>>>>>>>>>   above, where all feeds are present
>>>>>>>>>>>
>>>>>>>>>>> This definitely adds more complexity into the hub... The
>>>>>>>>>>> alternative is of course for the publisher to create a
>>>>>>>>>>> syndicated feed and publish that directly as a standalone
>>>>>>>>>>> feed. Still trying to weigh the up/downsides in my head, but
>>>>>>>>>>> I want to put it out there as an idea.
>>>>>>>>>>>
>>>>>>>>>>> --------
>>>>>>>>>>> Ilya Grigorik
>>>>>>>>>>> postrank.com
>>>>>>>
>>>>>>> --
>>>>>>> Nick Johnson, Developer Programs Engineer, App Engine
>>>>>>> Google Ireland Ltd. :: Registered in Dublin, Ireland,
>>>>>>> Registration Number: 368047
>>>>>>
>>>>>> --
>>>>>> Marcus Herou CTO and co-founder Tailsweep AB
>>>>>> +46702561312
>>>>>> [email protected]
>>>>>> http://www.tailsweep.com/
>
> --
> Jeff Lindsay
> http://webhooks.org -- Make the web more programmable
> http://shdh.org -- A party for hackers and thinkers
> http://tigdb.com -- Discover indie games
> http://progrium.com -- More interesting things
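If a hub did expose aggregation streams this way, subscribing to one would look like any other PubSubHubbub subscription: the subscriber discovers the stream's topic URL (via XRD, per the discussion above) and POSTs the standard parameters to the hub. A sketch that only builds the request body; all URLs are hypothetical:

```python
from urllib.parse import urlencode

def subscription_body(topic_url, callback_url):
    """Form-encoded body for a PubSubHubbub subscribe request.

    hub.mode, hub.topic, hub.callback, and hub.verify are the
    standard PSHB subscription parameters.
    """
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
        "hub.verify": "sync",
    })

# Hypothetical firehose topic discovered via XRD, plus our callback.
body = subscription_body(
    "http://hub.example.com/streams/all",
    "http://subscriber.example.com/callback",
)
# POST this body to the hub's subscription endpoint.
```

The firehose stream is then just another topic URL, which is what makes the "bundle is a special case of a topic" framing attractive.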
