+1
On Thu, Oct 22, 2009 at 7:38 PM, Julien Genestoux <[email protected]> wrote:
> Ha! Thanks Jeff for pointing to that! :D
> I think subscribers should always select their favorite endpoint and hubs
> should make sure they get the content from the publisher's hub :)
> Ju
>
>
> --
> Julien Genestoux,
>
> http://twitter.com/julien51
> http://superfeedr.com
>
> +1 (415) 254 7340
> +33 (0)9 70 44 76 29
> Sent from San Francisco, CA, United States
>
> On Thu, Oct 22, 2009 at 11:29 AM, Jeff Lindsay <[email protected]> wrote:
>>
>> All these ideas for additions to the Hub are really starting to suggest
>> multiple endpoints. I still don't buy why the publisher endpoint is the same
>> as the subscriber endpoint.
>>
>> On Thu, Oct 22, 2009 at 11:22 AM, Jeremy Hylton <[email protected]>
>> wrote:
>>>
>>> On Thu, Oct 22, 2009 at 9:22 AM, igrigorik <[email protected]> wrote:
>>> >
>>> > Ok, so let me try rephrasing the problem. The major problem is not the
>>> > Hub, or the spec, but the need to crawl thousands of sites to find the
>>> > feeds in the first place. Of course, the hub already knows about them,
>>> > hence this discussion.
>>> >
>>> > Would it be crazy to then expose a mechanism to enumerate all of the
>>> > feeds that a hub tracks, and let the client then subscribe to them?
>>>
>>> I agree it would be helpful if we had a standard way for hubs to
>>> support clients that want either a firehose or the ability to discover
>>> all the feeds available for subscription. Are there existing
>>> solutions to the problem that are similar in complexity (or
>>> simplicity) to PSHB?
>>>
>>> Jeremy
>>>
>>> >
>>> > An obvious obstacle is private feeds, but I wonder if that can be
>>> > handled as a special case?
>>> >
>>> > ig
>>> >
>>> > On Oct 21, 1:55 pm, Marcus Herou <[email protected]> wrote:
>>> >> Well... We can as well publish a firehose but currently our business
>>> >> model is not aimed at that.
>>> >>
>>> >> I was not talking about a service but rather a technology which could
>>> >> take the crawling business to another level by aggregating hundreds of
>>> >> hubs and creating something which effectively can deliver tb/s
>>> >> bandwidth by having decentralized servers and data. We are still
>>> >> limited, you know, by our infrastructure even though it has gb/s to the
>>> >> internet.
>>> >>
>>> >> Since the realtime web is currently still very small, all of us need to
>>> >> poll something, even you I presume, to be able to create a pub/sub
>>> >> arch. In that respect our companies are quite similar: you chose to
>>> >> aggregate and publish your data and make a business of it. We
>>> >> aggregate and refine the data and make a business out of that.
>>> >>
>>> >> Don't get me wrong, I really like what you do, but I am not looking for
>>> >> a data supplier at this time (might change though). But if I were to
>>> >> look in the data supplier direction, you are currently in my/our top
>>> >> ten list :)
>>> >>
>>> >> Sent from my iPhone
>>> >>
>>> >> On Oct 21, 5:03 pm, Julien Genestoux <[email protected]>
>>> >> wrote:
>>> >>
>>> >> > Hum... http://superfeedr.com?
>>> >>
>>> >> > "Putting resources in common" is definitely one of the key reasons
>>> >> > why we built superfeedr. More about that there:
>>> >> > http://blog.superfeedr.com/gospel/something-stupid/
>>> >>
>>> >> > And yes, we have a firehose available.
>>> >>
>>> >> > Julien
>>> >>
>>> >> > --
>>> >> > Julien Genestoux,
>>> >>
>>> >> > http://twitter.com/julien51
>>> >> > http://superfeedr.com
>>> >>
>>> >> > +1 (415) 254 7340
>>> >> > +33 (0)9 70 44 76 29
>>> >>
>>> >> > On Wed, Oct 21, 2009 at 5:26 AM, Marcus Herou
>>> >> > <[email protected]> wrote:
>>> >>
>>> >> > > Feedtree looks cool.... but updated 2006 ?
>>> >>
>>> >> > > On Wed, Oct 21, 2009 at 2:20 PM, Nick Johnson (Google)
>>> >> > > <[email protected]> wrote:
>>> >>
>>> >> > >> On Wed, Oct 21, 2009 at 1:14 PM, Alexis Richardson
>>> >> > >> <[email protected]> wrote:
>>> >>
>>> >> > >>> Hmmm ... gossiptorrent?
>>> >>
>>> >> > >> Feedtree.
>>> >>
>>> >> > >>> On Wed, Oct 21, 2009 at 7:23 AM, Marcus Herou
>>> >> > >>> <[email protected]> wrote:
>>> >>
>>> >> > >>> > Hi.
>>> >>
>>> >> > >>> > We host a search app which is based on feeds of
>>> >> > >>> > blogs/twitter/forums/news etc. We are, as you mention, polling
>>> >> > >>> > everything like crazy and it seems like a total waste of
>>> >> > >>> > everyone's resources.
>>> >>
>>> >> > >>> > So this means that subscribing to something which would
>>> >> > >>> > potentially deliver the material to us would be great, not
>>> >> > >>> > just for us but also for all the sites we are crawling.
>>> >>
>>> >> > >>> > However, who would like to open up a firehose for free for
>>> >> > >>> > everyone to consume? It will for sure consume a lot of
>>> >> > >>> > bandwidth, and a few subscribers will consume most of the
>>> >> > >>> > bandwidth with this model.
>>> >> > >>> > I thought of something that might solve this issue. Consider
>>> >> > >>> > the following:
>>> >>
>>> >> > >>> > 1)
>>> >> > >>> > * Charge for the bandwidth (wordpress.com does this with a
>>> >> > >>> > flat fee)
>>> >>
>>> >> > >>> > 2)
>>> >> > >>> > * Everyone that has firehose consuming needs should as well
>>> >> > >>> > start a hub to show good faith and morale.
>>> >> > >>> > * Add support in firehose enabled hubs to share state (with a
>>> >> > >>> > master ?)
>>> >> > >>> > * A firehose enabled hub can subscribe to a master hub which
>>> >> > >>> > makes sure that the subscriber also fulfils some form of
>>> >> > >>> > contract (i.e.
>>> >> > >>> > actually updating/delivering feeds)
>>> >> > >>> > * Each firehose enabled hub must be public and everyone can
>>> >> > >>> > subscribe to the feeds just as they can currently.
>>> >> > >>> > * To share load equally (morale part), subscribers should
>>> >> > >>> > subscribe to a load-balanced DNS name or some form of delegate
>>> >> > >>> > lb.pshb.com = master hub
>>> >> > >>> > Example 1: lb.pshb.com resolves to pshb.tailsweep.com,
>>> >> > >>> > pshb.google.com, effectively DNS round-robin
>>> >> > >>> > Example 2: lb.pshb.com delegates to any active master
>>> >> > >>> > connected hub in some way.
>>> >>
>>> >> > >>> > This might be too complex to implement, and bottlenecks occur
>>> >> > >>> > at the master, but systems like Hadoop have bottlenecks in
>>> >> > >>> > terms of the NameNode (master) and it seems to perform just
>>> >> > >>> > fine, so it can be done. However, each firehose hub probably
>>> >> > >>> > needs to persist each feed for a certain amount of time before
>>> >> > >>> > purging it.
>>> >>
>>> >> > >>> > Anyway this was just a thought. We at Tailsweep probably could
>>> >> > >>> > help in making this happen if there exists some interest.
>>> >>
>>> >> > >>> > Cheers
>>> >>
>>> >> > >>> > //Marcus
>>> >>
>>> >> > >>> > On Oct 20, 8:41 pm, Bob Wyman <[email protected]> wrote:
>>> >> > >>> >> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik
>>> >> > >>> >> <[email protected]> wrote:
>>> >> > >>> >> > Specifically, if we treat 'firehose' as any bundle of
>>> >> > >>> >> > feeds (all, or some), then a hub could define
>>> >> > >>> >> > multiple firehose streams.
>>> >>
>>> >> > >>> >> There should be no question that there is tremendous utility
>>> >> > >>> >> in being able to compose all sorts of "bundles" of topics
>>> >> > >>> >> into distinct feeds.
>>> >> > >>> >> It is probably also the case that we can identify some number
>>> >> > >>> >> of such bundles that would be useful to a large number of
>>> >> > >>> >> subscribers. On the other hand, many bundles will be very
>>> >> > >>> >> specific and only useful to one or a small number of
>>> >> > >>> >> subscribers. In fact, I think what we'll see is that once we
>>> >> > >>> >> have the core PSHB defined, we'll then see innovation in the
>>> >> > >>> >> definition of "down stream" services whose function is
>>> >> > >>> >> precisely to build and deliver such bundles. Some of these
>>> >> > >>> >> services will aggregate groups of topics while others will
>>> >> > >>> >> focus instead on creating content-based streams -- they will
>>> >> > >>> >> bundle together individual entries based on the content of
>>> >> > >>> >> those entries rather than simply combining all entries from
>>> >> > >>> >> some set of topics.
>>> >>
>>> >> > >>> >> I think we should be careful not to force too much of the
>>> >> > >>> >> burden of bundling or aggregating into the core PSHB hub
>>> >> > >>> >> specification. If we want to address the challenges of
>>> >> > >>> >> building bundles or aggregations, I think it best to do so in
>>> >> > >>> >> secondary or companion specifications. This will keep the
>>> >> > >>> >> core cleaner and easy to understand while also allowing the
>>> >> > >>> >> core to be deployed without being delayed by discussions over
>>> >> > >>> >> non-core issues.
>>> >>
>>> >> > >>> >> Having argued against making the core more complicated by
>>> >> > >>> >> extending it to include creating aggregate topics, I still
>>> >> > >>> >> suggest that it would be useful to have the core system
>>> >> > >>> >> define a common means to obtain a pure "firehose" feed of all
>>> >> > >>> >> topics. The current hub spec works for people who only want
>>> >> > >>> >> "none or some" of the topics served by the hub. I suggest
>>> >> > >>> >> that we expand this to have hubs know how to provide "none,
>>> >> > >>> >> some or all" of the topics.
>>> >> > >>> >> The reason for adding support of "all topics" is that we
>>> >> > >>> >> know, without much question, that such an "all topics" feed
>>> >> > >>> >> will be required by many of the downstream services that we
>>> >> > >>> >> will one day be relying on to create more finely defined
>>> >> > >>> >> aggregations. Given that this specific feed will be commonly
>>> >> > >>> >> required, it would be best if we had a common mechanism for a
>>> >> > >>> >> downstream service/subscriber to request that feed and that
>>> >> > >>> >> we set some expectations for how that feed will be formatted
>>> >> > >>> >> and delivered (i.e. Atom entries, persistent connections,
>>> >> > >>> >> chunked content model, ...). It would be very cumbersome for
>>> >> > >>> >> a downstream filtering/aggregating service to need to puzzle
>>> >> > >>> >> through service specific mechanisms for discovering how to
>>> >> > >>> >> obtain a firehose feed of "all topics" from many different
>>> >> > >>> >> hubs.
>>> >>
>>> >> > >>> >> bob wyman
>>> >>
>>> >> > >>> >> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik
>>> >> > >>> >> <[email protected]> wrote:
>>> >>
>>> >> > >>> >> > Right, so how does the smart hub aggregate the feeds? Does
>>> >> > >>> >> > it then have to crawl to find the list? That wouldn't be
>>> >> > >>> >> > very useful. Having said that...
>>> >>
>>> >> > >>> >> > +1 For 'smart, aggregating hub generating a synthetic feed'
>>> >> > >>> >> > +1 For XRD discovery of the firehose endpoint.
>>> >>
>>> >> > >>> >> > Thinking a bit more about the firehose, what about making
>>> >> > >>> >> > it more flexible? Specifically, if we treat 'firehose' as
>>> >> > >>> >> > any bundle of feeds (all, or some), then a hub could define
>>> >> > >>> >> > multiple firehose streams. For example, at PostRank we
>>> >> > >>> >> > classify feeds by topic, so if someone wanted to subscribe
>>> >> > >>> >> > to "Technology", we could expose that as a firehose so the
>>> >> > >>> >> > user doesn't have to subscribe to every feed in that topic.
>>> >> > >>> >> > In essence, a firehose stream is then any bundle of feeds.
>>> >>
>>> >> > >>> >> > This may be overloading the hub spec but the overall
>>> >> > >>> >> > mechanics would be:
>>> >> > >>> >> > - A (super)user can declare a firehose endpoint
>>> >> > >>> >> > - A (super)user is then able to add or remove
>>> >> > >>> >> > subscriptions from the firehose to create arbitrary
>>> >> > >>> >> > aggregation streams
>>> >> > >>> >> > - A subscriber uses XRD to discover the available
>>> >> > >>> >> > aggregation streams
>>> >> > >>> >> > - Firehose with 'all' feeds is a special case of the
>>> >> > >>> >> > above, where all feeds are present
>>> >>
>>> >> > >>> >> > This definitely adds more complexity into the hub... The
>>> >> > >>> >> > alternative is of course for the publisher to create a
>>> >> > >>> >> > syndicated feed and publish that directly as a standalone
>>> >> > >>> >> > feed. Still trying to weigh the up/downsides in my head,
>>> >> > >>> >> > but want to put it out there as an idea.
>>> >>
>>> >> > >>> >> > --------
>>> >> > >>> >> > Ilya Grigorik
>>> >> > >>> >> > postrank.com
>>> >>
>>> >> > >> --
>>> >> > >> Nick Johnson, Developer Programs Engineer, App Engine
>>> >> > >> Google Ireland Ltd. :: Registered in Dublin, Ireland,
>>> >> > >> Registration Number: 368047
>>> >>
>>> >> > > --
>>> >> > > Marcus Herou CTO and co-founder Tailsweep AB
>>> >> > > +46702561312
>>> >> > > [email protected]
>>> >> > > http://www.tailsweep.com/
>>> >
>>
>>
>> --
>> Jeff Lindsay
>> http://webhooks.org -- Make the web more programmable
>> http://shdh.org -- A party for hackers and thinkers
>> http://tigdb.com -- Discover indie games
>> http://progrium.com -- More interesting things
>
>
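
For anyone trying to picture the "firehose as any bundle of feeds" mechanics ig lists in the quoted thread (declare a firehose, add or remove feeds, discover the available streams, with 'all' as a special case), here is a rough in-memory sketch. Nothing in it comes from the PSHB spec: the ToyHub class, its method names and the reserved "all" bundle name are all made up for illustration, and discover() is only a stand-in for whatever XRD-based discovery would really look like.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

ALL_TOPICS = "all"  # hypothetical reserved name for the "all topics" firehose

Callback = Callable[[str, str], None]  # receives (topic_url, entry)


class ToyHub:
    """In-memory illustration only; not a PSHB implementation."""

    def __init__(self) -> None:
        self.topics: Set[str] = set()
        self.bundles: Dict[str, Set[str]] = {}
        self.callbacks: Dict[str, List[Callback]] = defaultdict(list)

    def add_topic(self, topic: str) -> None:
        self.topics.add(topic)

    def declare_firehose(self, name: str) -> None:
        # A (super)user declares a named firehose (bundle of feeds).
        if name == ALL_TOPICS:
            raise ValueError("'all' is reserved for the full firehose")
        self.bundles.setdefault(name, set())

    def add_to_firehose(self, name: str, topic: str) -> None:
        if topic not in self.topics:
            raise KeyError(topic)
        self.bundles[name].add(topic)

    def remove_from_firehose(self, name: str, topic: str) -> None:
        self.bundles[name].discard(topic)

    def discover(self) -> List[str]:
        # Stand-in for discovery: list the streams a subscriber may request.
        return [ALL_TOPICS] + sorted(self.bundles)

    def subscribe(self, name: str, callback: Callback) -> None:
        if name != ALL_TOPICS and name not in self.bundles:
            raise KeyError(name)
        self.callbacks[name].append(callback)

    def publish(self, topic: str, entry: str) -> None:
        if topic not in self.topics:
            raise KeyError(topic)
        # 'all' always matches; other bundles match only if they hold the topic.
        targets = [ALL_TOPICS] + [n for n, ts in self.bundles.items() if topic in ts]
        for name in targets:
            for callback in self.callbacks[name]:
                callback(topic, entry)


hub = ToyHub()
hub.add_topic("http://a.example/feed")
hub.add_topic("http://b.example/feed")
hub.declare_firehose("technology")
hub.add_to_firehose("technology", "http://a.example/feed")

tech: List[tuple] = []
everything: List[tuple] = []
hub.subscribe("technology", lambda t, e: tech.append((t, e)))
hub.subscribe(ALL_TOPICS, lambda t, e: everything.append((t, e)))

hub.publish("http://a.example/feed", "entry-1")
hub.publish("http://b.example/feed", "entry-2")
```

In this toy version the "technology" subscriber only sees entries from the one feed placed in that bundle, while the "all" subscriber sees both. Whether this belongs in the core hub or in a companion spec is exactly the open question in the thread; the sketch takes no side on that.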
