I think it's very likely that hubs will support all sorts of feed management, discovery and search, including Ilya's suggested mechanism.
I suppose the question is what *breaks* if Ilya's mechanism is not part of PSHB or does not use a standard model.

On Thu, Oct 22, 2009 at 9:05 PM, Bob Wyman <[email protected]> wrote:
> On Thu, Oct 22, 2009 at 9:22 AM, igrigorik <[email protected]> wrote:
>> Would it be crazy to then expose a mechanism to
>> enumerate all of the feeds that a hub tracks, and let
>> the client then subscribe to them?
>
> I think that if I was a spammer, I would strongly support a mechanism to
> enumerate feeds. That would allow me to easily scan to determine which of my
> feeds had been removed as spam and which were still useful. Certainly, I
> could "test" my feeds by actually publishing data, however, I would prefer
> not to do that since my testing might, in fact, trigger some spam detection
> code. Thus, I would like to have a mechanism to check my spam feeds that
> didn't involve actually publishing through them. Using this mechanism, I
> might do things like slowly build up a set of inactive but "non-spam" feeds
> whose "reputation" would probably grow as they live spam-free for a longer
> time. I might then burst a whole bunch of spam through some sub-set of my
> "well-aged" feeds from time to time.
>
> Of course, a hub owner could make my life as a spammer just a little bit
> more difficult by including all known spam feeds in the enumerated list.
> But, I'm thinking that hub managers wouldn't do that since non-spammers
> would complain too much. One problem is that the list would grow
> indefinitely, the other is that other people would keep asking the hub to
> stop pestering them by listing known spam feeds. In essence, my job as a
> spammer would become easier as the hub owners tried to make life easier for
> the non-spammers... Help yourself and you help me.
>
> bob wyman (not a spammer -- just imagining...)
>
> On Thu, Oct 22, 2009 at 9:22 AM, igrigorik <[email protected]> wrote:
>>
>> Ok, so let me try rephrasing the problem.
>> The major problem is not the Hub, or the spec, but the need to crawl
>> thousands of sites to find the feeds in the first place. Of course, the
>> hub already knows about them, hence this discussion.
>>
>> Would it be crazy to then expose a mechanism to enumerate all of the
>> feeds that a hub tracks, and let the client then subscribe to them?
>>
>> An obvious obstacle is private feeds, but I wonder if that can be
>> handled as a special case?
>>
>> ig
>>
>> On Oct 21, 1:55 pm, Marcus Herou <[email protected]> wrote:
>> > Well... We can as well publish a firehose but currently our business
>> > model is not aimed at that.
>> >
>> > I was not talking about a service rather a technology which could take
>> > the crawling business to another level by aggregating hundreds of hubs
>> > and creating something which effectively can deliver tb/s bandwidth by
>> > having decentralized servers and data. We are still limited you know
>> > by our infrastructure even though it has gb/s to the internet.
>> >
>> > Since the realtime web is currently still very small all of us need to
>> > poll something even you I presume to be able to create a pub/sub arch.
>> > In that remark our companies are quite similar, you chose to aggregate
>> > and publish your data and make a business of it. We aggregate and
>> > refine the data and make business out of that.
>> >
>> > Don't get me wrong I really like what you do but I am not looking for
>> > a data supplier at this time (might change though). But if I would
>> > look in the data supplier direction you are currently in my/our top
>> > ten list :)
>> >
>> > Sent from my iPhone
>> >
>> > On Oct 21, 5:03 pm, Julien Genestoux <[email protected]>
>> > wrote:
>> >
>> > > Hum... http://superfeedr.com?
>> >
>> > > "Putting resources in common" is definitely one of the key reasons
>> > > why we built superfeedr.
>> > > More about that there:
>> > > http://blog.superfeedr.com/gospel/something-stupid/
>> >
>> > > And yes, we have a firehose available.
>> >
>> > > Julien
>> >
>> > > --
>> > > Julien Genestoux,
>> >
>> > > http://twitter.com/julien51
>> > > http://superfeedr.com
>> >
>> > > +1 (415) 254 7340
>> > > +33 (0)9 70 44 76 29
>> >
>> > > On Wed, Oct 21, 2009 at 5:26 AM, Marcus Herou
>> > > <[email protected]> wrote:
>> >
>> > > > Feedtree looks cool.... but updated 2006 ?
>> >
>> > > > On Wed, Oct 21, 2009 at 2:20 PM, Nick Johnson (Google) <
>> > > > [email protected]> wrote:
>> >
>> > > >> On Wed, Oct 21, 2009 at 1:14 PM, Alexis Richardson <
>> > > >> [email protected]> wrote:
>> >
>> > > >>> Hmmm ... gossiptorrent?
>> >
>> > > >> Feedtree.
>> >
>> > > >>> On Wed, Oct 21, 2009 at 7:23 AM, Marcus Herou
>> > > >>> <[email protected]> wrote:
>> >
>> > > >>> > Hi.
>> >
>> > > >>> > We host a search app which is based on feeds of
>> > > >>> > blogs/twitter/forums/news etc. We are, as you are mentioning,
>> > > >>> > polling everything like crazy and it seems like a total waste
>> > > >>> > of everyone's resources.
>> >
>> > > >>> > So this means that subscribing to something which would
>> > > >>> > potentially deliver the material to us would be great not just
>> > > >>> > for us but as well all sites we are crawling.
>> >
>> > > >>> > However who would like to open up a firehose for free for
>> > > >>> > everyone to consume ? It will for sure consume a lot of
>> > > >>> > bandwidth and a few subscribers will consume most of the
>> > > >>> > bandwidth with this model.
>> > > >>> > I thought of something that might solve this issue. Consider
>> > > >>> > the following:
>> >
>> > > >>> > 1)
>> > > >>> > * Charge for the bandwidth (wordpress.com does this with flat
>> > > >>> >   fee)
>> >
>> > > >>> > 2)
>> > > >>> > * Everyone that has firehose consuming needs should as well
>> > > >>> >   start a hub to show good faith and morale.
>> > > >>> > * Add support in firehose enabled hubs to share state (with a
>> > > >>> >   master ?)
>> > > >>> > * A firehose enabled hub can subscribe to a master hub which
>> > > >>> >   makes sure that the subscriber as well fulfils some form of
>> > > >>> >   contract (i.e. actually updating/delivering feeds)
>> > > >>> > * Each firehose enabled hub must be public and everyone can
>> > > >>> >   subscribe to the feeds like as of current.
>> > > >>> > * To share load equally (morale part) then subscribers should
>> > > >>> >   subscribe to a loadbalanced dns name or some form of delegate
>> > > >>> >   lb.pshb.com = master hub
>> > > >>> >   Example 1: lb.pshb.com resolves to pshb.tailsweep.com
>> > > >>> >   pshb.google.com, effectively DNS-roundrobin
>> > > >>> >   Example 2: lb.pshb.com delegates to any active master
>> > > >>> >   connected hub in some way.
>> >
>> > > >>> > This might be too complex to implement and bottlenecks occur at
>> > > >>> > the master but systems like Hadoop have bottlenecks in terms of
>> > > >>> > the NameNode (master) and it seems to perform just perfect so
>> > > >>> > it can be done. However each firehose hub probably needs to
>> > > >>> > persist each feed for a certain amount of time before purging
>> > > >>> > it.
>> >
>> > > >>> > Anyway this was just a thought. We at Tailsweep probably could
>> > > >>> > help in making this happen if there exists some interest.
>> >
>> > > >>> > Cheers
>> >
>> > > >>> > //Marcus
>> >
>> > > >>> > On Oct 20, 8:41 pm, Bob Wyman <[email protected]> wrote:
>> > > >>> >> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik <[email protected]>
>> > > >>> >> wrote:
>> > > >>> >> > Specifically, if we treat 'firehose' as any bundle of
>> > > >>> >> > feeds (all, or some), then a hub could define
>> > > >>> >> > multiple firehose streams.
>> >
>> > > >>> >> There should be no question that there is tremendous utility
>> > > >>> >> in being able to compose all sorts of "bundles" of topics into
>> > > >>> >> distinct feeds. It is probably also the case that we can
>> > > >>> >> identify some number of such bundles that would be useful to a
>> > > >>> >> large number of subscribers. On the other hand, many bundles
>> > > >>> >> will be very specific and only useful to one or a small number
>> > > >>> >> of subscribers. In fact, I think what we'll see is that once
>> > > >>> >> we have the core PSHB defined, we'll then see innovation in
>> > > >>> >> the definition of "down stream" services whose function is
>> > > >>> >> precisely to build and deliver such bundles. Some of these
>> > > >>> >> services will aggregate groups of topics while others will
>> > > >>> >> focus instead on creating content-based streams -- they will
>> > > >>> >> bundle together individual entries based on the content of
>> > > >>> >> those entries rather than simply combining all entries from
>> > > >>> >> some set of topics.
>> >
>> > > >>> >> I think we should be careful not to force too much of the
>> > > >>> >> burden of bundling or aggregating into the core PSHB hub
>> > > >>> >> specification. If we want to address the challenges of
>> > > >>> >> building bundles or aggregations, I think it best to do so in
>> > > >>> >> secondary or companion specifications. This will keep the core
>> > > >>> >> cleaner and easy to understand while also allowing the core to
>> > > >>> >> be deployed without being delayed by discussions over non-core
>> > > >>> >> issues.
>> >
>> > > >>> >> Having argued against making the core more complicated by
>> > > >>> >> extending it to include creating aggregate topics, I still
>> > > >>> >> suggest that it would be useful to have the core system define
>> > > >>> >> a common means to obtain a pure "firehose" feed of all topics.
>> > > >>> >> The current hub spec works for people who only want "none or
>> > > >>> >> some" of the topics served by the hub. I suggest that we
>> > > >>> >> expand this to have hubs know how to provide "none, some or
>> > > >>> >> all" of the topics. The reason for adding support of "all
>> > > >>> >> topics" is that we know, without much question, that such an
>> > > >>> >> "all topics" feed will be required by many of the downstream
>> > > >>> >> services that we will one day be relying on to create more
>> > > >>> >> finely defined aggregations. Given that this specific feed
>> > > >>> >> will be commonly required, it would be best if we had a common
>> > > >>> >> mechanism for a downstream service/subscriber to request that
>> > > >>> >> feed and that we set some expectations for how that feed will
>> > > >>> >> be formatted and delivered (i.e. Atom entries, persistent
>> > > >>> >> connections, chunked content model, ...). It would be very
>> > > >>> >> cumbersome for a downstream filtering/aggregating service to
>> > > >>> >> need to puzzle through service specific mechanisms for
>> > > >>> >> discovering how to obtain a firehose feed of "all topics" from
>> > > >>> >> many different hubs.
>> >
>> > > >>> >> bob wyman
>> >
>> > > >>> >> On Tue, Oct 20, 2009 at 11:22 AM, igrigorik <[email protected]>
>> > > >>> >> wrote:
>> >
>> > > >>> >> > Right, so how does the smart hub aggregate the feeds?
>> > > >>> >> > Does it then have to crawl to find the list? That wouldn't
>> > > >>> >> > be very useful. Having said that...
>> >
>> > > >>> >> > +1 For 'smart, aggregating hub generating a synthetic feed'
>> > > >>> >> > +1 For XRD discovery of the firehose endpoint.
>> >
>> > > >>> >> > Thinking a bit more about the firehose, what about making
>> > > >>> >> > it more flexible. Specifically, if we treat 'firehose' as
>> > > >>> >> > any bundle of feeds (all, or some), then a hub could define
>> > > >>> >> > multiple firehose streams. For example, at PostRank we
>> > > >>> >> > classify feeds by topic, so if someone wanted to subscribe
>> > > >>> >> > to "Technology", we could expose that as a firehose so the
>> > > >>> >> > user doesn't have to subscribe to every feed in that topic.
>> > > >>> >> > In essence, a firehose stream is then any bundle of feeds.
>> >
>> > > >>> >> > This may be overloading the hub spec but the overall
>> > > >>> >> > mechanics would be:
>> > > >>> >> > - A (super)user can declare a firehose endpoint
>> > > >>> >> > - A (super)user is then able to add or remove subscriptions
>> > > >>> >> >   from the firehose to create arbitrary aggregation streams
>> > > >>> >> > - A subscriber uses XRD to discover the available
>> > > >>> >> >   aggregation streams
>> > > >>> >> > - Firehose with 'all' feeds is a special case of the above,
>> > > >>> >> >   where all feeds are present
>> >
>> > > >>> >> > This definitely adds more complexity into the hub... The
>> > > >>> >> > alternative is of course for the publisher to create a
>> > > >>> >> > syndicated feed and publish that directly as a standalone
>> > > >>> >> > feed.
>> > > >>> >> > Still trying to weigh the up/downsides in my head, but want
>> > > >>> >> > to put it out there as an idea.
>> >
>> > > >>> >> > --------
>> > > >>> >> > Ilya Grigorik
>> > > >>> >> > postrank.com
>> >
>> > > >> --
>> > > >> Nick Johnson, Developer Programs Engineer, App Engine
>> > > >> Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration
>> > > >> Number: 368047
>> >
>> > > > --
>> > > > Marcus Herou CTO and co-founder Tailsweep AB
>> > > > +46702561312
>> > > > [email protected]
>> > > > http://www.tailsweep.com/
>
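
For reference, the bundle mechanics Ilya lists in the quoted thread (declare a firehose, add or remove feeds from it, treat 'all' as a special case) could be modelled roughly as below. This is only a sketch under assumptions, not anything from the PSHB spec: the `Hub` class, its method names, and the example.com URLs are all hypothetical, and XRD discovery, delivery, and private feeds are not modelled.

```python
class Hub:
    """Toy in-memory hub (hypothetical): a firehose is a named bundle of topic URLs."""

    ALL = "all"  # reserved bundle name meaning "every topic the hub tracks"

    def __init__(self):
        self.topics = set()   # every feed URL the hub knows about
        self.bundles = {}     # bundle name -> set of topic URLs

    def track(self, topic):
        """Register a feed URL with the hub."""
        self.topics.add(topic)

    def declare_firehose(self, name):
        """A (super)user declares a new, initially empty, firehose."""
        if name == self.ALL:
            raise ValueError("'all' is built in and cannot be redeclared")
        self.bundles.setdefault(name, set())

    def add_to_firehose(self, name, topic):
        """Add a tracked topic to a declared firehose."""
        if topic not in self.topics:
            raise KeyError("hub does not track " + topic)
        self.bundles[name].add(topic)

    def remove_from_firehose(self, name, topic):
        """Remove a topic from a firehose; a no-op if it is not a member."""
        self.bundles[name].discard(topic)

    def firehose_topics(self, name):
        """Return a copy of the topics in a firehose; 'all' is the special case."""
        if name == self.ALL:
            return set(self.topics)
        return set(self.bundles[name])

    def route(self, topic):
        """Return the sorted names of the firehoses that carry this topic."""
        names = [self.ALL] if topic in self.topics else []
        names += [n for n, members in self.bundles.items() if topic in members]
        return sorted(names)
```

With two tracked feeds and a "technology" firehose containing only the first, `route()` on the first feed returns `["all", "technology"]` and on the second returns `["all"]`. Whether a hub should expose `firehose_topics()` to arbitrary clients is exactly the enumeration question Bob raises about spammers.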
