YOU ROCK

Cheers,
Chris
On Nov 10, 2011, at 2:59 PM, Lewis John Mcgibbney wrote:

> So the Jenkins area for Big Top is here.
>
> http://bigtop01.cloudera.org:8080
>
> I'll try to push through with this so here's to hoping :0)
>
> Thanks
>
> On Thu, Nov 10, 2011 at 12:02 PM, Markus Jelsma
> <[email protected]> wrote:
>
>>
>>> Hi Lewis,
>>>
>>> My comment was not to be taken too seriously :-)
>>>
>>> I think our official position is to support only the Apache distrib of
>>> Hadoop (correct me if I am wrong), but when we can and if it does not take
>>> much effort, getting Nutch to work on other distros would be a bonus as it
>>> would facilitate its adoption. Sounds like the infra issue would be taken
>>> care of, as for the 'quality' of the crawldb this is not so relevant here.
>>>
>>> Thoughts from anyone else?
>>
>> I would agree only to officially support Apache's own Hadoop dist, which can
>> be difficult enough between versions. However, contribs for other dists could
>> be shipped along with Nutch releases although i don't see how different API's
>> could easily be integrated if that's the case.
>>
>>>
>>> Thanks Lewis!
>>>
>>> Julien
>>>
>>>
>>> On 10 November 2011 16:13, Lewis John Mcgibbney
>>> <[email protected]> wrote:
>>>> Hi Julien,
>>>>
>>>>> you seem to imply that this is not the case but some of us do (or have
>>>>> done) large crawls :-)
>>>>
>>>> Not at all, I unreservedly take my words back if they came across like
>>>> that, I am well aware of the work some of you guys are doing. Please let
>>>> me rephrase, when was the last time we said we were able to benchmark
>>>> Nutch performance, given some environmental factors X, and some Hadoop
>>>> distribution Y? Although this might not directly benefit the community in
>>>> terms of improving Nutch codebase, it might give us an indication of
>>>> where we can say Nutch works and where it doesn't. Take NUTCH-839 for
>>>> example.
>>>>
>>>>> We'd probably end up bogged down in endless discussions about
>>>>> parameters tuning, feature comparison etc...
>>>>
>>>> Ok I agree with you here, but I think as the purpose of this would not be
>>>> mission critical for Nutch, it would however be nice to try and set some
>>>> 'consistent' Nutch deployment on top of many distros of Hadoop and build
>>>> recursively.
>>>>
>>>>> I did not know about bigtop, thanks for the pointer!
>>>>>
>>>>> Who would provide the cluster for running the tests? Doing large scale
>>>>> crawls is not just about setting it up and watching it work : it does
>>>>> involve a fair amount of monitoring (unless you don't mind having 90%
>>>>> of your crawlDB filled by porn/junk etc... ). Not sure who would find
>>>>> the time to do that.
>>>>
>>>> OK so the actual testing is hosted by Cloudera on their own Jenkins area.
>>>> From speaking to the guys here, they mentioned that Apache infrastructure
>>>> was not quite sufficient enough to handle the testing environment so they
>>>> are now working from Cloudera's infrastructure. From my discussions so
>>>> far, they are wanting as many applications running in a distributed
>>>> fashion on the platform as possible as it will also help them to
>>>> identify where bugs lie in their own code. We all know Nutch is an ideal
>>>> candidate for this type of job so hopefully we can find some common
>>>> ground beneficial to both projects. I will post the Cloudera Jenkins URL
>>>> when I find Roman and get it from him. In terms of maintenance, I really
>>>> don't know how that would work Julien, but I know that I've got another
>>>> two days of face-to-face opportunity with these guys so there will be no
>>>> better time to try and sort this kind of thing out.
>>>>
>>>> Thank you
>>
>
>
>
> --
> *Lewis*

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

