YOU ROCK

Cheers,
Chris

On Nov 10, 2011, at 2:59 PM, Lewis John Mcgibbney wrote:

> So the Jenkins area for Big Top is here.
> 
> http://bigtop01.cloudera.org:8080
> 
> I'll try to push through with this so here's to hoping :0)
> 
> Thanks
> 
> On Thu, Nov 10, 2011 at 12:02 PM, Markus Jelsma
> <[email protected]> wrote:
> 
>> 
>>> Hi Lewis,
>>> 
>>> My comment was not to be taken too seriously :-)
>>> 
>>> I think our official position is to support only the Apache distrib of
>>> Hadoop (correct me if I am wrong), but when we can and if it does not take
>>> much effort, getting Nutch to work on other distros would be a bonus as it
>>> would facilitate its adoption. Sounds like the infra issue would be taken
>>> care of, as for the 'quality' of the crawldb this is not so relevant here.
>>> 
>>> Thoughts from anyone else?
>> 
>> I would agree only to officially support Apache's own Hadoop dist, which can
>> be difficult enough between versions. However, contribs for other dists could
>> be shipped along with Nutch releases although I don't see how different APIs
>> could easily be integrated if that's the case.
>> 
>>> 
>>> Thanks Lewis!
>>> 
>>> Julien
>>> 
>>> 
>>> On 10 November 2011 16:13, Lewis John Mcgibbney
>>> <[email protected]> wrote:
>>>> Hi Julien,
>>>> 
>>>>> you seem to imply that this is not the case but some of us do (or have
>>>>> done) large crawls :-)
>>>> 
>>>> Not at all, I unreservedly take my words back if they came across like
>>>> that, I am well aware of the work some of you guys are doing. Please let
>>>> me rephrase: when was the last time we said we were able to benchmark
>>>> Nutch performance, given some environmental factors X, and some Hadoop
>>>> distribution Y? Although this might not directly benefit the community in
>>>> terms of improving Nutch codebase, it might give us an indication of
>>>> where we can say Nutch works and where it doesn't. Take NUTCH-839 for
>>>> example.
>>>> 
>>>>> We'd probably end up bogged down in endless discussions about
>>>>> parameters tuning, feature comparison etc...
>>>> 
>>>> Ok I agree with you here, but I think as the purpose of this would not be
>>>> mission critical for Nutch, it would however be nice to try and set some
>>>> 'consistent' Nutch deployment on top of many distros of Hadoop and build
>>>> recursively.
>>>> 
>>>>> I did not know about bigtop, thanks for the pointer!
>>>>> 
>>>>> Who would provide the cluster for running the tests? Doing large scale
>>>>> crawls is not just about setting it up and watching it work : it does
>>>>> involve a fair amount of monitoring (unless you don't mind having 90%
>>>>> of your crawlDB filled by porn/junk etc... ). Not sure who would find
>>>>> the time to do that.
>>>> 
>>>> OK so the actual testing is hosted by Cloudera on their own Jenkins area.
>>>> From speaking to the guys here, they mentioned that Apache infrastructure
>>>> was not quite sufficient to handle the testing environment so they
>>>> are now working from Cloudera's infrastructure. From my discussions so
>>>> far, they want as many applications as possible running in a distributed
>>>> fashion on the platform, as it will also help them to
>>>> identify where bugs lie in their own code. We all know Nutch is an ideal
>>>> candidate for this type of job so hopefully we can find some common
>>>> ground beneficial to both projects. I will post the Cloudera Jenkins URL
>>>> when I find Roman and get it from him. In terms of maintenance, I really
>>>> don't know how that would work Julien, but I know that I've got another
>>>> two days of face-to-face opportunity with these guys so there will be no
>>>> better time to try and sort this kind of thing out.
>>>> 
>>>> Thank you
>> 
> 
> 
> 
> -- 
> *Lewis*


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [email protected]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
