So the Jenkins area for Bigtop is here.

http://bigtop01.cloudera.org:8080

I'll try to push through with this, so here's hoping :0)

Thanks

On Thu, Nov 10, 2011 at 12:02 PM, Markus Jelsma
<[email protected]>wrote:

>
> > Hi Lewis,
> >
> > My comment was not to be taken too seriously :-)
> >
> > I think our official position is to support only the Apache distrib of
> > Hadoop (correct me if I am wrong), but when we can and if it does not take
> > much effort, getting Nutch to work on other distros would be a bonus as it
> > would facilitate its adoption. Sounds like the infra issue would be taken
> > care of, as for the 'quality' of the crawldb this is not so relevant here.
> >
> > Thoughts from anyone else?
>
> I would agree only to officially support Apache's own Hadoop dist, which can
> be difficult enough between versions. However, contribs for other dists could
> be shipped along with Nutch releases, although I don't see how different APIs
> could easily be integrated if that's the case.
>
> >
> > Thanks Lewis!
> >
> > Julien
> >
> >
> > On 10 November 2011 16:13, Lewis John Mcgibbney
> >
> > <[email protected]>wrote:
> > > Hi Julien,
> > >
> > > > > you seem to imply that this is not the case but some of us do (or have
> > > > > done) large crawls :-)
> > >
> > > > Not at all, I unreservedly take my words back if they came across like
> > > > that, I am well aware of the work some of you guys are doing. Please let
> > > > me rephrase: when was the last time we said we were able to benchmark
> > > > Nutch performance, given some environmental factors X, and some Hadoop
> > > > distribution Y? Although this might not directly benefit the community in
> > > > terms of improving the Nutch codebase, it might give us an indication of
> > > > where we can say Nutch works and where it doesn't. Take NUTCH-839 for
> > > > example.
> > >
> > > > We'd probably end up bogged down in endless discussions about
> > > > parameters tuning, feature comparison etc...
> > >
> > > > OK, I agree with you here, but I think that, as the purpose of this would
> > > > not be mission critical for Nutch, it would be nice to try and set some
> > > > 'consistent' Nutch deployment on top of many distros of Hadoop and build
> > > > recursively.
> > >
> > > > I did not know about bigtop, thanks for the pointer!
> > > >
> > > > > Who would provide the cluster for running the tests? Doing large scale
> > > > > crawls is not just about setting it up and watching it work : it does
> > > > > involve a fair amount of monitoring (unless you don't mind having 90%
> > > > > of your crawlDB filled by porn/junk etc... ). Not sure who would find
> > > > > the time to do that.
> > >
> > > > OK, so the actual testing is hosted by Cloudera on their own Jenkins area.
> > > > From speaking to the guys here, they mentioned that Apache infrastructure
> > > > was not quite sufficient to handle the testing environment, so they
> > > > are now working from Cloudera's infrastructure. From my discussions so
> > > > far, they want as many applications running in a distributed
> > > > fashion on the platform as possible, as it will also help them to
> > > > identify where bugs lie in their own code. We all know Nutch is an ideal
> > > > candidate for this type of job, so hopefully we can find some common
> > > > ground beneficial to both projects. I will post the Cloudera Jenkins URL
> > > > when I find Roman and get it from him. In terms of maintenance, I really
> > > > don't know how that would work, Julien, but I know that I've got another
> > > > two days of face-to-face opportunity with these guys, so there will be no
> > > > better time to try and sort this kind of thing out.
> > >
> > > Thank you
>



-- 
*Lewis*
