Hey @Chris, I would love to help with the wiki (I'm honored, in fact), but my input isn't really about the getting-started process. It's more about the frequent errors you run into after that. For example, the redirect plugin doesn't work the way you'd expect (not even the latest version). Or the parsechecker sometimes gives results that a normal Nutch run won't, even though it uses the same regex filter, or where to look when that happens. Or which Solr version you need to start with, since 5.x has a different file structure. Things like that, which eat up a lot of time.
If there is a wiki page for this kind of thing, I will gladly step up to the plate. It isn't exactly an FAQ either. I was thinking I could blog about it, but I think your idea of a wiki is better, since later authors can update it as the problems get fixed. So should I create one on the Nutch site? Also, many of these problems are asked about repeatedly on the mailing list, and a Google search just doesn't cut it. So maybe a repository of frequent problems, something of that sort? Thanks for the heads-up on the other guide; it gave me a starting point.

On Thu, Jul 23, 2015 at 6:24 AM, Mattmann, Chris A (3980) <[email protected]> wrote:

> Thanks Ankit for the honest feedback. Would you be willing to update
> our wiki and improve the instructions based on your experiences for
> our gotchas?
>
> We have a guide we have been working on ourselves to getting Nutch
> running and churning on Elastic MapReduce. That’s where I’d recommend
> starting.
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Ankit Goel <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, July 22, 2015 at 5:51 PM
> To: "[email protected]" <[email protected]>
> Subject: Nutch on the cloud
>
> >Hi,
> >
> >After my runs on my laptop, I'm ready to port my work to the cloud.
> >I'm planning to use Amazon. One thing I noticed when I started with
> >Nutch is that a lot of things were left unsaid on the site/wiki, and
> >they took me a lot of time to figure out. Pitfalls, if I may call them
> >that. I don't really have code or scripts, but I need Nutch to run all
> >the time on the cloud.
> >
> >So before I port to the cloud, are there any things I should beware of
> >or look out for? For example, does AWS work fine with Nutch? Are there
> >any configurations I should remember? Any advice on implementation to
> >ease my transition and run Nutch 24 hours a day? I will be running a
> >seed file and crawling the web in general.
> >Thanks
> >
> >--
> >Regards,
> >Ankit Goel
> >http://about.me/ankitgoel

--
Regards,
Ankit Goel
http://about.me/ankitgoel

