Hi Chris,

My username is AnkitGoel. Glad to be able to contribute.

Thanks.
PS: I tried to send you an email and got an auto-response. Congrats, if I may say so.

On Thu, Jul 23, 2015 at 8:47 PM, Mattmann, Chris A (3980) <[email protected]> wrote:

> Yes, that would be fantastic. How about a wiki page on getting up
> and running and overcoming problems with the most recent Nutch?
>
> The Nutch wiki is here:
>
> http://wiki.apache.org/nutch/
>
> Please sign up for an account and tell me your username. Then I’ll
> grant you permissions to edit the wiki.
>
> Thank you, Ankit!
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: [email protected]
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> -----Original Message-----
> From: Ankit Goel <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, July 23, 2015 at 7:22 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: Nutch on the cloud
>
> >Hey,
> >
> >@Chris, I would love to help with the wiki (I'm honored, in fact), but
> >my input isn't about the getting-started process; it's more about the
> >frequent errors that come after it. For example, the redirect plugin
> >doesn't work the way you'd expect (not even the latest version). Or
> >sometimes parsechecker gives results that a normal Nutch run won't,
> >even though it's the same regex filter, and it's unclear where to
> >check. Or which Solr version you need to start with, because 5.x has a
> >different file structure. Things like that, on which you spend a long
> >time.
> >
> >If there is a wiki for such a page, I will gladly step up to the plate.
> >It isn't exactly an FAQ either. I was thinking I could blog about it,
> >but I think your idea of a wiki would be better, so that later authors
> >can update it as the problems are fixed. So should I create one on the
> >Nutch site? Also, many of these problems are asked about multiple
> >times on the mailing list, and a Google search just doesn't cut it.
> >So maybe a repository of frequent problems? That sort of thing?
> >
> >Thanks for the heads-up on the other guide; it gave me a starting point.
> >
> >On Thu, Jul 23, 2015 at 6:24 AM, Mattmann, Chris A (3980) <
> >[email protected]> wrote:
> >
> >> Thanks, Ankit, for the honest feedback. Would you be willing to update
> >> our wiki and improve the instructions based on your experiences with
> >> our gotchas?
> >>
> >> We have a guide we have been working on ourselves for getting Nutch
> >> running and churning on Elastic MapReduce. That’s where I’d recommend
> >> starting.
> >>
> >> Cheers,
> >> Chris
> >>
> >> -----Original Message-----
> >> From: Ankit Goel <[email protected]>
> >> Reply-To: "[email protected]" <[email protected]>
> >> Date: Wednesday, July 22, 2015 at 5:51 PM
> >> To: "[email protected]" <[email protected]>
> >> Subject: Nutch on the cloud
> >>
> >> >Hi,
> >> >
> >> >After my runs on my laptop, I'm ready to port my work to the cloud.
> >> >I'm planning to use Amazon.
> >> >One thing I noticed when I started with Nutch is that a lot of things
> >> >were left unsaid on the site/wiki, and they took me a lot of time to
> >> >figure out. Pitfalls, if I may call them that. I don't really have
> >> >code or scripts, but I need Nutch to run all the time on the cloud.
> >> >
> >> >So before I port to the cloud, is there anything I should beware of
> >> >or look out for? For example, is AWS fine with Nutch? Are there any
> >> >configurations I should remember? Any advice on implementation to
> >> >ease my transition and run Nutch 24 hours a day? I will be starting
> >> >from a seed file and crawling the web in general.
> >> >
> >> >Thanks
> >> >
> >> >--
> >> >Regards,
> >> >Ankit Goel
> >> >http://about.me/ankitgoel

