Re: Writing a Book on Nutch

2010-11-04 Thread cong liu
I want to know about the fetcher's scheduling. Is it perhaps based on graph theory? On Tue, Nov 2, 2010 at 8:30 PM, Markus Jelsma wrote: > Hello Dennis, > > How's it going? > > Cheers, > > On Monday 17 May 2010 03:27:58 Dennis Kubes wrote: > > Hi Everyone, > > > > It has been a long time coming but I have fin

Re: Writing a Book on Nutch

2010-11-02 Thread nitin hardeniya
Writing plugins is one of the most important topics, and one for which few comprehensive tutorials are available. We also don't have any video tutorials for it. If you also cover Nutch + Hadoop, that will be very cool. I will be available for any help. On Tue, Nov 2, 2010 at 8:30 AM,

Re: Writing a Book on Nutch

2010-11-02 Thread Markus Jelsma
Hello Dennis, How's it going? Cheers, On Monday 17 May 2010 03:27:58 Dennis Kubes wrote: > Hi Everyone, > > It has been a long time coming but I have finally started to write a > book on Nutch. It will be self published and should be available in PDF > / paperback form in less than a month ho

Re: Writing a Book on Nutch

2010-05-18 Thread Mambe Churchill Nanje
I need to know how to integrate Nutch with Solr, and also how to track the index time of an article in Solr and then sort... If your book can use such a case study, you've got my buy on that. Mambe Churchill Nanje 237 77545907, AfroVisioN Founder, President, CEO http://www.afrovisiongroup.com | http://mam

Re: Writing a Book on Nutch

2010-05-18 Thread Dennis Kubes
I wanted to thank everyone for all the great responses. It really helps in putting together information that will be useful to everyone. I am also in the process of launching a blog about Nutch/Hadoop and am working to get the first post (with video) done and up. I will update the list when

RE: Writing a Book on Nutch

2010-05-17 Thread Arkadi.Kosmynin
, Arkadi > -Original Message- > From: Dennis Kubes [mailto:ku...@apache.org] > Sent: Monday, May 17, 2010 11:28 AM > To: user@nutch.apache.org > Subject: Writing a Book on Nutch > > Hi Everyone, > > It has been a long time coming but I have finally started

Re: Writing a Book on Nutch

2010-05-17 Thread Kevin Chen
Second this. Best practice in a production system, how to keep re-crawling without bloating the whole system. On 5/17/2010 3:40 AM, Piet van Remortel wrote: re-crawling and controlling that process seems like an issue in need of covering to me Thanks Piet Belgium On Mon, May 17, 2010 at 9:3

Re: Writing a Book on Nutch

2010-05-17 Thread Hemanth Yamijala
Hi, > "re-crawling and controlling that process seems like an issue in need of > covering to me" > > I am also very interested in knowing that better .. > But also better strategies for crawling a single site and some benchmarks, > linking configuration to performance. +1 for information on bench

Re: Writing a Book on Nutch

2010-05-17 Thread Ron Shigeta
I'd like to second this: tie-ins to Hadoop and other ways to analyze your index are a big mystery to me when dealing with Nutch! - Original Message From: Alex Basa To: user@nutch.apache.org Sent: Sun, May 16, 2010 9:18:01 PM Subject: Re: Writing a Book on Nutch Dennis, One

Re: Writing a Book on Nutch

2010-05-17 Thread Emmanuel de Castro Santana
> > * Aging of your Nutch segments. When do you really need to blow away > > everything and start from scratch. > > * How do you recover from an interrupted / crashed spider / index run > that > > took days or weeks to run (so you don't want to "just start over"

Re: Writing a Book on Nutch

2010-05-17 Thread Davide Del Vecchio
IDEA-ENG / Cell: 408-829-6513 > > > On Sun, May 16, 2010 at 9:18 PM, Alex Basa wrote: > >> Dennis, >> >> One topic that had taken me a long time to figure out and lots of people >> have been having issues with is doing an incremental index.  I don't think >>

Re: Writing a Book on Nutch

2010-05-17 Thread Doğacan Güney
Hey, On Mon, May 17, 2010 at 04:27, Dennis Kubes wrote: > Hi Everyone, > > It has been a long time coming but I have finally started to write a book > on Nutch. It will be self published and should be available in PDF / > paperback form in less than a month hopefully. > > A while back we discus

Re: Writing a Book on Nutch

2010-05-17 Thread Ninad Raut
I would like one chapter on how to configure Nutch for focused crawling: best practices and strategies, especially to avoid host-blocking. On Mon, May 17, 2010 at 6:57 AM, Dennis Kubes wrote: > Hi Everyone, > > It has been a long time coming but I have finally started to write a book > on Nutch

Re: Writing a Book on Nutch

2010-05-17 Thread Mark Bennett
think > it was documented anywhere and it would be great if you could cover it. > > Thanks, > > Alex > > --- On Sun, 5/16/10, Dennis Kubes wrote: > > > From: Dennis Kubes > > Subject: Writing a Book on Nutch > > To: user@nutch.apache.org > > Date:

Re: Writing a Book on Nutch

2010-05-17 Thread Piet van Remortel
re-crawling and controlling that process seems like an issue in need of covering to me Thanks Piet Belgium On Mon, May 17, 2010 at 9:32 AM, Alexander Aristov < alexander.aris...@gmail.com> wrote: > I would definitely want to see answers on questions about distributed > search. > > Starting from

Re: Writing a Book on Nutch

2010-05-17 Thread Alexander Aristov
I would definitely want to see answers to questions about distributed search, starting from crawling (how to run it in distributed mode, where to store collected pages and indexes) and ending with questions about the relevancy of results obtained from different search servers. Best Regards Alexander Ar

Re: Writing a Book on Nutch

2010-05-16 Thread Alex Basa
> From: Dennis Kubes > Subject: Writing a Book on Nutch > To: user@nutch.apache.org > Date: Sunday, May 16, 2010, 8:27 PM > Hi Everyone, > > It has been a long time coming but I have finally started > to write a book on Nutch. It will be self published > and should be avai

Writing a Book on Nutch

2010-05-16 Thread Dennis Kubes
Hi Everyone, It has been a long time coming but I have finally started to write a book on Nutch. It will be self published and should be available in PDF / paperback form in less than a month hopefully. A while back we discussed a Nutch training seminar on the list. I am not ready to do a