Hello Yossi,

That should only be the case if the CrawlDB is updated by the generator, which is not the default.
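
For reference, here is a minimal sketch of how one might check whether that non-default behaviour is switched on. It assumes the switch is the `generate.update.crawldb` property, that it defaults to `false`, and that the configuration is a Hadoop-style XML file such as `conf/nutch-site.xml`. The function names and the sample XML are made up for illustration and are not part of Nutch.

```python
import xml.etree.ElementTree as ET


def get_property(conf_xml: str, name: str, default: str) -> str:
    """Return the value of a Hadoop-style <property> from configuration XML text."""
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", default).strip()
    return default


def generator_updates_crawldb(conf_xml: str) -> bool:
    # Assumption: generate.update.crawldb is the relevant switch and is false by default.
    return get_property(conf_xml, "generate.update.crawldb", "false").lower() == "true"


# Hypothetical site configuration that turns the behaviour on.
SAMPLE = """<configuration>
  <property>
    <name>generate.update.crawldb</name>
    <value>true</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(generator_updates_crawldb(SAMPLE))
```

To check a real installation, read the site configuration file into a string and pass it to `generator_updates_crawldb`.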
Regards,
Markus

-----Original message-----
> From: Yossi Tamari <yossi.tam...@pipl.com>
> Sent: Monday 19th November 2018 14:04
> To: user@nutch.apache.org
> Subject: RE: RE: unexpected Nutch crawl interruption
>
> I think in the case that you interrupt the fetcher, you'll have the problem
> that URLs that were scheduled to be fetched on the interrupted cycle will
> never be fetched (because of NUTCH-1842).
>
> Yossi.
>
> > -----Original Message-----
> > From: Markus Jelsma <markus.jel...@openindex.io>
> > Sent: 19 November 2018 14:52
> > To: user@nutch.apache.org
> > Subject: RE: RE: unexpected Nutch crawl interruption
> >
> > Hello Hany,
> >
> > That depends. If you interrupt the fetcher, the segment being fetched can be
> > thrown away. But if you interrupt updatedb, you can remove the temp directory
> > and must get rid of the lock file. The latter is also true if you interrupt
> > the generator.
> >
> > Regards,
> > Markus
> >
> > > -----Original message-----
> > > From: hany.n...@hsbc.com <hany.n...@hsbc.com>
> > > Sent: Monday 19th November 2018 13:30
> > > To: user@nutch.apache.org
> > > Subject: RE: RE: unexpected Nutch crawl interruption
> > >
> > > Does this mean there is no such thing as a corrupted db at all?
> > >
> > >
> > > Kind regards,
> > > Hany Shehata
> > > Solutions Architect, Marketing and Communications IT Corporate
> > > Functions | HSBC Operations, Services and Technology (HOST) ul.
> > > Kapelanka 42A, 30-347 Kraków, Poland
> > > __________________________________________________________________
> > >
> > > Tie line: 7148 7689 4698
> > > External: +48 123 42 0698
> > > Mobile: +48 723 680 278
> > > E-mail: hany.n...@hsbc.com
> > > __________________________________________________________________
> > > Protect our environment - please only print this if you have to!
> > >
> > >
> > > -----Original Message-----
> > > From: Semyon Semyonov [mailto:semyon.semyo...@mail.com]
> > > Sent: Monday, November 19, 2018 12:59 PM
> > > To: user@nutch.apache.org
> > > Subject: Re: RE: unexpected Nutch crawl interruption
> > >
> > > From the most recently updated crawldb.
> > >
> > >
> > > Sent: Monday, November 19, 2018 at 12:35 PM
> > > From: hany.n...@hsbc.com
> > > To: "user@nutch.apache.org" <user@nutch.apache.org>
> > > Subject: RE: unexpected Nutch crawl interruption
> > >
> > > Hello Semyon,
> > >
> > > Does it mean that if I re-run the crawl command, it will continue from
> > > where it stopped in the previous run?
> > >
> > > Kind regards,
> > > Hany Shehata
> > >
> > >
> > > -----Original Message-----
> > > From: Semyon Semyonov [mailto:semyon.semyo...@mail.com]
> > > Sent: Monday, November 19, 2018 12:06 PM
> > > To: user@nutch.apache.org
> > > Subject: Re: unexpected Nutch crawl interruption
> > >
> > > Hi Hany,
> > >
> > > If you open the script code, you will reach this line:
> > >
> > > # main loop : rounds of generate - fetch - parse - update
> > > for ((a=1; ; a++))
> > >
> > > with a number of break conditions.
> > >
> > > Each iteration calls n independent map jobs.
> > > If it breaks, it stops.
> > > You should finish the loop either with manual nutch commands, or start a
> > > new run of the crawl script using the crawldb from the last iteration.
> > >
> > > Semyon.
> > >
> > >
> > >
> > > Sent: Monday, November 19, 2018 at 11:41 AM
> > > From: hany.n...@hsbc.com
> > > To: "user@nutch.apache.org" <user@nutch.apache.org>
> > > Subject: unexpected Nutch crawl interruption
> > >
> > > Hello,
> > >
> > > What will happen if the bin/crawl command is forced to stop for any
> > > reason, such as a server restart?
> > >
> > > Kind regards,
> > > Hany Shehata
> > >
> > >
> > > -----------------------------------------
> > > SAVE PAPER - THINK BEFORE YOU PRINT!
> > >
> > > This E-mail is confidential.
> > >
> > > It may also be legally privileged. If you are not the addressee you may
> > > not copy, forward, disclose or use any part of it. If you have received
> > > this message in error, please delete it and all copies from your system
> > > and notify the sender immediately by return E-mail.
> > >
> > > Internet communications cannot be guaranteed to be timely secure, error or
> > > virus-free.
> > > The sender does not accept liability for any errors or omissions.
> > >
> > >
> > > ***************************************************
> > > This message originated from the Internet. Its originator may or may not
> > > be who they claim to be and the information contained in the message and
> > > any attachments may or may not be accurate.
> > > ****************************************************
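
The cleanup described in the thread (throw away the segment that was being fetched, remove the CrawlDB lock file) could be sketched roughly as below. This is only an illustration under several assumptions that the thread does not state: the crawl directory is on the local filesystem with `crawldb/` and `segments/` subdirectories, the lock file is named `.locked`, and a segment without a `crawl_fetch` subdirectory was interrupted before its fetch completed. The temp directory left by an interrupted updatedb is not handled, because its name is not given in the thread. All function names are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List

LOCK_NAME = ".locked"          # assumed name of the CrawlDB lock file
FETCH_MARKER = "crawl_fetch"   # assumed subdirectory present once a fetch has completed


def find_leftovers(crawl_dir: Path) -> List[Path]:
    """List paths that an interrupted run may have left behind under crawl_dir."""
    leftovers = []
    lock = crawl_dir / "crawldb" / LOCK_NAME
    if lock.exists():
        leftovers.append(lock)
    segments = crawl_dir / "segments"
    if segments.is_dir():
        for seg in sorted(segments.iterdir()):
            # Heuristic: a segment with no fetch output is treated as incomplete.
            if seg.is_dir() and not (seg / FETCH_MARKER).is_dir():
                leftovers.append(seg)
    return leftovers


def clean(crawl_dir: Path, dry_run: bool = True) -> List[Path]:
    """Report the leftovers, and delete them only when dry_run is False."""
    leftovers = find_leftovers(crawl_dir)
    if not dry_run:
        for path in leftovers:
            if path.is_dir():
                shutil.rmtree(path)
            else:
                path.unlink()
    return leftovers
```

Calling `clean(Path("crawl"))` with the default `dry_run=True` only lists what would be removed, which is the safer first step. On HDFS the same checks would have to go through the Hadoop filesystem tools instead of `pathlib`.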