Re: Nutch not crawling all pages

2019-10-30 Thread Dave Beckstrom
> not having ambiguous URLs, redirects or 404s or otherwise bogus entries.
>
> Markus
>
> -Original message-
> > From:Bruno Osiek
> > Sent: Wednesday 30th October 2019 23:51
> > To: user@nutch.apache.org
> > Subject: Re: Nutch not crawling all pages
>

RE: Nutch not crawling all pages

2019-10-30 Thread Markus Jelsma
entries.

Markus

-Original message-
> From:Bruno Osiek
> Sent: Wednesday 30th October 2019 23:51
> To: user@nutch.apache.org
> Subject: Re: Nutch not crawling all pages
>
> What is the output of the inject command, i.e., when you inject the 5
> seeds just before

Re: Nutch not crawling all pages

2019-10-30 Thread Bruno Osiek
> > number of documents indexed.
> >
> > Regards,
> > Markus
> >
> > -Original message-
> > > From:Dave Beckstrom
> > > Sent: Wednesday 30th October 2019 20:00
> > > To: user@nutch.apache.org
> > > Su

Re: Nutch not crawling all pages

2019-10-30 Thread Dave Beckstrom
> > Sent: Wednesday 30th October 2019 20:00
> > To: user@nutch.apache.org
> > Subject: Nutch not crawling all pages
> >
> > Hi Everyone,
> >
> > I googled and researched and I am not finding any solutions. I'm hoping
> > someone here can help

RE: Nutch not crawling all pages

2019-10-30 Thread Markus Jelsma
-Original message-
> From:Dave Beckstrom
> Sent: Wednesday 30th October 2019 20:00
> To: user@nutch.apache.org
> Subject: Nutch not crawling all pages
>
> Hi Everyone,
>
> I googled and researched and I am not finding any solutions. I'm hoping
> someone here can

Nutch not crawling all pages

2019-10-30 Thread Dave Beckstrom
Hi Everyone,

I googled and researched and I am not finding any solutions. I'm hoping someone here can help.

I have txt files with about 50,000 seed URLs that are fed to Nutch for crawling and then indexing in Solr. However, it will not index more than about 39,000 pages no matter what I do.
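
One way to check whether the gap between about 50,000 seeds and about 39,000 indexed pages starts in the seed list itself (duplicate or malformed entries) is to count the distinct, well-formed URLs before injecting. The following is only a rough Python sketch, not part of Nutch: the helper name `summarize_seeds` is hypothetical, it assumes one URL per line with optional tab-separated metadata after the URL and `#` for comments, and its normalization is a simplification that will not match Nutch's own URL normalizers and filters exactly.

```python
from urllib.parse import urlsplit, urlunsplit


def summarize_seeds(lines):
    """Count total, distinct and malformed seed entries (hypothetical helper)."""
    seen = set()
    total = 0
    malformed = 0
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comments (assumed seed-file conventions).
        if not line or line.startswith("#"):
            continue
        total += 1
        # Assumption: anything after the first tab is per-URL metadata.
        url = line.split("\t", 1)[0].strip()
        try:
            parts = urlsplit(url)
        except ValueError:
            malformed += 1
            continue
        if parts.scheme not in ("http", "https") or not parts.netloc:
            malformed += 1
            continue
        # Simplified normalization: lowercase the host, drop the fragment,
        # and treat an empty path as "/".
        key = urlunsplit(
            (parts.scheme, parts.netloc.lower(), parts.path or "/", parts.query, "")
        )
        seen.add(key)
    return {"total": total, "distinct": len(seen), "malformed": malformed}
```

Calling `summarize_seeds(open("seeds.txt"))` (the file name is a placeholder) returns the three counts. If `distinct` comes out well below `total`, part of the shortfall is already explained before any fetching happens; redirects and 404s would still have to be checked separately after the crawl.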