not having ambiguous URLs, redirects or 404s or otherwise bogus entries.
>
> Markus
>
>
> -Original message-
> > From:Bruno Osiek
> > Sent: Wednesday 30th October 2019 23:51
> > To: user@nutch.apache.org
> > Subject: Re: Nutch not crawling all pages
>
> What is the output of the inject command, i.e., when you inject the 5
> seeds just before
> number of documents indexed.
> >
> > Regards,
> > Markus
> >
> >
> >
> > -Original message-
> > > From:Dave Beckstrom
> > > Sent: Wednesday 30th October 2019 20:00
> > > To: user@nutch.apache.org
> > > Subject: Nutch not crawling all pages
> > >
Hi Everyone,

I googled and researched and I am not finding any solutions. I'm hoping
someone here can help.

I have txt files with about 50,000 seed urls that are fed to Nutch for
crawling and then indexing in SOLR. However, it will not index more than
about 39,000 pages no matter what I do.
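The first cause raised in the thread, ambiguous or bogus seed entries, can be checked offline before any crawling. The Python sketch below is a hypothetical helper (`normalize` and `audit_seeds` are illustrative names, not part of Nutch). It reads seed files, assumes blank lines and lines starting with `#` are not seeds, and assumes anything after the first tab is per-seed metadata rather than part of the URL. It then counts how many entries collapse into the same URL under a deliberately crude normalization: lower-cased host, default port dropped, fragment removed, empty path set to `/`. Nutch applies its own configurable URL normalizers and filters during injection, so this only approximates what will be kept. It also cannot detect redirects or 404s, because those require an actual fetch.

```python
import sys
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Crude, hypothetical normalization (NOT Nutch's own normalizers)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()  # user info, if any, is dropped
    if ":" in host:  # IPv6 literal: restore the brackets hostname strips
        host = f"[{host}]"
    port = parts.port  # raises ValueError for a malformed port
    default_port = {"http": 80, "https": 443}.get(scheme)
    netloc = host if port in (None, default_port) else f"{host}:{port}"
    path = parts.path or "/"  # "http://a.com" and "http://a.com/" are one request
    # The fragment is never sent to the server, so "#top" is not a new page.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


def audit_seeds(lines):
    """Count unique, duplicate, invalid and skipped seed lines."""
    seen = set()
    counts = {"unique": 0, "duplicates": 0, "invalid": 0, "skipped": 0}
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and '#' lines are not seeds.
        if not line or line.startswith("#"):
            counts["skipped"] += 1
            continue
        # Assumption: text after the first tab is metadata, not the URL.
        url = line.split("\t", 1)[0].strip()
        try:
            parts = urlsplit(url)
            if parts.scheme.lower() not in ("http", "https") or not parts.hostname:
                raise ValueError(url)
            key = normalize(url)
        except ValueError:
            counts["invalid"] += 1
            continue
        if key in seen:
            counts["duplicates"] += 1
        else:
            seen.add(key)
            counts["unique"] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical script name): python audit_seeds.py seeds/*.txt
    all_lines = []
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as fh:
            all_lines.extend(fh)
    print(audit_seeds(all_lines))
```

If `duplicates` plus `invalid` is large, part of the gap between roughly 50,000 seeds and roughly 39,000 indexed pages would exist before Nutch fetches anything. Trailing-slash variants such as `/a` and `/a/` are intentionally kept distinct, because servers may treat them differently.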