You will need to run Nutch several times in order to fetch everything. If you have one URL in your seed.txt, the first run will only fetch and index ONE page (e.g. the index.html of that URL), then parse that page and add all the links it finds to the database. The next run will fetch the links found in the first run, the third run will fetch the links found in the second run, and so forth...
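One way around re-running by hand is to pass a positive number of crawl rounds as the last argument to the crawl script, so it performs the generate/fetch/parse/update cycle that many times in one invocation. A sketch based on the command quoted below (the Solr URL, core name, and seed directory are from this thread; adjust them to your setup, and note that -1 is not a valid round count here):

```shell
# Run 5 crawl rounds in one go: each round fetches the links
# discovered in the previous round, then indexes into Solr.
./bin/crawl -i \
  -D solr.server.url=http://localhost:8983/solr/TEST_CORE \
  urls/ crawl 5
```

Pick the number of rounds to match how many link levels deep your site goes from the seed pages.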
Have a great weekend everyone!

On Fri, Sep 9, 2016 at 9:05 PM, Comcast <[email protected]> wrote:
> Tried that. Same result
>
> Sent from my iPhone
>
>> On Sep 9, 2016, at 3:04 PM, BlackIce <[email protected]> wrote:
>>
>> Change the -1 to a positive number like 5 or so.... (In the command)
>>
>>> On Sep 9, 2016 8:20 PM, "KRIS MUSSHORN" <[email protected]> wrote:
>>>
>>> Executing this does NOT index everything in and under seed.txt.
>>>
>>> ./bin/crawl -i -D solr.server.url=http://localhost:8983/solr/TEST_CORE urls/ crawl -1
>>>
>>> I have to run it multiple times to get all content.
>>>
>>> Is it possible related to this setting in nutch-site.xml?
>>>
>>> <property>
>>>   <name>db.max.outlinks.per.page</name>
>>>   <value>-1</value>
>>>   <description>
>>>     allow unlimited outlinks with -1
>>>   </description>
>>> </property>
>>>
>>> Thx,
>>>
>>> Kris

