Got errors when running this command:

./crawl ../urls/ TestCrawler http://localhost:8983/solr 1

Any idea where to go from here?

Thank you,
Tom

ParserJob: finished at 2016-02-22 20:22:47, time elapsed: 00:00:11
CrawlDB update for TestCrawler
/home/nutch/runtime/local/bin/nutch updatedb -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true 1456190521-30847 -crawlId TestCrawler
DbUpdaterJob: starting at 2016-02-22 20:22:49
DbUpdaterJob: batchId: 1456190521-30847
DbUpdaterJob: finished at 2016-02-22 20:22:58, time elapsed: 00:00:09
Indexing TestCrawler on SOLR index -> http://localhost:8983/solr
/home/nutch/runtime/local/bin/nutch index -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -D solr.server.url=http://localhost:8983/solr -all -crawlId TestCrawler
IndexingJob: starting
SolrIndexerJob: java.lang.RuntimeException: job failed: name=[TestCrawler]Indexer, jobid=job_local1592190856_0001
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:119)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:154)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:176)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:202)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:211)
Error running: /home/nutch/runtime/local/bin/nutch index -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -D solr.server.url=http://localhost:8983/solr -all -crawlId TestCrawler
Failed with exit value 255.

On Mon, Feb 22, 2016 at 1:58 AM, Binoy Dalal <[email protected]> wrote:

> CrawlID: put any number
> Number of rounds: >= 1
> Seed dir and solr URL: what you have is proper
>
> On Mon, 22 Feb 2016, 12:17 Tom Running <[email protected]> wrote:
>
> > Binoy,
> >
> > I do see the information on the console and also a lot of information
> > in HBase.
> >
> > I tried ./crawl but am not quite sure how to determine the following
> > arguments:
> >
> > Usage: crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>
> >
> > seedDir: ../urls/seed.txt ?
> > crawlID: ?
> > solrUrl: I am guessing this will be http://localhost:8983/solr/
> > numberOfRounds: ?
> >
> > Could you provide some advice on how to determine the above
> > information?
> >
> > Thanks,
> > Tom
> >
> > On Mon, Feb 22, 2016 at 1:19 AM, Binoy Dalal <[email protected]>
> > wrote:
> >
> > > When you run the inject and generate commands, do you see your site
> > > being added in the console output?
> > > Also, while fetching and parsing you should be able to see the
> > > number of successful fetch and parse actions in your console.
> > > Ideally this should be equal to or more than the number of sites
> > > you've put in the seed.txt file.
> > > If this is not the case, then there is some issue with either your
> > > seed.txt file or the regex-urlfilter file.
> > >
> > > While running the crawl command, you don't need to index to Solr
> > > separately. The command will do it for you.
> > > Run ./crawl to see usage instructions.
> > >
> > > On Mon, 22 Feb 2016, 11:41 Tom Running <[email protected]>
> > > wrote:
> > >
> > > > Yes, I did run these before running ./nutch solrindex
> > > > http://localhost:8983/solr/ -all and got nothing.
> > > >
> > > > From /home/nutch/runtime/local/bin/:
> > > >
> > > > ./nutch inject ../urls/seed.txt
> > > > ./nutch readdb
> > > > ./nutch generate -topN 2500
> > > > ./nutch fetch -all
> > > > ./nutch parse -all
> > > > ./nutch updatedb
> > > >
> > > > I did not run the crawl command.
> > > >
> > > > Would I just run ./crawl, then run this again:
> > > > ./nutch solrindex http://localhost:8983/solr/ -all
> > > >
> > > > Thank you very much for responding to my questions.
> > > >
> > > > Tom
> > > >
> > > > On Sun, Feb 21, 2016 at 11:25 PM, Binoy Dalal
> > > > <[email protected]> wrote:
> > > >
> > > > > Just to be clear, you did run the preceding nutch commands to
> > > > > inject, generate, fetch and parse the URLs, right?
> > > > >
> > > > > Additionally, try the ./crawl command to directly crawl and
> > > > > index everything to Solr without having to manually run all the
> > > > > steps.
> > > > >
> > > > > On Mon, 22 Feb 2016, 07:24 Tom Running <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > I am trying to get Nutch to run solrindex and am having a
> > > > > > problem. I am using the instructions from this document:
> > > > > > http://wiki.apache.org/nutch/Nutch2Tutorial. Everything is
> > > > > > working except when I ran the following command:
> > > > > >
> > > > > > ./nutch solrindex http://localhost:8983/solr -all
> > > > > >
> > > > > > It came back with the following info; it seems to have a
> > > > > > problem with indexing:
> > > > > >
> > > > > > IndexingJob: starting
> > > > > > Active IndexWriters :
> > > > > > SOLRIndexWriter
> > > > > >   solr.server.url : URL of the SOLR instance (mandatory)
> > > > > >   solr.commit.size : buffer size when sending to SOLR (default 1000)
> > > > > >   solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
> > > > > >   solr.auth : use authentication (default false)
> > > > > >   solr.auth.username : username for authentication
> > > > > >   solr.auth.password : password for authentication
> > > > > > IndexingJob: done.
> > > > > >
> > > > > > When I launch the Solr web UI, I cannot query or find
> > > > > > anything under the default collection1 or under
> > > > > > gettingstarted_shard1_replica1 or
> > > > > > gettingstarted_shard2_replica1.
> > > > > >
> > > > > > I have also tried this option (with collection1) and am still
> > > > > > not able to query anything:
> > > > > > ./nutch solrindex http://localhost:8983/solr/collection1 -all
> > > > > >
> > > > > > I downloaded Solr 4.10.3 and started it as is with the
> > > > > > command:
> > > > > > /home/solr/bin/solr start -e cloud -noprompt
> > > > > >
> > > > > > I did not modify any configuration file, nor post any file or
> > > > > > directory from within Solr. I am assuming the command ./nutch
> > > > > > solrindex http://localhost:8983/solr/collection1 will do all
> > > > > > the posting and indexing for Solr.
> > > > > >
> > > > > > Any idea what I am missing here? Any advice on where to go
> > > > > > from here would be greatly appreciated.
> > > > > >
> > > > > > I did try copying /nutch/runtime/local/conf/*.* into Solr and
> > > > > > it did not make any difference.
> > > > > >
> > > > > > Thank you.
> > > > > >
> > > > > > Tom
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Binoy Dalal
> > >
> > > --
> > > Regards,
> > > Binoy Dalal
>
> --
> Regards,
> Binoy Dalal
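For anyone landing on this thread later, the usage string discussed above, crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>, maps onto the values in this thread as sketched below. This is a sketch, not a verified fix: the collection1 core name is an assumption (a Solr 4.x "-e cloud" setup may instead expose a gettingstarted collection), and pointing solrUrl at a concrete core rather than the bare http://localhost:8983/solr container URL is offered only as one plausible cause of indexing failures like the exit value 255 shown above.

```shell
# Argument mapping for: crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>
# Values taken from this thread; "collection1" is an assumed core name.
SEED_DIR=../urls/                                  # directory holding seed.txt
CRAWL_ID=TestCrawler                               # any identifier; keys the crawl data in HBase
SOLR_URL=http://localhost:8983/solr/collection1    # a concrete core/collection, not the bare container URL
ROUNDS=1                                           # generate/fetch/parse/updatedb cycles

# One-shot crawl (inject, generate, fetch, parse, updatedb, index in one go):
echo "./crawl $SEED_DIR $CRAWL_ID $SOLR_URL $ROUNDS"

# Sanity check afterwards: ask Solr how many documents the index holds.
echo "curl '$SOLR_URL/select?q=*:*&rows=0&wt=json'"
```

The echo lines only print the commands so the mapping is visible; drop the echos to actually run the crawl and the query once Nutch and Solr are up.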

