Binoy, I do see the information on the console, and also a lot of information in HBase.
I tried ./crawl, but I am not sure where to find the following information:

Usage: crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>

seedDir         ../urls/seed.txt ?
crawlID         ?
solrUrl         I am guessing this will be http://localhost:8983/solr/
numberOfRounds  ?

Could you provide some advice on how to determine the above information?

Thanks,
Tom

On Mon, Feb 22, 2016 at 1:19 AM, Binoy Dalal <[email protected]> wrote:

> When you run the inject and generate commands, do you see your site being
> added in the console output?
> Also, while fetching and parsing, you should be able to see the number of
> successful fetches and parse actions in your console. Ideally this should
> be equal to or more than the number of sites you've put in the seed.txt
> file.
> If this is not the case, then there is some issue with either your seed.txt
> file or the regex-urlfilter file.
>
> While running the crawl command, you don't need to index to Solr
> separately. The command will do it for you.
> Run ./crawl to see usage instructions.
>
> On Mon, 22 Feb 2016, 11:41 Tom Running <[email protected]> wrote:
>
> > Yes, I did run these before running ./nutch solrindex
> > http://localhost:8983/solr/ -all, and got nothing.
> >
> > From /home/nutch/runtime/local/bin/
> >
> > ./nutch inject ../urls/seed.txt
> > ./nutch readdb
> > ./nutch generate -topN 2500
> > ./nutch fetch -all
> > ./nutch parse -all
> > ./nutch updatedb
> >
> > I did not run the crawl command.
> >
> > Would I just run ./crawl, then run this again?
> > ./nutch solrindex http://localhost:8983/solr/ -all
> >
> > Thank you very much for responding to my questions.
> >
> > Tom
> >
> > On Sun, Feb 21, 2016 at 11:25 PM, Binoy Dalal <[email protected]> wrote:
> >
> > > Just to be clear, you did run the preceding nutch commands to inject,
> > > generate, fetch and parse the URLs, right?
> > >
> > > Additionally, try the ./crawl command to directly crawl and index
> > > everything to Solr without having to manually run all the steps.
> > >
> > > On Mon, 22 Feb 2016, 07:24 Tom Running <[email protected]> wrote:
> > >
> > > > I am trying to get Nutch to run solrindex and am having a problem.
> > > > I am using the instructions from this document:
> > > > http://wiki.apache.org/nutch/Nutch2Tutorial. Everything works
> > > > except when I run the following command:
> > > >
> > > > ./nutch solrindex http://localhost:8983/solr -all
> > > >
> > > > It came back with the following output. It seems to have a problem
> > > > with indexing:
> > > >
> > > > IndexingJob: starting
> > > > Active IndexWriters :
> > > > SOLRIndexWriter
> > > >     solr.server.url : URL of the SOLR instance (mandatory)
> > > >     solr.commit.size : buffer size when sending to SOLR (default 1000)
> > > >     solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
> > > >     solr.auth : use authentication (default false)
> > > >     solr.auth.username : username for authentication
> > > >     solr.auth.password : password for authentication
> > > > IndexingJob: done.
> > > >
> > > > When I launch the Solr web UI, I cannot query or find anything
> > > > under the default collection1, gettingstarted_shard1_replica1, or
> > > > gettingstarted_shard2_replica1.
> > > >
> > > > I have also tried this option (with collection1) and am still not
> > > > able to query anything:
> > > >
> > > > ./nutch solrindex http://localhost:8983/solr/collection1 -all
> > > >
> > > > I downloaded Solr 4.10.3 and started it as is with the command:
> > > >
> > > > /home/solr/bin/solr start -e cloud -noprompt
> > > >
> > > > I did not modify any configuration file or post any file or directory
> > > > from within Solr. I am assuming this command, ./nutch solrindex
> > > > http://localhost:8983/solr/collection1, will do all the posting and
> > > > indexing for Solr.
> > > >
> > > > Any ideas what I am missing here? Any advice on where to go from here
> > > > would be greatly appreciated.
> > > >
> > > > I did try copying /nutch/runtime/local/conf/*.* into Solr, and it did
> > > > not make any difference.
> > > >
> > > > Thank you.
> > > >
> > > > Tom
> > >
> > > --
> > > Regards,
> > > Binoy Dalal
>
> --
> Regards,
> Binoy Dalal
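
For the crawl usage line at the top of this message, one way the four arguments might be filled in is sketched below. It is only a dry run that prints the command instead of invoking Nutch. The values are assumptions, not settings confirmed in this thread: seedDir is taken to be the directory that contains seed.txt (../urls) rather than the file itself, crawlID is a made-up label (testcrawl), and numberOfRounds is taken to mean the number of generate/fetch/parse/updatedb cycles.

```shell
#!/bin/sh
# Dry-run sketch: builds a ./crawl command line and prints it without running Nutch.
# Every value below is an illustrative assumption, not a verified setting.

SEED_DIR="../urls"                      # assumed: directory holding seed.txt, not the file itself
CRAWL_ID="testcrawl"                    # hypothetical label chosen for this example
SOLR_URL="http://localhost:8983/solr/"  # the Solr URL guessed in the message above
ROUNDS=2                                # assumed: number of generate/fetch/parse/updatedb cycles

build_crawl_cmd() {
    # Arguments follow the order of the usage line:
    # crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>
    printf './crawl %s %s %s %s\n' "$1" "$2" "$3" "$4"
}

CRAWL_CMD=$(build_crawl_cmd "$SEED_DIR" "$CRAWL_ID" "$SOLR_URL" "$ROUNDS")
echo "$CRAWL_CMD"
```

If the printed command looks right, it could be pasted into a shell opened in /home/nutch/runtime/local/bin/. Starting with a small number of rounds keeps the first test short.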

