Binoy,

I do see the information on the console, and also a lot of information in
HBase.

I tried ./crawl but am not quite sure where to find the following
information:

Usage: crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>

seedDir   ../urls/seed.txt  ?
crawlID  ?
solrUrl     I am guessing this will be http://localhost:8983/solr/
numberOfRounds  ?

Could you provide some advice on how to determine the above information?
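
To make the question concrete, here is a rough sketch of what I imagine the
call looks like. Only the argument order comes from the usage line above; the
crawl ID "testcrawl", the round count of 2, and pointing seedDir at the
../urls directory rather than at seed.txt itself are all guesses on my part,
not something I have confirmed. The snippet only prints the command instead of
running it:

```shell
# Hypothetical values: only the argument order is taken from the usage line.
SEED_DIR="../urls"                       # guess: directory that holds seed.txt
CRAWL_ID="testcrawl"                     # guess: an arbitrary label for this crawl
SOLR_URL="http://localhost:8983/solr/"   # my guess from above
ROUNDS=2                                 # guess: number of crawl rounds

# Assemble the command in the order <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>
CRAWL_CMD="./crawl $SEED_DIR $CRAWL_ID $SOLR_URL $ROUNDS"

# Print it for review rather than executing it.
echo "$CRAWL_CMD"
```

If those guesses are right, I would run the printed command from
/home/nutch/runtime/local/bin/.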


Thanks,
Tom



On Mon, Feb 22, 2016 at 1:19 AM, Binoy Dalal <[email protected]> wrote:

> When you run the inject and generate commands, in the console output do you
> see your site being added?
> Also while fetching and parsing you should be able to see the number of
> successful fetches and parse actions in your console. Ideally this should
> be equal to or more than the number of sites you've put in the seed.txt
> file.
> If this is not the case then there is some issue with either your seed.txt
> file or the regex-urlfilter file.
>
> While running the crawl command, you don't need to index to solr
> separately. The command will do it for you.
> Run ./crawl to see usage instructions.
>
> On Mon, 22 Feb 2016, 11:41 Tom Running <[email protected]> wrote:
>
> > Yes, I ran these before running ./nutch solrindex
> > http://localhost:8983/solr/ -all and got nothing.
> >
> >
> > From /home/nutch/runtime/local/bin/
> >
> > ./nutch inject ../urls/seed.txt
> > ./nutch readdb
> > ./nutch generate -topN 2500
> > ./nutch fetch -all
> > ./nutch parse -all
> > ./nutch updatedb
> >
> > I did not run the crawl command.
> >
> > Would I just run ./crawl,
> > then run this again: ./nutch solrindex http://localhost:8983/solr/ -all ?
> >
> > Thank you very much for responding to my questions.
> >
> > Tom
> >
> >
> > On Sun, Feb 21, 2016 at 11:25 PM, Binoy Dalal <[email protected]>
> > wrote:
> >
> > > Just to be clear, you did run the preceding nutch commands to inject,
> > > generate, fetch and parse the URLs right?
> > >
> > > Additionally try with the ./crawl command to directly crawl and index
> > > everything to solr without having to manually run all the steps.
> > >
> > > On Mon, 22 Feb 2016, 07:24 Tom Running <[email protected]> wrote:
> > >
> > > > I am trying to get Nutch to run solrindex and am having a problem.
> > > > I am using the instructions from this document:
> > > > http://wiki.apache.org/nutch/Nutch2Tutorial. Everything
> > > > works except when I run the following command.
> > > >
> > > >
> > > > ./nutch solrindex http://localhost:8983/solr -all
> > > >
> > > >
> > > >
> > > > ****** It came back with the following info  *****
> > > > ****** It seems to have a problem with indexing ****
> > > > IndexingJob: starting
> > > > Active IndexWriters :
> > > > SOLRIndexWriter
> > > >         solr.server.url : URL of the SOLR instance (mandatory)
> > > >         solr.commit.size : buffer size when sending to SOLR (default 1000)
> > > >         solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
> > > >         solr.auth : use authentication (default false)
> > > >         solr.auth.username : username for authentication
> > > >         solr.auth.password : password for authentication
> > > > IndexingJob: done.
> > > >
> > > >
> > > > When I launch the SOLR web UI, I cannot query or find anything
> > > > under the default collection1, gettingstarted_shard1_replica1, or
> > > > gettingstarted_shard2_replica1.
> > > >
> > > >
> > > > I have also tried this option (with collection1) and am still not
> > > > able to query anything.
> > > > ./nutch solrindex http://localhost:8983/solr/collection1 -all
> > > >
> > > >
> > > >
> > > > I downloaded SOLR 4.10.3 and started it as is with the command
> > > > /home/solr/bin/solr start -e cloud -noprompt
> > > >
> > > > I did not modify any configuration file, nor did I post any file or
> > > > directory from within SOLR. I am assuming the command ./nutch solrindex
> > > > http://localhost:8983/solr/collection1 will do all the posting and
> > > > indexing for SOLR.
> > > >
> > > > Any ideas what I am missing here? Any advice on where to go from here
> > > > would be greatly appreciated.
> > > >
> > > > I did try copying /nutch/runtime/local/conf/*.* into SOLR and it did
> > > > not make any difference.
> > > >
> > > > Thank you.
> > > >
> > > > Tom
> > > >
> > > > --
> > > Regards,
> > > Binoy Dalal
> > >
> >
> --
> Regards,
> Binoy Dalal
>
