crawlID: put any number.
numberOfRounds: any value >= 1.
Your seed dir and Solr URL are fine as they are.
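
Putting those values together, the call might look like the sketch below. It is only a dry run that prints the command instead of executing it. The crawl ID of 1 and the round count of 2 are arbitrary example values (assumptions); the seed path and Solr URL are the ones quoted in this thread.

```shell
#!/bin/sh
# Dry-run sketch of: crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>
# The values below are examples only, not required settings.
SEED_DIR="../urls/seed.txt"               # seed location as given in this thread
CRAWL_ID="1"                              # assumption: any number will do
SOLR_URL="http://localhost:8983/solr/"    # Solr URL as given in this thread
ROUNDS="2"                                # assumption: any value >= 1

# Build the command only when the round count is valid.
if [ "$ROUNDS" -ge 1 ]; then
    CMD="./crawl $SEED_DIR $CRAWL_ID $SOLR_URL $ROUNDS"
    echo "$CMD"
else
    CMD=""
    echo "numberOfRounds must be >= 1" >&2
fi
```

To actually start the crawl, run the printed command from the same bin directory where ./nutch was run.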

On Mon, 22 Feb 2016, 12:17 Tom Running <[email protected]> wrote:

> Binoy,
>
> I do see the information on the console and also lot of information in
> hbase.
>
> I tried ./crawl but am not quite sure where to locate the following
> information:
>
> Usage: crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>
>
> seedDir   ../urls/seed.txt  ?
> crawlID  ?
> solrUrl     I am guessing this will be http://localhost:8983/solr/
> numberOfRounds  ?
>
> Could you provide some advice on how to determine the above information?
>
>
> Thanks,
> Tom
>
>
>
> On Mon, Feb 22, 2016 at 1:19 AM, Binoy Dalal <[email protected]>
> wrote:
>
> > When you run the inject and generate commands, in the console output do you
> > see your site being added?
> > Also while fetching and parsing you should be able to see the number of
> > successful fetches and parse actions in your console. Ideally this should
> > be equal to or more than the number of sites you've put in the seed.txt
> > file.
> > If this is not the case then there is some issue with either your seed.txt
> > file or the regex-urlfilter file.
> >
> > While running the crawl command, you don't need to index to solr
> > separately. The command will do it for you.
> > Run ./crawl to see usage instructions.
> >
> > On Mon, 22 Feb 2016, 11:41 Tom Running <[email protected]> wrote:
> >
> > > Yes, I ran these before running ./nutch solrindex
> > > http://localhost:8983/solr/ -all and got nothing.
> > >
> > >
> > > From /home/nutch/runtime/local/bin/
> > >
> > > ./nutch inject ../urls/seed.txt
> > > ./nutch readdb
> > > ./nutch generate -topN 2500
> > > ./nutch fetch -all
> > > ./nutch parse -all
> > > ./nutch updatedb
> > >
> > > Did not run the crawl command.
> > >
> > > Would I just run ./crawl,
> > > and then run ./nutch solrindex http://localhost:8983/solr/ -all again?
> > >
> > > Thank you very much for responding to my questions.
> > >
> > > Tom
> > >
> > >
> > > On Sun, Feb 21, 2016 at 11:25 PM, Binoy Dalal <[email protected]>
> > > wrote:
> > >
> > > > Just to be clear, you did run the preceding nutch commands to inject,
> > > > generate, fetch and parse the URLs right?
> > > >
> > > > Additionally try with the ./crawl command to directly crawl and index
> > > > everything to solr without having to manually run all the steps.
> > > >
> > > > On Mon, 22 Feb 2016, 07:24 Tom Running <[email protected]> wrote:
> > > >
> > > > > I am trying to get Nutch to run solrindex and am having a problem. I am
> > > > > following the instructions in this document:
> > > > > http://wiki.apache.org/nutch/Nutch2Tutorial. Everything works except
> > > > > the following command.
> > > > >
> > > > >
> > > > > ./nutch solrindex http://localhost:8983/solr -all
> > > > >
> > > > >
> > > > >
> > > > > ****** It came back with the following info ******
> > > > > ****** It seems to have a problem with indexing ******
> > > > > IndexingJob: starting
> > > > > Active IndexWriters :
> > > > > SOLRIndexWriter
> > > > >         solr.server.url : URL of the SOLR instance (mandatory)
> > > > >         solr.commit.size : buffer size when sending to SOLR (default 1000)
> > > > >         solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
> > > > >         solr.auth : use authentication (default false)
> > > > >         solr.auth.username : username for authentication
> > > > >         solr.auth.password : password for authentication
> > > > > IndexingJob: done.
> > > > >
> > > > >
> > > > > When I open the SOLR web UI, I cannot query or find anything under
> > > > > the default collection1, gettingstarted_shard1_replica1, or
> > > > > gettingstarted_shard2_replica1.
> > > > >
> > > > >
> > > > > I have also tried this option (with collection1) and am still not
> > > > > able to query anything.
> > > > > ./nutch solrindex http://localhost:8983/solr/collection1 -all
> > > > >
> > > > >
> > > > >
> > > > > I downloaded SOLR 4.10.3 and started it as is with the command
> > > > > /home/solr/bin/solr start -e cloud -noprompt
> > > > >
> > > > > I did not modify any configuration file or post any file or directory
> > > > > from within SOLR. I am assuming this command, ./nutch solrindex
> > > > > http://localhost:8983/solr/collection1, will do all the posting and
> > > > > indexing for SOLR.
> > > > >
> > > > > Any idea what I am missing here? Any advice on where to go from here
> > > > > would be greatly appreciated.
> > > > >
> > > > > I also tried copying /nutch/runtime/local/conf/*.* into SOLR and it did
> > > > > not make any difference.
> > > > >
> > > > > Thank you.
> > > > >
> > > > > Tom
> > > > >
> > > > > --
> > > > Regards,
> > > > Binoy Dalal
> > > >
> > >
> > --
> > Regards,
> > Binoy Dalal
> >
>
-- 
Regards,
Binoy Dalal
