What errors do you see in hadoop.log and solr's solr.log?
Post that stack trace.
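
For example (log paths here assume a default local Nutch runtime and the
stock Solr example layout; adjust to your install):

  # Nutch-side errors end up in the local runtime log
  tail -n 100 /home/nutch/runtime/local/logs/hadoop.log

  # Solr-side errors
  grep -i error /home/solr/example/logs/solr.log

  # also check whether anything made it into the index at all
  curl 'http://localhost:8983/solr/collection1/select?q=*:*&rows=0'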

On Tue, 23 Feb 2016, 07:29 Tom Running <[email protected]> wrote:

> Got errors when running this command:
>   ./crawl ../urls/ TestCrawler http://localhost:8983/solr 1
> Any idea where to go from here?
>
> thank you.
> Tom
>
>
>
> ParserJob: finished at 2016-02-22 20:22:47, time elapsed: 00:00:11
> CrawlDB update for TestCrawler
> /home/nutch/runtime/local/bin/nutch updatedb -D mapred.reduce.tasks=2 -D
> mapred.child.java.opts=-Xmx1000m -D
> mapred.reduce.tasks.speculative.execution=false -D
> mapred.map.tasks.speculative.execution=false -D
> mapred.compress.map.output=true 1456190521-30847 -crawlId TestCrawler
> DbUpdaterJob: starting at 2016-02-22 20:22:49
> DbUpdaterJob: batchId: 1456190521-30847
> DbUpdaterJob: finished at 2016-02-22 20:22:58, time elapsed: 00:00:09
> Indexing TestCrawler on SOLR index -> http://localhost:8983/solr
> /home/nutch/runtime/local/bin/nutch index -D mapred.reduce.tasks=2 -D
> mapred.child.java.opts=-Xmx1000m -D
> mapred.reduce.tasks.speculative.execution=false -D
> mapred.map.tasks.speculative.execution=false -D
> mapred.compress.map.output=true -D solr.server.url=
> http://localhost:8983/solr -all -crawlId TestCrawler
> IndexingJob: starting
> SolrIndexerJob: java.lang.RuntimeException: job failed:
> name=[TestCrawler]Indexer, jobid=job_local1592190856_0001
>         at
> org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:119)
>         at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:154)
>         at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:176)
>         at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:202)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:211)
>
> Error running: /home/nutch/runtime/local/bin/nutch index -D
> mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D
> mapred.reduce.tasks.speculative.execution=false -D
> mapred.map.tasks.speculative.execution=false -D
> mapred.compress.map.output=true -D
> solr.server.url=http://localhost:8983/solr -all -crawlId TestCrawler
> Failed with exit value 255.
>
> On Mon, Feb 22, 2016 at 1:58 AM, Binoy Dalal <[email protected]>
> wrote:
>
> > CrawlID: put any name or number.
> > Number of rounds: >= 1.
> > Your seed dir and Solr URL are fine as they are.
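> >
> > A full invocation would then look something like this (seed dir and
> > Solr URL taken from your earlier mails):
> >
> >   ./crawl ../urls/ TestCrawler http://localhost:8983/solr 1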
> >
> > On Mon, 22 Feb 2016, 12:17 Tom Running <[email protected]> wrote:
> >
> > > Binoy,
> > >
> > > I do see the information on the console, and also a lot of
> > > information in HBase.
> > >
> > > I tried ./crawl but am not quite sure where to find the following
> > > information:
> > >
> > > Usage: crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>
> > >
> > > seedDir         ../urls/seed.txt ?
> > > crawlID         ?
> > > solrUrl         I am guessing this will be http://localhost:8983/solr/
> > > numberOfRounds  ?
> > >
> > > Could you provide some advice on how to determine the above
> > > information?
> > >
> > > Thanks,
> > > Tom
> > >
> > >
> > >
> > > On Mon, Feb 22, 2016 at 1:19 AM, Binoy Dalal <[email protected]>
> > > wrote:
> > >
> > > > When you run the inject and generate commands, do you see your
> > > > site being added in the console output?
> > > > Also, while fetching and parsing you should be able to see the
> > > > number of successful fetch and parse actions in your console.
> > > > Ideally this should be equal to or more than the number of sites
> > > > you've put in the seed.txt file.
> > > > If this is not the case, then there is some issue with either your
> > > > seed.txt file or the regex-urlfilter file.
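> > > >
> > > > As a quick sanity check: a minimal seed.txt has one URL per line
> > > > (the URL below is only an example), and the stock
> > > > regex-urlfilter.txt ends with a catch-all accept rule:
> > > >
> > > >   # urls/seed.txt
> > > >   http://nutch.apache.org/
> > > >
> > > >   # conf/regex-urlfilter.txt -- final rule accepts anything not
> > > >   # rejected by the rules above it
> > > >   +.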
> > > >
> > > > When running the crawl command, you don't need to index to Solr
> > > > separately; the command will do it for you.
> > > > Run ./crawl with no arguments to see the usage instructions.
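> > > >
> > > > For reference, the usage line it prints is:
> > > >
> > > >   Usage: crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>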
> > > >
> > > > On Mon, 22 Feb 2016, 11:41 Tom Running <[email protected]> wrote:
> > > >
> > > > > Yes, I ran these before running ./nutch solrindex
> > > > > http://localhost:8983/solr/ -all and got nothing.
> > > > >
> > > > >
> > > > > From /home/nutch/runtime/local/bin/
> > > > >
> > > > > ./nutch inject ../urls/seed.txt
> > > > > ./nutch readdb
> > > > > ./nutch generate -topN 2500
> > > > > ./nutch fetch -all
> > > > > ./nutch parse -all
> > > > > ./nutch updatedb
> > > > >
> > > > > Did not run the crawl command.
> > > > >
> > > > > Should I just run ./crawl?
> > > > > Then run this again: ./nutch solrindex http://localhost:8983/solr/ -all
> > > > >
> > > > > Thank you very much for responding to my questions.
> > > > >
> > > > > Tom
> > > > >
> > > > >
> > > > > On Sun, Feb 21, 2016 at 11:25 PM, Binoy Dalal <[email protected]> wrote:
> > > > >
> > > > > > Just to be clear, you did run the preceding nutch commands to
> > > > > > inject, generate, fetch and parse the URLs, right?
> > > > > >
> > > > > > Additionally, try the ./crawl command to directly crawl and
> > > > > > index everything to Solr without having to manually run all the
> > > > > > steps.
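> > > > > >
> > > > > > Something like this, run from the runtime bin directory (the
> > > > > > crawl id here is just an example):
> > > > > >
> > > > > >   cd /home/nutch/runtime/local/bin
> > > > > >   ./crawl ../urls/ myCrawl http://localhost:8983/solr 1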
> > > > > >
> > > > > > On Mon, 22 Feb 2016, 07:24 Tom Running <[email protected]> wrote:
> > > > > >
> > > > > > > I am trying to get Nutch to run solrindex and am having a
> > > > > > > problem. I am using the instructions from this document:
> > > > > > > http://wiki.apache.org/nutch/Nutch2Tutorial. Everything
> > > > > > > works except when I run the following command.
> > > > > > >
> > > > > > >
> > > > > > > ./nutch solrindex http://localhost:8983/solr -all
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > ****** It came back with the following info ******
> > > > > > > ****** It seems to have a problem with indexing ******
> > > > > > > IndexingJob: starting
> > > > > > > Active IndexWriters :
> > > > > > > SOLRIndexWriter
> > > > > > >         solr.server.url : URL of the SOLR instance (mandatory)
> > > > > > >         solr.commit.size : buffer size when sending to SOLR (default 1000)
> > > > > > >         solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
> > > > > > >         solr.auth : use authentication (default false)
> > > > > > >         solr.auth.username : username for authentication
> > > > > > >         solr.auth.password : password for authentication
> > > > > > > IndexingJob: done.
> > > > > > >
> > > > > > >
> > > > > > > When I launch the Solr web UI, I cannot query or find
> > > > > > > anything under the default collection1 or under
> > > > > > > gettingstarted_shard1_replica1 or gettingstarted_shard2_replica1.
> > > > > > >
> > > > > > >
> > > > > > > I have also tried with this option (with collection1) and am
> > > > > > > still not able to query anything:
> > > > > > > ./nutch solrindex http://localhost:8983/solr/collection1 -all
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > After downloading Solr 4.10.3, I started it as-is with the command:
> > > > > > > /home/solr/bin/solr start -e cloud -noprompt
> > > > > > >
> > > > > > > I did not modify any configuration files, nor did I post any
> > > > > > > files or directories from within Solr. I am assuming this
> > > > > > > command, ./nutch solrindex
> > > > > > > http://localhost:8983/solr/collection1, will do all the
> > > > > > > posting and indexing for Solr.
> > > > > > >
> > > > > > > Any idea what I am missing here? Any advice on where to go
> > > > > > > from here would be greatly appreciated.
> > > > > > >
> > > > > > > I did try copying /nutch/runtime/local/conf/*.* into Solr and
> > > > > > > it did not make any difference.
> > > > > > >
> > > > > > > Thank you.
> > > > > > >
> > > > > > > Tom
> > > > > > >
> > > > > > --
> > > > > > Regards,
> > > > > > Binoy Dalal
> > > > > >
> > > > >
> > > > --
> > > > Regards,
> > > > Binoy Dalal
> > > >
> > >
> > --
> > Regards,
> > Binoy Dalal
> >
>
-- 
Regards,
Binoy Dalal
