Hi,
I rebuilt the whole project following the NutchTutorial, and it works now!
Thank you for your kind support, though I don't know why it works... I'll
keep looking into this, and I hope we can have further discussions about Nutch
development.
Br,
Mick
The logging explicitly states that no solrUrl is set.
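If no solrUrl is configured, it can be passed explicitly on the command line. A minimal sketch for Nutch 1.x, assuming the usual `solrindex` usage; the Solr URL and the crawl paths here are examples, not values taken from this thread:

```shell
# Hypothetical invocation; adjust the Solr URL, crawldb and segment paths
# to match your own crawl directory layout.
bin/nutch solrindex http://localhost:8080/solr/ crawl/crawldb crawl/segments/*
```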
On Sunday, April 21, 2013, kiran chitturi chitturikira...@gmail.com wrote:
Hi Mick,
Since this is an indexing error, can you check the logs on the Solr side?
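Since Solr here answers on port 8080, it is likely deployed under Tomcat, in which case the Solr-side errors usually end up in Tomcat's log. The path below is an assumption about a typical Tomcat install, not something stated in this thread:

```shell
# Hypothetical path; $CATALINA_HOME depends on where Tomcat is installed.
tail -f $CATALINA_HOME/logs/catalina.out
```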
On Sun, Apr 21, 2013 at 4:15 AM, micklai lailixi...@gmail.com wrote:
Hi,
I'm getting "java.io.IOException: Job failed!" during indexing:

URL normalizing: false
Indexing 4 documents
java.io.IOException: Job failed!
SolrDeleteDuplicates: starting at 2013-04-21 02:23:39
SolrDeleteDuplicates: Solr url: http://localhost:8080/solr/
Exception in thread "main" java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
Hi,
I think I managed to address this issue.
What I did was to also add
+^http://([a-z0-9]*\.)*apache.org/
to regex-urlfilter.txt in $NUTCH_HOME/conf.
I guess both files, regex-urlfilter.txt AND nutch-site.xml, need to be
updated concurrently in both locations, i.e.
$NUTCH_HOME/conf
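For context, a minimal sketch of what the relevant part of regex-urlfilter.txt can look like after that change. Only the apache.org line comes from this thread; the surrounding rules are the stock defaults quoted from memory and may differ slightly by Nutch version:

```
# skip URLs containing certain characters, as these are probably queries
-[?*!@=]

# accept only hosts in apache.org
# (this typically replaces the default "accept anything else" rule: +.)
+^http://([a-z0-9]*\.)*apache.org/
```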
Hi,
I am having the same problem (a newbie to Nutch too).
I'm using Nutch 1.4 on Windows 7 with Cygwin.
If I understand correctly, the crawling process should create segments, and
each one of those segments corresponds to a folder under
NUTCH_HOME/runtime/local/crawl/segment_number.
Then under each
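For what it's worth, in a typical Nutch 1.x crawl each segment directory is named by its creation timestamp and contains a handful of standard subdirectories. A sketch of the stock layout (the timestamp below is made up, and the exact set of subdirectories depends on which crawl steps have run):

```
crawl/segments/20130421023339/
├── crawl_generate/   # the fetch list this segment was generated from
├── crawl_fetch/      # fetch status of each URL
├── content/          # raw fetched content
├── crawl_parse/      # outlinks, used to update the crawldb
├── parse_data/       # parsed metadata
└── parse_text/       # parsed plain text, used for indexing
```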
-angers.fr/
Parsing: http://www.face-ecran.fr/
Exception in thread "main" java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
	at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:157)
	at org.apache.nutch.crawl.Crawl.run(Crawl.java:138
Hi Markus,
Thanks for the help. (Hope I'm not boring everybody.)
I've erased everything in crawl/.
Launching my Nutch again, I now get:

CrawlDb update: 404 purging: false
CrawlDb update: Merging segment data into db.
Exception in thread "main" java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
	at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java:105)
	at org.apache.nutch.crawl.CrawlDb.update(CrawlDb.java