If you follow the tutorial, the command should be:
$ bin/nutch generate crawl/crawldb crawl/segments
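
The error quoted below says that crawldb/current does not exist, so it may help to check the crawldb path before running generate. A minimal sketch, assuming the tutorial layout with the crawldb under crawl/crawldb; the function name crawldb_ready is hypothetical and not part of Nutch:

```shell
#!/bin/sh
# crawldb_ready DIR: succeed only if DIR contains a "current" subdirectory,
# i.e. the path the Generator error reports as missing.
# (Hypothetical helper for illustration, not part of Nutch.)
crawldb_ready() {
    if [ -d "$1/current" ]; then
        echo "ok: $1/current exists"
        return 0
    fi
    echo "missing: $1/current (path mistyped, or inject not run yet?)"
    return 1
}
```

For example, `crawldb_ready crawl/crawldb && echo "safe to run generate"` prints the second message only when the directory is in place.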

On Wed, 9 May 2012 17:05:51 +0100, Lewis John Mcgibbney <[email protected]> wrote:
Which segments are you trying to generate from? Do you maybe need to
include them individually, or use a wildcard?

bin/nutch generate crawldb crawldb/segments/*
bin/nutch generate crawldb crawldb/segments/segmentNo

?

On Wed, May 9, 2012 at 3:33 PM, Stephan Kristyn  wrote:

 Ok now at the heading "Step-by-Step: Fetching" I get

 -bash-4.1$ bin/nutch generate crawldb crawldb/segments
 Generator: starting at 2012-05-09 14:32:44
 Generator: Selecting best-scoring urls due for fetch.
 Generator: filtering: true
 Generator: normalizing: true
 Generator: jobtracker is 'local', generating exactly one partition.
 Generator: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/kristyns/apache-nutch-1.4-bin/runtime/local/crawldb/current
         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
         at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
         at org.apache.nutch.crawl.Generator.generate(Generator.java:538)
         at org.apache.nutch.crawl.Generator.run(Generator.java:704)
         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
         at org.apache.nutch.crawl.Generator.main(Generator.java:660)

 Strange...

 On 09.05.2012 16:04, Stephan Kristyn wrote:

 Hi, it seems I forgot to fetch the crawled URLs, as mentioned in the tutorial:

http://wiki.apache.org/nutch/NutchTutorial [2]

 I'll let you know if and how that worked out for me.

 On 09.05.2012 14:28, Stephan Kristyn wrote:

This is the query that the Solr interface generates when I enter
"test" and hit the search button:

http://myDomain:8983/solr/select/?q=test&version=2.2&start=0&rows=10&indent=on [3]

Maybe this is a question better suited for the Solr ML?

From: Lewis John Mcgibbney [mailto:[email protected] [4]]
Sent: Wednesday, 9 May 2012 13:34
To: [email protected] [5]
Subject: Re: HTTP ERROR 400

Are you attempting to index to Solr, or does this simply happen when you
start your Solr server?

On Wed, May 9, 2012 at 12:21 PM, Stephan Kristyn wrote:
I copied over the schema and everything else in conf from nutch.

$ cp apache-nutch-1.4-bin/runtime/local/conf/* apache-solr-3.6.0/example/solr/conf/

On 09.05.2012 12:32, Lewis John Mcgibbney wrote:

Which schema are you using with your Solr server?

On Wed, May 9, 2012 at 11:17 AM, Stephan Kristyn  [8] [9] wrote:

Also, when I run

java -jar post.jar *.xml

on RHEL6, I get:

INFO: [] webapp=/solr path=/update params={} status=400 QTime=42

SimplePostTool: FATAL: Solr returned an error #400 ERROR: [doc=GB18030TEST] unknown field 'name'
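
Solr rejects a posted document with "unknown field" when the document uses a field that the active schema.xml does not declare. A quick way to check for a given field, sketched under the assumption that the schema copied over from Nutch is the one Solr actually loads; schema_has_field is a hypothetical helper, and it is a plain grep rather than an XML parser:

```shell
#!/bin/sh
# schema_has_field SCHEMA FIELD: succeed if SCHEMA contains <field name="FIELD" ...>.
# Assumes name= is the first attribute of the <field> element, as is conventional
# in Solr schemas. Does not match <fieldType> or <dynamicField> entries.
# (Hypothetical helper for illustration, not part of Solr or Nutch.)
schema_has_field() {
    grep -Eq "<field[[:space:]]+name=\"$2\"" "$1"
}
```

For example: `schema_has_field apache-solr-3.6.0/example/solr/conf/schema.xml name || echo "field 'name' is not declared"`.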

Thanks,

Stephan

On 09.05.2012 12:11, Stephan Kristyn wrote:

Hi,

after installing Nutch and Solr I get a

 HTTP ERROR 400

Problem accessing /solr/select/. Reason:

 undefined field text


------------------------------------------------------------------------

Powered by Jetty://

Any ideas how to fix this?

Thanks,

Stephan

--

stephan
kristyn
partner operations manager

"The Internet? Is that thing still around?" - Homer Simpson

[email protected] [10] [11]
direct +49 (0)89 231 97 207 [12] mobile +49 (0) 162 28899 02 [13]

yahoo! deutschland gmbh theresienhoehe 12, munich, 80339, germany
phone (408) 349 3300 [14] fax (408) 349 3301 [15]


--
Lewis

--
_Lewis_



Links:
------
[1] mailto:[email protected]
[2] http://wiki.apache.org/nutch/NutchTutorial
[3] http://myDomain:8983/solr/select/?q=test&version=2.2&start=0&rows=10&indent=on
[4] mailto:[email protected]
[5] mailto:[email protected]
[6] mailto:[email protected]
[7] mailto:[email protected]
[8] mailto:[email protected]
[9] mailto:[email protected]
[10] mailto:[email protected]
[11] mailto:[email protected]
[12] http://webmail.openindex.io/tel:%2B49%20%280%2989%20231%2097%20207
[13] http://webmail.openindex.io/tel:%2B49%20%280%29%20162%2028899%2002
[14] http://webmail.openindex.io/tel:%28408%29%20349%203300
[15] http://webmail.openindex.io/tel:%28408%29%20349%203301
[16] mailto:[email protected]
[17] http://webmail.openindex.io/tel:%2B49%20%280%2989%20231%2097%20207
[18] http://webmail.openindex.io/tel:%2B49%20%280%29%20162%2028899%2002
[19] http://webmail.openindex.io/tel:%28408%29%20349%203300
[20] http://webmail.openindex.io/tel:%28408%29%20349%203301
[21] mailto:[email protected]
[22] http://webmail.openindex.io/tel:%2B49%20%280%2989%20231%2097%20207
[23] http://webmail.openindex.io/tel:%2B49%20%280%29%20162%2028899%2002
[24] http://webmail.openindex.io/tel:%28408%29%20349%203300
[25] http://webmail.openindex.io/tel:%28408%29%20349%203301

--
Markus Jelsma - CTO - Openindex
