From the changelog:
http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?view=markup

* NUTCH-1054 LinkDB optional during indexing (jnioche)

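With your command, the linkdb is passed as a positional argument, so 1.4
interprets it as a segment and looks for crawl_fetch, crawl_parse,
parse_data and parse_text underneath it.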

https://issues.apache.org/jira/browse/NUTCH-1054
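
To see which of the paths you pass would trip this up, a quick check along
these lines may help. It is only a sketch: check_segments is a made-up
helper, not part of Nutch, and the four subdirectory names are simply the
ones from the error output quoted below.

```shell
# Hypothetical helper (not part of Nutch): report every path that lacks one
# of the subdirectories named in the "Input path does not exist" errors.
check_segments() {
  for seg in "$@"; do
    for sub in crawl_fetch crawl_parse parse_data parse_text; do
      if [ ! -d "$seg/$sub" ]; then
        echo "missing: $seg/$sub"
      fi
    done
  done
}

# Example call with the paths from the failing command.
check_segments crawl/linkdb crawl/segments/*
```

Anything reported for crawl/linkdb is expected, since a linkdb is not a
segment; anything reported under crawl/segments points at an incomplete or
stray directory.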

This is the new command:

Usage: SolrIndexer <solr url> <crawldb> [-linkdb <linkdb>] (<segment> ... | -dir <segments>) [-noCommit]
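
Applied to your layout, the call would look roughly like the sketch below.
The Solr URL and the crawl directory are placeholders, not values from this
thread, and solrindex_cmd is just an illustrative wrapper that prints the
command so you can check it first.

```shell
# Placeholders, not values from this thread: adjust both to your setup.
SOLR_URL="http://localhost:8983/solr/"
CRAWL_DIR="crawl"

# Print the 1.4-style invocation, with the linkdb behind the -linkdb switch.
# Drop the leading "echo" to actually run it from runtime/local.
solrindex_cmd() {
  echo bin/nutch solrindex "$SOLR_URL" "$CRAWL_DIR/crawldb" \
    -linkdb "$CRAWL_DIR/linkdb" -dir "$CRAWL_DIR/segments"
}

solrindex_cmd
```

Listing the segments explicitly instead of using -dir is equally valid
according to the usage line above.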

On Tuesday 25 October 2011 18:41:09 Bai Shen wrote:
> I'm having a similar issue.  I'm using 1.4 and getting these errors with
> linkdb.  The segments seem fine.
> 
> 2011-10-25 10:10:20,060 INFO  solr.SolrIndexer - SolrIndexer: starting at 2011-10-25 10:10:20
> 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
> 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/linkdb
> 2011-10-25 10:10:20,136 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025095216
> 2011-10-25 10:10:20,138 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025100004
> 2011-10-25 10:10:20,207 ERROR solr.SolrIndexer - org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
> Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
> Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_data
> Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_text
> 
> 
> Did something change with 1.4?
> 
> On Sun, Oct 9, 2011 at 6:15 AM, lewis john mcgibbney <[email protected]> wrote:
> > Hi Fred,
> > 
> > How many individual directories do you have under
> > /runtime/local/crawl/segments/?
> > 
> > Another thing that raises alarms is the nohup.out dirs! Are these
> > intentional? Interestingly, the missing segment data is not the same
> > for these dirs.
> > 
> > Does your log output indicate any discrepancies between various command
> > transitions?
> > 
> > 
> > 
> > > bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/crawl/crawldb crawl/linkdb crawl/segments/*
> > > SolrIndexer: starting at 2011-10-09 00:13:24
> > > org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
> > > Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
> > > Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
> > > Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
> > > Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
> > > Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
> > > Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
> > > Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
> > > Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
> > > Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
> > > Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
> > > Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
> > > Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
> > > Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
> > > Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
> > 
> > > 
> > > 
> > > 
> > > On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <[email protected]> wrote:
> > >> Hi guys,
> > >> 
> > >> I have been watching this thread intently and I am very happy to see
> > >> that there is some progress :0)
> > >> 
> > >> Radim,
> > >> 
> > >> Can I ask that you open a JIRA issue and submit a patch? That way we
> > >> can not only track it, but also give the community a chance to test
> > >> and validate the patch prior to integration into the source.
> > >> 
> > >> Thanks
> > >> 
> > >> Lewis
> > >> 
> > >> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <[email protected]> wrote:
> > >> > Hi Radim,
> > >> > 
> > >> >  Thank you so much for this. I am not familiar with the commit
> > >> >  process for the core.
> > >> > 
> > >> >  Is there someone who can help us get this committed and help
> > >> >  resolve this issue?
> > >> > 
> > >> > Thanks for all your help.
> > >> > 
> > >> > Rajesh Ramana
> > >> > 
> > >> > -----Original Message-----
> > >> > From: Radim Kolar [mailto:[email protected]]
> > >> > Sent: Thursday, October 06, 2011 2:18 PM
> > >> > To: [email protected]
> > >> > Subject: Re: Nutch not crawling URLs with spanish accented characters (ñ)
> > >> 
> > >> > - The REGEX normalizer transforms the special characters, but fails
> > >> >   to substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’.
> > >> > 
> > >> > - The fetcher is having trouble interpreting the links with the
> > >> >   special character ‘ñ’.
> > >> > 
> > >> > I can add this transformation to the basic-url normalizer if
> > >> > somebody is willing to commit it.
> > >> 
> > >> --
> > >> *Lewis*
> > 
> > --
> > *Lewis*

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
