Besides, the -linkdb parameter is 1.4, not 1.3.
That's what's wrong here. Bai explicitly mentioned 1.4.
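
The version mismatch can be illustrated with a simplified model of how the two indexer command lines consume their arguments. This is a hypothetical sketch, not Nutch's actual parsing code. It assumes that 1.3 takes the linkdb as a mandatory third positional argument (inferred from the `-linkdb/current` path in Fred's error) and that 1.4 follows the usage line Markus quotes, treating every unflagged argument as a segment. The names `parse_13` and `parse_14` are made up for illustration.

```python
"""Hypothetical model of SolrIndexer argument handling, Nutch 1.3 vs 1.4.

Not Nutch's real code; it only mirrors the usage lines quoted in this thread.
"""


def parse_13(args):
    """Assumed 1.3 form: <solr url> <crawldb> <linkdb> <segment> ..."""
    solr_url, crawldb, linkdb, *segments = args
    return {"solr": solr_url, "crawldb": crawldb, "linkdb": linkdb,
            "segments": segments}


def parse_14(args):
    """1.4 form: <solr url> <crawldb> [-linkdb <linkdb>]
    (<segment> ... | -dir <segments>) [-noCommit]"""
    solr_url, crawldb, *rest = args
    linkdb, segments_dir, no_commit, segments = None, None, False, []
    i = 0
    while i < len(rest):
        arg = rest[i]
        if arg == "-linkdb":
            linkdb = rest[i + 1]
            i += 2
        elif arg == "-dir":
            segments_dir = rest[i + 1]
            i += 2
        elif arg == "-noCommit":
            no_commit = True
            i += 1
        else:
            # Any other argument is taken as a segment, including a
            # linkdb path that was not preceded by -linkdb.
            segments.append(arg)
            i += 1
    return {"solr": solr_url, "crawldb": crawldb, "linkdb": linkdb,
            "segments_dir": segments_dir, "no_commit": no_commit,
            "segments": segments}


if __name__ == "__main__":
    url = "http://localhost:8983/solr"
    seg = "crawl/segments/20111025095216"
    # 1.3 syntax given to 1.4: crawl/linkdb lands in the segment list.
    print(parse_14([url, "crawl/crawldb", "crawl/linkdb", seg]))
    # 1.4 syntax given to 1.3: the literal "-linkdb" becomes the linkdb path.
    print(parse_13([url, "crawl/crawldb", "-linkdb", "crawl/linkdb", seg]))
```

Running it reproduces both failure modes in this thread: on 1.4 the unflagged `crawl/linkdb` is added as a segment (compare Bai's `adding segment: crawl/linkdb` log line), and on 1.3 the `-linkdb` token itself is taken as the linkdb directory while `crawl/linkdb` is treated as a segment.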

> Hi Fred,
> 
> Please ensure that the linkdb command was executed successfully. The output
> logs do not indicate this.
> It looks like you've also got a '-' (minus) character in front of the relative
> linkdb directory.
> 
> HTH
> 
> On Wed, Oct 26, 2011 at 1:27 AM, Fred Zimmerman <[email protected]> wrote:
> > I'm still having trouble with this in 1.3. It looks as if there's something
> > dumb with the syntax or file structure, but I can't figure it out.
> > 
> > $ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb \
> >     -linkdb crawl/linkdb crawl/segments/*
> > 
> > SolrIndexer: starting at 2011-10-25 23:26:02
> > org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> > file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_fetch
> > Input path does not exist:
> > file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_parse
> > Input path does not exist:
> > file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_data
> > Input path does not exist:
> > file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_text
> > Input path does not exist:
> > file:/home/bitnami/nutch-1.3/runtime/local/-linkdb/current
> > 
> > 
> > On Tue, Oct 25, 2011 at 12:49 PM, Markus Jelsma <[email protected]> wrote:
> > > From the changelog:
> > > http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?view=markup
> > > 
> > > * NUTCH-1054 LinkDB optional during indexing (jnioche)
> > > 
> > > With your command, the given linkdb is interpreted as a segment.
> > > 
> > > https://issues.apache.org/jira/browse/NUTCH-1054
> > > 
> > > This is the new command:
> > > 
> > > Usage: SolrIndexer <solr url> <crawldb> [-linkdb <linkdb>] (<segment> ... | -dir <segments>) [-noCommit]
> > > 
> > > On Tuesday 25 October 2011 18:41:09 Bai Shen wrote:
> > > > I'm having a similar issue. I'm using 1.4 and getting these errors with
> > > > linkdb. The segments seem fine.
> > > > 
> > > > 2011-10-25 10:10:20,060 INFO  solr.SolrIndexer - SolrIndexer: starting at 2011-10-25 10:10:20
> > > > 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
> > > > 2011-10-25 10:10:20,110 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/linkdb
> > > > 2011-10-25 10:10:20,136 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025095216
> > > > 2011-10-25 10:10:20,138 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025100004
> > > > 2011-10-25 10:10:20,207 ERROR solr.SolrIndexer -
> > > > org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> > > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
> > > > Input path does not exist:
> > > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
> > > > Input path does not exist:
> > > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_data
> > > > Input path does not exist:
> > > > file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_text
> > > > 
> > > > 
> > > > Did something change with 1.4?
> > > > 
> > > > On Sun, Oct 9, 2011 at 6:15 AM, lewis john mcgibbney <[email protected]> wrote:
> > > > > Hi Fred,
> > > > > 
> > > > > How many individual directories do you have under
> > > > > /runtime/local/crawl/segments/?
> > > > > 
> > > > > Another thing that raises alarms is the nohup.out directories! Are these
> > > > > intentional? Interestingly, the missing segment data is not the same for
> > > > > these directories.
> > > > > 
> > > > > Does your log output indicate any discrepancies between various
> > > > > command transitions?
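
One way to answer Lewis's question before indexing is to list what the `crawl/segments/*` glob actually expands to and flag entries that lack the subdirectories the indexer complained about. A minimal sketch, assuming (from the error messages in this thread) that each segment handed to the indexer needs `crawl_fetch`, `crawl_parse`, `parse_data` and `parse_text`; the function name `check_segments` is hypothetical and not part of Nutch.

```python
import glob
import os
import sys

# Subdirectories the indexer reported as missing in this thread's error logs.
REQUIRED = ("crawl_fetch", "crawl_parse", "parse_data", "parse_text")


def check_segments(pattern):
    """Map each path matched by `pattern` to the required subdirectories it lacks."""
    report = {}
    for path in sorted(glob.glob(pattern)):
        missing = [d for d in REQUIRED
                   if not os.path.isdir(os.path.join(path, d))]
        report[path] = missing
    return report


if __name__ == "__main__":
    pattern = sys.argv[1] if len(sys.argv) > 1 else "crawl/segments/*"
    for path, missing in check_segments(pattern).items():
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{path}: {status}")
```

A plain file such as `nohup.out` is reported as missing all four entries, which matches the `nohup.out/crawl_fetch` errors quoted below; a segment missing only `crawl_parse`, `parse_data` and `parse_text` was probably fetched but never parsed.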
> > > > > 
> > > > > 
> > > > > 
> > > > > bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch
> > > > > >> solrindex
> > > > > >> http://zimzazsearch3-1.bitnamiapp.com:8983/solr/crawl/crawldb
> > > > > >> crawl/linkdb crawl/segments/*
> > > > > >> SolrIndexer: starting at 2011-10-09 00:13:24
> > > > > >> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
> > > > > >> Input path does not exist:
> > > > > >> file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <[email protected]> wrote:
> > > > > >> Hi guys,
> > > > > >> 
> > > > > >> I have been watching this thread intently and I am very happy to see
> > > > > >> that there is some progress :0)
> > > > > >> 
> > > > > >> Radim,
> > > > > >> 
> > > > > >> Can I ask that you open a JIRA issue and submit a patch? This way we
> > > > > >> can not only track it, but it will also give the community a chance to
> > > > > >> test and validate the patch prior to integration into the source.
> > > > > >> 
> > > > > >> Thanks
> > > > > >> 
> > > > > >> Lewis
> > > > > >> 
> > > > > >> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <[email protected]> wrote:
> > > > > >> > Hi Radim,
> > > > > >> > 
> > > > > >> > Thank you so much for this. I am not familiar with the commit process
> > > > > >> > for the core.
> > > > > >> > 
> > > > > >> > Is there someone who can help us get this committed and help resolve
> > > > > >> > this issue?
> > > > > >> > 
> > > > > >> > Thanks for all your help.
> > > > > >> > 
> > > > > >> > Rajesh Ramana
> > > > > >> > 
> > > > > >> > -----Original Message-----
> > > > > >> > From: Radim Kolar [mailto:[email protected]]
> > > > > >> > Sent: Thursday, October 06, 2011 2:18 PM
> > > > > >> > To: [email protected]
> > > > > >> > Subject: Re: Nutch not crawling URLs with spanish accented characters (ñ)
> > > > > >> 
> > > > > >> > - The REGEX normalizer transforms the special characters, but fails
> > > > > >> >   to substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’.
> > > > > >> > 
> > > > > >> > - The fetcher is having trouble interpreting the links with the special
> > > > > >> >   character ‘ñ’.
> > > > > >> > 
> > > > > >> > I can add this transformation to the basic URL normalizer if somebody
> > > > > >> > is willing to commit it.
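
The two escapes Radim mentions are the same character in different encodings: `%C3%B1` is `ñ` encoded as UTF-8, while `%F1` is its single Latin-1 (ISO-8859-1) byte. A small illustration using Python's standard `urllib.parse.quote`; this is not Nutch's normalizer code, and the URL path is made up.

```python
from urllib.parse import quote, unquote

# A made-up URL path containing the Spanish letter ñ.
path = "/productos/españa.html"

utf8 = quote(path)                        # default encoding is UTF-8
latin1 = quote(path, encoding="latin-1")  # one byte per character

print(utf8)    # /productos/espa%C3%B1a.html
print(latin1)  # /productos/espa%F1a.html

# Decoding must use the same encoding that produced the escapes.
assert unquote(utf8) == path
assert unquote(latin1, encoding="latin-1") == path
```

Which form a given server accepts depends on how it decodes request paths, so a normalizer that rewrites `ñ` has to commit to one of the two.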
> > > > > >> 
> > > > > >> --
> > > > > >> *Lewis*
> > > > > 
> > > > > --
> > > > > *Lewis*
> > > 
> > > --
> > > Markus Jelsma - CTO - Openindex
> > > http://www.linkedin.com/in/markus17
> > > 050-8536620 / 06-50258350
