From the changelog:
http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?view=markup

* NUTCH-1054 LinkDB optional during indexing (jnioche)

With your command, the given linkdb (crawl/linkdb) is interpreted as a
segment, because it is not preceded by -linkdb.

https://issues.apache.org/jira/browse/NUTCH-1054

This is the new command:

Usage: SolrIndexer <solr url> <crawldb> [-linkdb <linkdb>] (<segment> ... | -dir <segments>) [-noCommit]

On Tuesday 25 October 2011 18:41:09 Bai Shen wrote:
> I'm having a similar issue. I'm using 1.4 and getting these errors with
> linkdb. The segments seem fine.
>
> 2011-10-25 10:10:20,060 INFO solr.SolrIndexer - SolrIndexer: starting at 2011-10-25 10:10:20
> 2011-10-25 10:10:20,110 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
> 2011-10-25 10:10:20,110 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/linkdb
> 2011-10-25 10:10:20,136 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025095216
> 2011-10-25 10:10:20,138 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025100004
> 2011-10-25 10:10:20,207 ERROR solr.SolrIndexer - org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
> Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
> Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_data
> Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_text
>
> Did something change with 1.4?
>
> On Sun, Oct 9, 2011 at 6:15 AM, lewis john mcgibbney <[email protected]> wrote:
> > Hi Fred,
> >
> > How many individual directories do you have under
> > /runtime/local/crawl/segments/ ?
> >
> > Another thing that raises alarms is the nohup.out dir's! Are these
> > intentional? Interestingly, missing segment data is not the same with
> > these dir's.
> >
> > Does your log output indicate any discrepancies between various command
> > transitions?
> > >
> > > bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch solrindex http://zimzazsearch3-1.bitnamiapp.com:8983/solr/crawl/crawldb crawl/linkdb crawl/segments/*
> > >> SolrIndexer: starting at 2011-10-09 00:13:24
> > >> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
> > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
> > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
> > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
> > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
> > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
> > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
> > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
> > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
> > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
> > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
> > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
> > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
> > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
> > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
> > >
> > > On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <[email protected]> wrote:
> > >> Hi guys,
> > >>
> > >> I have been watching this thread intently and I am very happy to see that
> > >> there is some progress :0)
> > >>
> > >> Radim,
> > >>
> > >> Can I ask that you open a JIRA issue and submit a patch, this way we can not
> > >> only track it, but it will also give the community a chance to test
> > >> and validate the patch prior to integration into the source.
> > >>
> > >> Thanks
> > >>
> > >> Lewis
> > >>
> > >> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <[email protected]> wrote:
> > >> > Hi Radim,
> > >> >
> > >> > Thank you so much for this. I am not familiar with commit process to the
> > >> > core.
> > >> >
> > >> > Is there someone who can help us get this committed and help resolve this
> > >> > issue?
> > >> >
> > >> > Thanks for all your help.
> > >> >
> > >> > Rajesh Ramana
> > >> >
> > >> > -----Original Message-----
> > >> > From: Radim Kolar [mailto:[email protected]]
> > >> > Sent: Thursday, October 06, 2011 2:18 PM
> > >> > To: [email protected]
> > >> > Subject: Re: Nutch not crawling URLs with spanish accented characters (ñ)
> > >> >
> > >> > - The REGEX normalizer transforms the special characters, but fails
> > >> > to substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’
> > >> >
> > >> > - The fetcher is having trouble interpreting the links with special
> > >> > character ‘ñ’.
> > >> >
> > >> > i can add this transformation to basic-url normalizer if somebody is
> > >> > willing to commit it.
> > >>
> > >> --
> > >> *Lewis*
> >
> > --
> > *Lewis*

--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
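
To illustrate the 1.4 usage quoted at the top of this message, here is a minimal sketch of how the invocation could be assembled. It passes the linkdb with -linkdb and lists only those segment directories that contain the four parts named in the InvalidInputException output (crawl_fetch, crawl_parse, parse_data, parse_text), so a stray nohup.out or a segment that was never parsed is left out. The function name complete_segments, the SOLR_URL value and the CRAWL_DIR default are assumptions made for illustration and are not part of Nutch. The script only prints the command; it does not run it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print every entry under $1
# that contains all four directories the indexer reported as missing.
complete_segments() {
    for seg in "$1"/*; do
        [ -d "$seg/crawl_fetch" ] && [ -d "$seg/crawl_parse" ] &&
        [ -d "$seg/parse_data" ] && [ -d "$seg/parse_text" ] &&
        printf '%s\n' "$seg"
    done
    return 0
}

# Assumed values; replace with your own Solr URL and crawl directory.
SOLR_URL=${SOLR_URL:-http://localhost:8983/solr/}
CRAWL_DIR=${CRAWL_DIR:-crawl}

# Dry run: print the 1.4-style command with the linkdb behind -linkdb.
# The unquoted substitution relies on segment names without spaces
# (they are normally timestamps such as 20111025095216).
echo bin/nutch solrindex "$SOLR_URL" "$CRAWL_DIR/crawldb" \
    -linkdb "$CRAWL_DIR/linkdb" \
    $(complete_segments "$CRAWL_DIR/segments")
```

If the printed command looks right, drop the leading echo to run it from runtime/local. Segments that are skipped here would still need to be parsed (or removed) before they can be indexed.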

