I'm still having trouble with this in 1.3. It looks as if I've got something dumb wrong with the syntax or the file structure, but I can't find it.
$ bin/nutch solrindex http://search.zimzaz.com:8983/solr crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
SolrIndexer: starting at 2011-10-25 23:26:02
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_fetch
Input path does not exist: file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/crawl_parse
Input path does not exist: file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_data
Input path does not exist: file:/home/bitnami/nutch-1.3/runtime/local/crawl/linkdb/parse_text
Input path does not exist: file:/home/bitnami/nutch-1.3/runtime/local/-linkdb/current

On Tue, Oct 25, 2011 at 12:49 PM, Markus Jelsma <[email protected]> wrote:

> From the changelog:
> http://svn.apache.org/viewvc/nutch/trunk/CHANGES.txt?view=markup
>
> 111 * NUTCH-1054 LinkDB optional during indexing (jnioche)
>
> With your command, the given linkdb is interpreted as a segment.
>
> https://issues.apache.org/jira/browse/NUTCH-1054
>
> This is the new command:
>
> Usage: SolrIndexer <solr url> <crawldb> [-linkdb <linkdb>] (<segment> ... | -dir <segments>) [-noCommit
>
> On Tuesday 25 October 2011 18:41:09 Bai Shen wrote:
> > I'm having a similar issue. I'm using 1.4 and getting these errors with
> > linkdb. The segments seem fine.
> >
> > 2011-10-25 10:10:20,060 INFO solr.SolrIndexer - SolrIndexer: starting at 2011-10-25 10:10:20
> > 2011-10-25 10:10:20,110 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
> > 2011-10-25 10:10:20,110 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/linkdb
> > 2011-10-25 10:10:20,136 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025095216
> > 2011-10-25 10:10:20,138 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20111025100004
> > 2011-10-25 10:10:20,207 ERROR solr.SolrIndexer - org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_fetch
> > Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/crawl_parse
> > Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_data
> > Input path does not exist: file:/opt/nutch-1.4/runtime/local/crawl/linkdb/parse_text
> >
> > Did something change with 1.4?
> >
> > On Sun, Oct 9, 2011 at 6:15 AM, lewis john mcgibbney <[email protected]> wrote:
> > > Hi Fred,
> > >
> > > How many individual directories do you have under
> > > /runtime/local/crawl/segments/ ?
> > >
> > > Another thing that raises alarms is the nohup.out dir's! Are these
> > > intentional? Interestingly, missing segment data is not the same with
> > > these dir's.
> > >
> > > Does your log output indicate any discrepancies between various command
> > > transitions?
> > >
> > > > bitnami@ip-10-202-202-68:~/nutch-1.3/nutch-1.3/runtime/local$ bin/nutch
> > > >> solrindex
> > > >> http://zimzazsearch3-1.bitnamiapp.com:8983/solr/crawl/crawldb
> > > >> crawl/linkdb crawl/segments/*
> > > >> SolrIndexer: starting at 2011-10-09 00:13:24
> > > >> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_fetch
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/crawl_parse
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_data
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922143907/parse_text
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_fetch
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/crawl_parse
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_data
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20110922144329/parse_text
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/crawl_parse
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_data
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/20111008015309/parse_text
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_fetch
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/crawl_parse
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_data
> > > >> Input path does not exist: file:/home/bitnami/nutch-1.3/nutch-1.3/runtime/local/crawl/segments/nohup.out/parse_text
> > > >
> > > > -----------------------------------------------------
> > > > Subscribe to the Nimble Books Mailing List http://eepurl.com/czS-for
> > > > monthly updates
> > > >
> > > > On Sat, Oct 8, 2011 at 14:22, lewis john mcgibbney <[email protected]> wrote:
> > > >> Hi guys,
> > > >>
> > > >> I have been watching this thread intently and I am very happy to see that
> > > >> there is some progress :0)
> > > >>
> > > >> Radim,
> > > >>
> > > >> Can I ask that you open a JIRA issue and submit a patch, this way we
> > > >> can not only track it, but it will also give the community a chance to
> > > >> test and validate the patch prior to integration into the source.
> > > >>
> > > >> Thanks
> > > >>
> > > >> Lewis
> > > >>
> > > >> On Fri, Oct 7, 2011 at 5:49 PM, Ramanathapuram, Rajesh <[email protected]> wrote:
> > > >> > Hi Radim,
> > > >> >
> > > >> > Thank you so much for this. I am not familiar with commit process
> > > >> > to the core.
> > > >> >
> > > >> > Is there someone who can help us get this committed and help
> > > >> > resolve this issue?
> > > >> >
> > > >> > Thanks for all your help.
> > > >> >
> > > >> > Rajesh Ramana
> > > >> >
> > > >> > -----Original Message-----
> > > >> > From: Radim Kolar [mailto:[email protected]]
> > > >> > Sent: Thursday, October 06, 2011 2:18 PM
> > > >> > To: [email protected]
> > > >> > Subject: Re: Nutch not crawling URLs with spanish accented characters (ñ)
> > > >> >
> > > >> > - The REGEX normalizer transforms the special characters, but fails
> > > >> > to substitute ‘%F1’ or ‘%C3%B1’ for ‘ñ’
> > > >> >
> > > >> > - The fetcher is having trouble interpreting the links with special
> > > >> > character ‘ñ’.
> > > >> >
> > > >> > i can add this transformation to basic-url normalizer if somebody is
> > > >> > willing to commit it.
> > > >>
> > > >> --
> > > >> *Lewis*
> > >
> > > --
> > > *Lewis*
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350

