Thanks a lot, Talat :) I truly appreciate your help, and that of the others who gave me ideas.
I fixed the Solr schema. Following the Nutch tutorial, I had changed the line

<field name="content" type="text_general" stored="true" indexed="true"/>

to

<field name="content" type="text" stored="true" indexed="true"/>

but that was wrong. I fixed it and ran Nutch 1.7 again, but I am still getting problems :( You can see a new hadoop.log here: http://pastebin.com/2qY0sUJh

The errors are:

Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
    at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
    at org.apache.nutch.crawl.Crawl.run(Crawl.java:160)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)

Any ideas are welcome!

Thanks in advance,
Luis Armando

________________________________________
From: Talat UYARER [[email protected]]
Sent: Friday, October 18, 2013 3:39 PM
To: [email protected]
Subject: Re: Nutch 1.7 and Solr 4.4.0 Integrate

Ok Luis, I found your problem. :) It is a problem with your Solr schema. In your hadoop.log you can see this line:
org.apache.solr.common.SolrException: {msg=SolrCore 'collection1' is not available due to init failure: Unknown fieldType 'text' specified on field content,trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Unknown fieldType 'text' specified on field content
    at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:251)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    ...

As you can see, when Nutch tries to commit, Solr throws an exception. You should check your Solr schema.

You may wonder why solrdedup throws an exception. It is because the IndexerJob did not commit your documents to Solr, so when dedup ran it did not find any documents to check for duplicates.

Talat

Universidad Central "Marta Abreu" de Las Villas, on its 60th anniversary. Founded on November 30, 1952. Visit us at: http://www.uclv.edu.cu
Take part in Universidad 2014, February 10 to 14, 2014, Havana, Cuba. http://www.congresouniversidad.cu/
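To make Talat's diagnosis concrete, here is a minimal Java sketch (standard JDK XML APIs only) of the kind of consistency check that fails at core init: every <field> in schema.xml must name a type that is declared as a <fieldType>. The class name SchemaTypeCheck and method findUnknownTypes are hypothetical, this is not Solr's own validation code, and it assumes the classic schema.xml layout (both "fieldType" and "fieldtype" tag spellings are checked as a precaution).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: NOT part of Solr or Nutch, just an illustration.
public class SchemaTypeCheck {

    // Returns one message per <field> whose type attribute names no declared field type.
    public static List<String> findUnknownTypes(String schemaXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));

        // Collect the names of all declared field types (both tag spellings, as a precaution).
        Set<String> declared = new HashSet<>();
        for (String tag : new String[] {"fieldType", "fieldtype"}) {
            NodeList types = doc.getElementsByTagName(tag);
            for (int i = 0; i < types.getLength(); i++) {
                declared.add(((Element) types.item(i)).getAttribute("name"));
            }
        }

        // Check each <field> element (dynamicField and copyField are not matched by this tag name).
        List<String> problems = new ArrayList<>();
        NodeList fields = doc.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            String type = field.getAttribute("type");
            if (!declared.contains(type)) {
                problems.add("Unknown fieldType '" + type + "' specified on field "
                        + field.getAttribute("name"));
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // A cut-down schema that declares only text_general but uses type="text" on content.
        String schema = "<schema name=\"nutch\" version=\"1.5\">"
                + "<types><fieldType name=\"text_general\" class=\"solr.TextField\"/></types>"
                + "<fields><field name=\"content\" type=\"text\" stored=\"true\" indexed=\"true\"/></fields>"
                + "</schema>";
        for (String problem : findUnknownTypes(schema)) {
            System.out.println(problem);
        }
    }
}
```

Running main prints "Unknown fieldType 'text' specified on field content", the same message as in the log. Changing the field back to type="text_general" yields an empty list, which matches the fix Luis applied.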

