I need to integrate Nutch with Solr 4.4.0. Do you think Nutch 1.7 works well with Solr 4.4.0?
________________________________________
From: Julien Nioche [[email protected]]
Sent: Thursday, 17 October 2013, 02:01 p.m.
To: [email protected]
Subject: Re: crawling with Nutch 2.2.1
MemStore is certainly not persistent across jobs. Try using a different backend like HBase or Cassandra (see tutorials on the wiki) or switch to Nutch 1.x.

On 17 October 2013 19:52, Luis Armando Roca Fumero <[email protected]> wrote:

> Hello friends,
> I configured Nutch 2.2.1 to crawl the web page http://intranet.uclv.edu.cu.
> I got the results shown below when I ran this command:
> ./bin/crawl urls crawlId http://localhost:8983/solr/ 3
> I need to know whether I am doing something wrong; I feel like something is not working well.
> I have attached the config files too.
> Please write to me; this is my third mail and I haven't received any answers or suggestions from this mailing list.
> Thanks in advance,
> Luis Armando
>
> root@solr1:/opt/apache-nutch-2.2.1/runtime/local# ./bin/crawl urls crawlId http://localhost:8983/solr/ 3
> InjectorJob: starting at 2013-10-17 18:43:13
> InjectorJob: Injecting urlDir: urls
> InjectorJob: Using class org.apache.gora.memory.store.MemStore as the Gora storage class.
> InjectorJob: total number of urls rejected by filters: 0
> InjectorJob: total number of urls injected after normalization and filtering: 1
> Injector: finished at 2013-10-17 18:43:15, elapsed: 00:00:02
> Thu Oct 17 18:43:15 UTC 2013 : Iteration 1 of 3
> Generating batchId
> Generating a new fetchlist
> GeneratorJob: starting at 2013-10-17 18:43:16
> GeneratorJob: Selecting best-scoring urls due for fetch.
> GeneratorJob: starting
> GeneratorJob: filtering: false
> GeneratorJob: normalizing: false
> GeneratorJob: topN: 50000
> GeneratorJob: finished at 2013-10-17 18:43:19, time elapsed: 00:00:02
> GeneratorJob: generated batch id: 1382035395-32147
> Fetching :
> FetcherJob: starting
> FetcherJob: batchId: 1382035395-32147
> Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
> FetcherJob: threads: 50
> FetcherJob: parsing: false
> FetcherJob: resuming: false
> FetcherJob : timelimit set for : 1382046200181
> Using queue mode : byHost
> Fetcher: threads: 50
> QueueFeeder finished: total 0 records. Hit by time limit :0
> -finishing thread FetcherThread0, activeThreads=0
> -finishing thread FetcherThread1, activeThreads=0
> -finishing thread FetcherThread2, activeThreads=0
> -finishing thread FetcherThread3, activeThreads=0
> -finishing thread FetcherThread4, activeThreads=0
> -finishing thread FetcherThread6, activeThreads=0
> -finishing thread FetcherThread5, activeThreads=0
> -finishing thread FetcherThread7, activeThreads=0
> -finishing thread FetcherThread8, activeThreads=1
> -finishing thread FetcherThread9, activeThreads=0
> -finishing thread FetcherThread10, activeThreads=0
> -finishing thread FetcherThread11, activeThreads=0
> -finishing thread FetcherThread12, activeThreads=0
> -finishing thread FetcherThread13, activeThreads=0
> -finishing thread FetcherThread15, activeThreads=0
> -finishing thread FetcherThread14, activeThreads=0
> -finishing thread FetcherThread16, activeThreads=0
> -finishing thread FetcherThread17, activeThreads=0
> -finishing thread FetcherThread18, activeThreads=0
> -finishing thread FetcherThread19, activeThreads=0
> -finishing thread FetcherThread20, activeThreads=0
> -finishing thread FetcherThread21, activeThreads=0
> -finishing thread FetcherThread23, activeThreads=0
> -finishing thread FetcherThread22, activeThreads=0
> -finishing thread FetcherThread24, activeThreads=0
> -finishing thread FetcherThread26, activeThreads=0
> -finishing thread FetcherThread25, activeThreads=0
> -finishing thread FetcherThread27, activeThreads=0
> -finishing thread FetcherThread28, activeThreads=0
> -finishing thread FetcherThread29, activeThreads=0
> -finishing thread FetcherThread30, activeThreads=0
> -finishing thread FetcherThread31, activeThreads=0
> -finishing thread FetcherThread32, activeThreads=0
> -finishing thread FetcherThread33, activeThreads=0
> -finishing thread FetcherThread34, activeThreads=0
> -finishing thread FetcherThread35, activeThreads=0
> -finishing thread FetcherThread36, activeThreads=0
> -finishing thread FetcherThread38, activeThreads=0
> -finishing thread FetcherThread37, activeThreads=0
> -finishing thread FetcherThread39, activeThreads=0
> -finishing thread FetcherThread40, activeThreads=0
> -finishing thread FetcherThread41, activeThreads=0
> -finishing thread FetcherThread42, activeThreads=0
> -finishing thread FetcherThread43, activeThreads=0
> -finishing thread FetcherThread44, activeThreads=0
> -finishing thread FetcherThread45, activeThreads=0
> -finishing thread FetcherThread46, activeThreads=0
> -finishing thread FetcherThread47, activeThreads=0
> -finishing thread FetcherThread48, activeThreads=0
> Fetcher: throughput threshold: -1
> Fetcher: throughput threshold sequence: 5
> -finishing thread FetcherThread49, activeThreads=0
> 0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
> -activeThreads=0
> FetcherJob: done
> Parsing :
> ParserJob: starting
> ParserJob: resuming: false
> ParserJob: forced reparse: false
> ParserJob: batchId: 1382035395-32147
> ParserJob: success
> CrawlDB update for crawlId
> DbUpdaterJob: starting
> DbUpdaterJob: done
> Indexing crawlId on SOLR index -> http://localhost:8983/solr/
> SolrIndexerJob: starting
> SolrIndexerJob: done.
> SOLR dedup -> http://localhost:8983/solr/
>
> The Universidad Central "Marta Abreu" de Las Villas on its 60th Anniversary.
> Founded on 30 November 1952. Visit us at: http://www.uclv.edu.cu
> Take part in Universidad 2014, 10 to 14 February 2014, Havana, Cuba.
> http://www.congresouniversidad.cu/

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
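
The backend switch suggested in the reply can be sketched as a configuration change. This is a minimal sketch, assuming HBase is the chosen backend and is already installed and running; the property name and store class follow the Nutch 2.x tutorial on the wiki and should be checked against your own conf/ directory. It goes inside the <configuration> element of conf/nutch-site.xml.

```xml
<!-- conf/nutch-site.xml
     Replace the in-memory MemStore with a persistent Gora backend.
     Assumption: HBase is the backend; a Cassandra setup would name a different store class. -->
<property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.hbase.store.HBaseStore</value>
  <description>Default class for storing data</description>
</property>
```

The same tutorial also sets gora.datastore.default=org.apache.gora.hbase.store.HBaseStore in conf/gora.properties and enables the gora-hbase dependency in ivy/ivy.xml before rebuilding with "ant runtime". With a persistent store, the URL injected by InjectorJob should still be visible to the later GeneratorJob and FetcherJob runs, which would be consistent with the "QueueFeeder finished: total 0 records" line in the log above going away.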

