Hi,

I have had good experience with the Solr 4.0 and Nutch 2.2.1 combination.

Jan

2013/10/17 Luis Armando Roca Fumero <[email protected]>

> I need to integrate Nutch with Solr 4.4.0. Do you think that Nutch 1.7
> works well with Solr 4.4.0?
> ________________________________________
> From: Julien Nioche [[email protected]]
> Sent: Thursday, October 17, 2013 02:01 p.m.
> To: [email protected]
> Subject: Re: crawling with Nutch 2.2.1
>
> Memstore is certainly not persistent across jobs. Try using a different
> backend like HBase or Cassandra (see tutorials on the wiki) or switch to
> Nutch 1.x
>
>
> On 17 October 2013 19:52, Luis Armando Roca Fumero <[email protected]> wrote:
>
> > Hello friends,
> > I configured Nutch 2.2.1 to crawl the web page http://intranet.uclv.edu.cu.
> > I got the results shown below when I ran this command:
> > ./bin/crawl urls crawlId http://localhost:8983/solr/ 3
> > I need to know if I am doing something wrong; I feel like something is
> > not working well. I attached the config files too.
> > Please write to me; this is my third mail and I haven't received any
> > answers or suggestions from this mailing list.
> > Thanks in advance,
> > Luis Armando
> >
> > root@solr1:/opt/apache-nutch-2.2.1/runtime/local# ./bin/crawl urls crawlId http://localhost:8983/solr/ 3
> > InjectorJob: starting at 2013-10-17 18:43:13
> > InjectorJob: Injecting urlDir: urls
> > InjectorJob: Using class org.apache.gora.memory.store.MemStore as the Gora storage class.
> > InjectorJob: total number of urls rejected by filters: 0
> > InjectorJob: total number of urls injected after normalization and filtering: 1
> > Injector: finished at 2013-10-17 18:43:15, elapsed: 00:00:02
> > Thu Oct 17 18:43:15 UTC 2013 : Iteration 1 of 3
> > Generating batchId
> > Generating a new fetchlist
> > GeneratorJob: starting at 2013-10-17 18:43:16
> > GeneratorJob: Selecting best-scoring urls due for fetch.
> > GeneratorJob: starting
> > GeneratorJob: filtering: false
> > GeneratorJob: normalizing: false
> > GeneratorJob: topN: 50000
> > GeneratorJob: finished at 2013-10-17 18:43:19, time elapsed: 00:00:02
> > GeneratorJob: generated batch id: 1382035395-32147
> > Fetching :
> > FetcherJob: starting
> > FetcherJob: batchId: 1382035395-32147
> > Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
> > FetcherJob: threads: 50
> > FetcherJob: parsing: false
> > FetcherJob: resuming: false
> > FetcherJob : timelimit set for : 1382046200181
> > Using queue mode : byHost
> > Fetcher: threads: 50
> > QueueFeeder finished: total 0 records. Hit by time limit :0
> > -finishing thread FetcherThread0, activeThreads=0
> > -finishing thread FetcherThread1, activeThreads=0
> > -finishing thread FetcherThread2, activeThreads=0
> > -finishing thread FetcherThread3, activeThreads=0
> > -finishing thread FetcherThread4, activeThreads=0
> > -finishing thread FetcherThread6, activeThreads=0
> > -finishing thread FetcherThread5, activeThreads=0
> > -finishing thread FetcherThread7, activeThreads=0
> > -finishing thread FetcherThread8, activeThreads=1
> > -finishing thread FetcherThread9, activeThreads=0
> > -finishing thread FetcherThread10, activeThreads=0
> > -finishing thread FetcherThread11, activeThreads=0
> > -finishing thread FetcherThread12, activeThreads=0
> > -finishing thread FetcherThread13, activeThreads=0
> > -finishing thread FetcherThread15, activeThreads=0
> > -finishing thread FetcherThread14, activeThreads=0
> > -finishing thread FetcherThread16, activeThreads=0
> > -finishing thread FetcherThread17, activeThreads=0
> > -finishing thread FetcherThread18, activeThreads=0
> > -finishing thread FetcherThread19, activeThreads=0
> > -finishing thread FetcherThread20, activeThreads=0
> > -finishing thread FetcherThread21, activeThreads=0
> > -finishing thread FetcherThread23, activeThreads=0
> > -finishing thread FetcherThread22, activeThreads=0
> > -finishing thread FetcherThread24, activeThreads=0
> > -finishing thread FetcherThread26, activeThreads=0
> > -finishing thread FetcherThread25, activeThreads=0
> > -finishing thread FetcherThread27, activeThreads=0
> > -finishing thread FetcherThread28, activeThreads=0
> > -finishing thread FetcherThread29, activeThreads=0
> > -finishing thread FetcherThread30, activeThreads=0
> > -finishing thread FetcherThread31, activeThreads=0
> > -finishing thread FetcherThread32, activeThreads=0
> > -finishing thread FetcherThread33, activeThreads=0
> > -finishing thread FetcherThread34, activeThreads=0
> > -finishing thread FetcherThread35, activeThreads=0
> > -finishing thread FetcherThread36, activeThreads=0
> > -finishing thread FetcherThread38, activeThreads=0
> > -finishing thread FetcherThread37, activeThreads=0
> > -finishing thread FetcherThread39, activeThreads=0
> > -finishing thread FetcherThread40, activeThreads=0
> > -finishing thread FetcherThread41, activeThreads=0
> > -finishing thread FetcherThread42, activeThreads=0
> > -finishing thread FetcherThread43, activeThreads=0
> > -finishing thread FetcherThread44, activeThreads=0
> > -finishing thread FetcherThread45, activeThreads=0
> > -finishing thread FetcherThread46, activeThreads=0
> > -finishing thread FetcherThread47, activeThreads=0
> > -finishing thread FetcherThread48, activeThreads=0
> > Fetcher: throughput threshold: -1
> > Fetcher: throughput threshold sequence: 5
> > -finishing thread FetcherThread49, activeThreads=0
> > 0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
> > -activeThreads=0
> > FetcherJob: done
> > Parsing :
> > ParserJob: starting
> > ParserJob: resuming: false
> > ParserJob: forced reparse: false
> > ParserJob: batchId: 1382035395-32147
> > ParserJob: success
> > CrawlDB update for crawlId
> > DbUpdaterJob: starting
> > DbUpdaterJob: done
> > Indexing crawlId on SOLR index -> http://localhost:8983/solr/
> > SolrIndexerJob: starting
> > SolrIndexerJob: done.
> > SOLR dedup -> http://localhost:8983/solr/
> >
> > The Universidad Central "Marta Abreu" de Las Villas in its 60th Anniversary.
> > Founded on November 30, 1952. Visit us at: http://www.uclv.edu.cu
> > Take part in Universidad 2014, February 10 to 14, 2014, Havana, Cuba.
> > http://www.congresouniversidad.cu/
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
> http://twitter.com/digitalpebble
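
For reference, here is a minimal sketch of the backend switch suggested in the quoted reply, assuming HBase is the chosen store and is already installed and running. The property name and class follow the Nutch 2.x wiki tutorial as far as I recall it, so please check them against the tutorial for your exact version. In conf/nutch-site.xml:

```xml
<!-- Assumption: HBase is the chosen Gora backend; Cassandra would use a different store class. -->
<property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.hbase.store.HBaseStore</value>
  <description>Persistent Gora store, replacing the in-memory MemStore shown in the log.</description>
</property>
```

The same class would normally also be set as gora.datastore.default in conf/gora.properties, and the gora-hbase dependency enabled in ivy/ivy.xml before rebuilding with "ant runtime". With a persistent store, the URL injected by InjectorJob should still be there when the fetch step runs, instead of QueueFeeder reporting 0 records as in the log.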

