Re: [MASSMAIL][VOTE] Release Apache Nutch 1.15 RC#1

2018-07-26 Thread Roannel Fernández Hernández
+1 Great work, folks - Original Message - > From: "Sebastian Nagel" > To: user@nutch.apache.org > CC: d...@nutch.apache.org > Sent: Thursday, July 26, 2018 11:05:06 > Subject: [MASSMAIL][VOTE] Release Apache Nutch 1.15 RC#1 > > Hi Folks, > > A first candidate for the Nutch 1.15

Re: [MASSMAIL]RE: Events out-of-the-box

2018-07-04 Thread Roannel Fernández Hernández
you > can simply add it to the crawl script, no need to touch the Java code. > > > -Original Message- > > From: Roannel Fernández Hernández > > Sent: 29 June 2018 06:24 > > To: user@nutch.apache.org > > Subject: Events out-of-the-box

Events out-of-the-box

2018-06-28 Thread Roannel Fernández Hernández
Hi folks, I'm using Nutch 1.14 and I have to send notifications to a RabbitMQ queue when every step starts and ends. So, my question is: Do I have to change the code to achieve this, or is there an easier way? How can I do this? If the code should be changed, I think it is a good idea

Re: [MASSMAIL][ANNOUNCE] New Nutch committer and PMC -

2018-06-26 Thread Roannel Fernández Hernández
29 > Subject: [MASSMAIL][ANNOUNCE] New Nutch committer and PMC - > > Dear all, > > it is my pleasure to announce that Roannel Fernández Hernández > has joined us as a committer and member of the Nutch PMC. > > Recently, Roannel contributed a long list of improvements r

Re: [MASSMAIL]Re: Preparing to release Nutch 1.15 ?

2018-06-11 Thread Roannel Fernández Hernández
+1 Regards - Chris Mattmann wrote: > ++1! > > > > Sounds great. > > > > Cheers, > > Chris > > > > > > > > > > From: Sebastian Nagel > Reply-To: "d...@nutch.apache.org" > Date: Monday, June 11, 2018 at 7:35 AM > To: "user@nutch.apache.org" > Cc:

Re: [MASSMAIL]Certificates

2017-11-28 Thread Roannel Fernández Hernández
Hi Sadiki: You must add your Solr server's certificate to cacerts (the default keystore) of your Java distribution. Under Linux you can find where your cacerts file is with: echo $(readlink -f /usr/bin/java | sed "s:bin/java::")lib/security/cacerts as described on
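
A minimal Python sketch of the same path lookup, for readers who prefer not to rely on the shell one-liner. It mirrors the command above: resolve the /usr/bin/java symlink, drop the first "bin/java", and append lib/security/cacerts. The function names are hypothetical and the layout is the one the command assumes; it is not checked against every Java distribution.

```python
import os


def cacerts_path(java_binary: str) -> str:
    """Mirror the sed expression: drop the first 'bin/java', then append the cacerts suffix."""
    return java_binary.replace("bin/java", "", 1) + "lib/security/cacerts"


def locate_cacerts(java_link: str = "/usr/bin/java") -> str:
    """Resolve symlinks (like `readlink -f`) and derive the cacerts location."""
    return cacerts_path(os.path.realpath(java_link))


if __name__ == "__main__":
    # Prints the derived path; the file may not exist if the layout differs.
    print(locate_cacerts())
```

Once the path is known, the certificate can be imported with the JDK's keytool, for example `keytool -importcert -alias solr -file solr.crt -keystore <that path>`, where the alias and the file name are placeholders.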

Re: [MASSMAIL]RE: Exchange documents in indexing job

2017-08-23 Thread Roannel Fernández Hernández
super.write or not. > > (One terrible way to do it with configuration only would be to configure > > only one of the indexers and use mimetype-filter to filter the matching > > type, and then reconfigure for the other indexer and change > > mimetype-filter.txt to the other mime ty

Exchange documents in indexing job

2017-08-23 Thread Roannel Fernández Hernández
Hi folks: Is there some way in Nutch to send some documents to a particular index writer according to particular values of fields? Let me explain. I have a document with a field called "mimetype" and I want to send to Solr only the documents with value "text/plain" for this field

Re: [MASSMAIL]Re: Many indexers

2017-06-15 Thread Roannel Fernández Hernández
t 7:42 AM, <user-digest-h...@nutch.apache.org> wrote: > > > > > From: "Roannel Fernández Hernández" <roan...@uci.cu> > > To: user@nutch.apache.org > > Cc: > > Bcc: > > Date: Mon, 12 Jun 2017 10:28:42 -0400 (CDT) > > Subject: Many indexers

Re: [MASSMAIL]efficient way to create an index out of crawled documents from nutch

2017-06-15 Thread Roannel Fernández Hernández
Hi Srinivasan, Comments in line. Regards - Original Message - > From: "Srinivasan Ramaswamy" > To: user@nutch.apache.org > Sent: Thursday, June 15, 2017 4:40:44 AM > Subject: [MASSMAIL]efficient way to create an index out of crawled documents > from nutch > > Hi

Many indexers

2017-06-12 Thread Roannel Fernández Hernández
Hi folks, I'm using Nutch 1.12 and I have to send all the documents to different Solr servers (3 servers). Each Solr server serves a different purpose, so the schemas aren't the same on each server. So I need to remove some fields before sending a document to a particular Solr server. How can I do that?

Re: [MASSMAIL]How can I send nutch docs to rabbit mq?

2017-01-10 Thread Roannel Fernández Hernández
Hi Matt Joseph, I wrote an indexer for RabbitMQ. Just take a look and tell us what you think about it and whether it meets your requirements. Look it up here: https://issues.apache.org/jira/browse/NUTCH-2333 Regards - Original Message - > From: "Matt Joseph" > To:

Re: [MASSMAIL][Exception] Nutch 1.7, Solr 4.7

2016-01-19 Thread Roannel Fernández Hernández
Hi Murali: Check it out: http://stackoverflow.com/questions/35186/how-do-i-fix-a-nosuchmethoderror Regards - Original Message - > From: "Ganji Muralikrishna | BDD" > To: user@nutch.apache.org > Sent: Monday, December 28, 2015 2:23:54 AM > Subject:

Re: [MASSMAIL]cannot crawl with inject

2015-12-01 Thread Roannel Fernández Hernández
Hi Dan: You should specify only the parent directory of your seed files. In that case, your command should be: bin/nutch inject urls/ Regards - Original Message - > From: "Dan Wu" > To: user@nutch.apache.org > Sent: Tuesday, December 1, 2015 5:05:56 > Subject:

Re: [MASSMAIL]Re: Nutch 1.10 in Eclipse

2015-11-23 Thread Roannel Fernández Hernández
Hi Ganji: If you want to run all operations from a single script, Nutch provides a script named "crawl" for this task. All steps are executed individually in this script, so the Crawl class no longer exists. Regards - Original Message - > From: "Ganji Muralikrishna | BDD"

Re: [MASSMAIL]fetcher.server.delay configuration not working

2015-11-23 Thread Roannel Fernández Hernández
Hi Andrés: The fetcher.server.delay property, as its description says, is the number of seconds the fetcher will delay between successive requests to the same server. So, if you set fetcher.server.delay to 2.5, Nutch will wait 2.5 seconds before making another request
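
To make the per-server delay concrete, here is a small Python model of the idea, not Nutch's actual fetcher code. It assumes one request at a time per host and ignores response time; `schedule_fetches` and the example URLs are hypothetical.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlsplit


def schedule_fetches(urls: List[str], delay: float) -> List[Tuple[str, float]]:
    """Return (url, start_time) pairs so that requests to the same host
    start at least `delay` seconds apart. Different hosts do not wait
    for each other. Response time is ignored in this simplified model."""
    next_free: Dict[str, float] = {}  # host -> earliest allowed start time
    plan: List[Tuple[str, float]] = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        start = next_free.get(host, 0.0)
        plan.append((url, start))
        next_free[host] = start + delay
    return plan


if __name__ == "__main__":
    queue = [
        "http://a.example/1",
        "http://a.example/2",
        "http://b.example/1",
        "http://a.example/3",
    ]
    for url, start in schedule_fetches(queue, 2.5):
        print(f"{start:4.1f}s  {url}")
```

With a delay of 2.5, the three requests to a.example start at 0.0, 2.5 and 5.0 seconds in this model, while the request to b.example is not held back.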

Re: [MASSMAIL]Crawling focused only over seed file

2015-11-19 Thread Roannel Fernández Hernández
oblem. > > How can I avoid adding any newly discovered URLs during the fetch process? I want > > Nutch to process only the URLs of the seed file. > > > > Thanks. > > > > > > 2015-11-18 9:22 GMT-05:00 Roannel Fernández Hernández <roan...@uci.cu>:

Re: [MASSMAIL]Crawl Command - Getting Exception While Indexing With Solr

2015-11-18 Thread Roannel Fernández Hernández
Hi, what version of Nutch did you download exactly? Regards - Original Message - > From: "Manish Verma" ve...@apple.com> > To: user@nutch.apache.org > Sent: Monday, November 16, 2015 12:36:46 > Subject: [MASSMAIL]Crawl Command - Getting Exception While Indexing With Solr > > Hi , >

Re: [MASSMAIL]Crawl Command - Getting Exception While Indexing With Solr

2015-11-18 Thread Roannel Fernández Hernández
Hi, check whether the solr-solrj-4.10.2.jar library exists in the folder of the indexer-solr plugin. Regards - Original Message - > From: "Roannel Fernández Hernández" <roan...@uci.cu> > To: user@nutch.apache.org > Sent: Wednesday, November 18

Re: [MASSMAIL]Crawling focused only over seed file

2015-11-18 Thread Roannel Fernández Hernández
Hi Andrés, In your nutch-site.xml, set the property db.ignore.external.links to true. Regards - Original Message - > From: "Andrés Rincón Pacheco" > To: user@nutch.apache.org > Sent: Saturday, November 14, 2015 19:51:54 > Subject: [MASSMAIL]Crawling focused
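
As a rough illustration of what ignoring external links means, the Python sketch below drops outlinks whose host differs from the host of the page they were found on. It is a simplified model under the assumption that hosts are compared exactly; the function name and URLs are hypothetical, and it is not Nutch's implementation.

```python
from typing import List
from urllib.parse import urlsplit


def internal_outlinks(page_url: str, outlinks: List[str]) -> List[str]:
    """Keep only the outlinks that point to the same host as page_url."""
    page_host = urlsplit(page_url).hostname or ""
    # urlsplit lowercases the hostname, so the comparison is case-insensitive.
    return [link for link in outlinks if (urlsplit(link).hostname or "") == page_host]


if __name__ == "__main__":
    links = [
        "http://example.com/b",
        "http://other.org/c",
        "https://EXAMPLE.com/d",
    ]
    print(internal_outlinks("http://example.com/a", links))
```

Note that this keeps newly discovered URLs on the seed hosts; it only removes links that lead elsewhere.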

Re: [MASSMAIL]index-more Filer Not Working

2015-11-18 Thread Roannel Fernández Hernández
Hi Manish, If you need to index a static value you can use the index-static plugin. Add the index-static plugin to your plugin.includes property and change the value of the index.static property. For example: index.static myname:manish Regards - Original Message - > From:
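
The effect of a setting such as myname:manish can be pictured with the Python sketch below: every indexed document gains the configured field. It assumes the value is a comma-separated list of field:value pairs, which should be checked against nutch-default.xml for the Nutch version in use; the function names are hypothetical and this is not the plugin's code.

```python
from typing import Dict


def parse_static_fields(value: str) -> Dict[str, str]:
    """Parse 'field:value' pairs separated by commas (assumed format)."""
    fields: Dict[str, str] = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first colon only, so values may contain colons.
        name, _, content = pair.partition(":")
        fields[name.strip()] = content.strip()
    return fields


def add_static_fields(doc: Dict[str, str], value: str) -> Dict[str, str]:
    """Return a copy of doc with the static fields added."""
    merged = dict(doc)
    merged.update(parse_static_fields(value))
    return merged


if __name__ == "__main__":
    print(add_static_fields({"url": "http://example.com/"}, "myname:manish"))
```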

Re: Need To Index URL Strings

2015-11-18 Thread Roannel Fernández Hernández
Hi Manish: Which index server are you using? In Solr you can use tokenizers for this task. Regards - Original Message - > From: "Manish Verma" > To: user@nutch.apache.org > Sent: Friday, November 13, 2015 21:13:56 > Subject: Need To Index URL Strings >

Re: [MASSMAIL]Nutch only fetch and parse the third part of urls

2015-10-09 Thread Roannel Fernández Hernández
Hi Andres, Check your rules in the URL filters. Roannel - Original Message - > From: "Andrés Rincón Pacheco" > To: user@nutch.apache.org > Sent: Thursday, October 8, 2015 9:26:11 > Subject: [MASSMAIL]Nutch only fetch and parse the third part of urls > > Hi, > >