RE: [MASSMAIL]RE: Exchange documents in indexing job

2017-08-23 Thread Markus Jelsma
For examples, you can look at CrawlDbReader/CrawlDatum and Generator. Regards, Markus -Original message- > From: Roannel Fernández Hernández > Sent: Wednesday 23rd August 2017 21:31 > To: user@nutch.apache.org > Subject: Re: [MASSMAIL]RE: Exchange documents in indexing job > > Hi. >

Re: [MASSMAIL]RE: Exchange documents in indexing job

2017-08-23 Thread Roannel Fernández Hernández
Hi. Thanks for your tips. I like the idea of JEXL expressions. I'm going to create the ticket and get to work on it. Thanks a lot. - Original Message - > From: "Markus Jelsma" > To: user@nutch.apache.org > Sent: Wednesday, August 23, 2017 2:05:21 PM > Subject: [MASSMAIL]RE: Exchange

RE: Exchange documents in indexing job

2017-08-23 Thread Markus Jelsma
I think a MIME-type filter is a fine method for this; the only drawback is that you need to run the indexer twice. A better solution, though, would be to support JEXL expressions in IndexWriters and IndexerMapReduce to allow global filtering and per-IndexWriter filtering. This would not be very hard t
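
To illustrate the global plus per-IndexWriter filtering idea, here is a minimal, self-contained sketch. It does not use JEXL or any Nutch class: plain java.util.function.Predicate objects stand in for the expressions, and FilteredIndexing, Writer, index, and the writer names are hypothetical names chosen only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: Predicates stand in for JEXL expressions,
// and a Map<String, String> stands in for an indexed document.
class FilteredIndexing {

    // Hypothetical writer: a name, its own filter, and the documents it accepted.
    static class Writer {
        final String name;
        final Predicate<Map<String, String>> filter;
        final List<Map<String, String>> accepted = new ArrayList<>();

        Writer(String name, Predicate<Map<String, String>> filter) {
            this.name = name;
            this.filter = filter;
        }
    }

    // Applies the global filter first, then each writer's own filter.
    static void index(List<Map<String, String>> docs,
                      Predicate<Map<String, String>> globalFilter,
                      List<Writer> writers) {
        for (Map<String, String> doc : docs) {
            if (!globalFilter.test(doc)) {
                continue; // rejected for every writer
            }
            for (Writer writer : writers) {
                if (writer.filter.test(doc)) {
                    writer.accepted.add(doc);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("id", "a", "mimetype", "text/plain"),
                Map.of("id", "b", "mimetype", "application/pdf"),
                Map.of("id", "c")); // no mimetype field at all

        // Global filter: drop documents that have no "mimetype" field.
        Predicate<Map<String, String>> hasMimeType = d -> d.containsKey("mimetype");

        Writer plainText = new Writer("plain-text-writer",
                d -> "text/plain".equals(d.get("mimetype")));
        Writer everythingElse = new Writer("other-writer",
                d -> !"text/plain".equals(d.get("mimetype")));

        index(docs, hasMimeType, List.of(plainText, everythingElse));

        System.out.println(plainText.name + ": " + plainText.accepted.size());
        System.out.println(everythingElse.name + ": " + everythingElse.accepted.size());
    }
}
```

With this shape a single indexing pass routes each document to the writers whose filter accepts it, which avoids running the indexer twice.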

RE: Exchange documents in indexing job

2017-08-23 Thread Yossi Tamari
I don't see a good way to do it in configuration, but it should be very easy to override the write method in the two plugins to have it check the mime type and decide whether to call super.write or not. (One terrible way to do it with configuration only would be to configure only one of the inde
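
The override approach described above might look roughly like the following sketch. It is not the real Nutch plugin code: Doc and BaseWriter are hypothetical stand-ins for a Nutch document and an existing index writer plugin, and the actual write signature in Nutch may differ.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a Nutch document: field name -> value.
class Doc {
    private final Map<String, String> fields = new HashMap<>();

    Doc with(String name, String value) {
        fields.put(name, value);
        return this;
    }

    String get(String name) {
        return fields.get(name);
    }
}

// Hypothetical stand-in for an existing index writer plugin.
class BaseWriter {
    final List<Doc> written = new ArrayList<>();

    void write(Doc doc) {
        written.add(doc);
    }
}

// Subclass that forwards a document to super.write only when
// its "mimetype" field equals the accepted value.
class MimeFilteringWriter extends BaseWriter {
    private final String acceptedMimeType;

    MimeFilteringWriter(String acceptedMimeType) {
        this.acceptedMimeType = acceptedMimeType;
    }

    @Override
    void write(Doc doc) {
        // equals(null) is false, so documents without the field are skipped.
        if (acceptedMimeType.equals(doc.get("mimetype"))) {
            super.write(doc);
        }
    }
}

class MimeFilterDemo {
    public static void main(String[] args) {
        MimeFilteringWriter plainTextOnly = new MimeFilteringWriter("text/plain");
        plainTextOnly.write(new Doc().with("mimetype", "text/plain"));
        plainTextOnly.write(new Doc().with("mimetype", "application/pdf"));
        System.out.println(plainTextOnly.written.size()); // prints 1
    }
}
```

Each of the two plugins would get its own subclass of this kind, with the check inverted or changed as needed for the second writer.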

Exchange documents in indexing job

2017-08-23 Thread Roannel Fernández Hernández
Hi folks: Is there some way in Nutch to send some documents to a particular index writer according to particular values of fields? Let me explain. I have a document with a field called "mimetype" and I want to send to Solr only the documents with value "text/plain" for this field an