For examples, you can look at CrawlDbReader/CrawlDatum and Generator.

Regards,
Markus

 
 
-----Original message-----
> From:Roannel Fernández Hernández <[email protected]>
> Sent: Wednesday 23rd August 2017 21:31
> To: [email protected]
> Subject: Re: [MASSMAIL]RE: Exchange documents in indexing job
> 
> Hi.
> 
> Thanks for your tips. I like the idea of JEXL expressions. I'm going to
> create the ticket and then I'll get to work on it.
> 
> Thanks a lot.
> 
> ----- Original Message -----
> > From: "Markus Jelsma" <[email protected]>
> > To: [email protected]
> > Sent: Wednesday, August 23, 2017 2:05:21 PM
> > Subject: [MASSMAIL]RE: Exchange documents in indexing job
> > 
> > I think the MIME-type filter is a fine method for this; the only drawback
> > is that you need to run the indexer twice.
> > 
> > However, a better solution would be to support JEXL expressions in
> > IndexWriters and IndexerMapReduce to allow global filtering and
> > per-IndexWriter filtering. This would not be very hard to patch in.
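
The global and per-IndexWriter filtering idea above could look roughly like the following sketch. It is not Nutch code: `SketchWriter`, `CollectingWriter`, and `FilteringDispatcher` are hypothetical stand-ins, documents are modeled as plain maps, and Java predicates take the place of JEXL expressions so that the example runs without extra libraries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for an index writer; NOT Nutch's real IndexWriter interface.
interface SketchWriter {
    void write(Map<String, String> doc);
}

// Test double that records every document it receives.
class CollectingWriter implements SketchWriter {
    final List<Map<String, String>> received = new ArrayList<>();

    @Override
    public void write(Map<String, String> doc) {
        received.add(doc);
    }
}

// Applies one global filter, then one filter per registered writer.
// The predicates are assumed placeholders for compiled JEXL expressions.
class FilteringDispatcher {
    private final Predicate<Map<String, String>> globalFilter;
    private final Map<SketchWriter, Predicate<Map<String, String>>> writers =
            new LinkedHashMap<>();

    FilteringDispatcher(Predicate<Map<String, String>> globalFilter) {
        this.globalFilter = globalFilter;
    }

    void register(SketchWriter writer, Predicate<Map<String, String>> filter) {
        writers.put(writer, filter);
    }

    void dispatch(Map<String, String> doc) {
        if (!globalFilter.test(doc)) {
            return; // rejected for every writer
        }
        for (Map.Entry<SketchWriter, Predicate<Map<String, String>>> e : writers.entrySet()) {
            if (e.getValue().test(doc)) {
                e.getKey().write(doc);
            }
        }
    }
}

class FilteringDispatcherDemo {
    public static void main(String[] args) {
        CollectingWriter solr = new CollectingWriter();
        CollectingWriter rabbit = new CollectingWriter();

        // Global filter: only documents that carry a "mimetype" field.
        FilteringDispatcher dispatcher =
                new FilteringDispatcher(doc -> doc.containsKey("mimetype"));
        dispatcher.register(solr, doc -> "text/plain".equals(doc.get("mimetype")));
        dispatcher.register(rabbit, doc -> "text/html".equals(doc.get("mimetype")));

        dispatcher.dispatch(Map.of("id", "a", "mimetype", "text/plain"));
        dispatcher.dispatch(Map.of("id", "b", "mimetype", "text/html"));
        dispatcher.dispatch(Map.of("id", "c", "mimetype", "application/pdf"));
        dispatcher.dispatch(Map.of("id", "d"));

        System.out.println("solr=" + solr.received.size()
                + " rabbit=" + rabbit.received.size());
    }
}
```

With a design like this the indexer only needs a single pass, since each document is checked against every writer's filter as it is dispatched.
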
> >  
> > -----Original message-----
> > > From:Yossi Tamari <[email protected]>
> > > Sent: Wednesday 23rd August 2017 19:40
> > > To: [email protected]
> > > Subject: RE: Exchange documents in indexing job
> > > 
> > > I don't see a good way to do it in configuration, but it should be very
> > > easy to override the write method in the two plugins to have it check the
> > > mime type and decide whether to call super.write or not.
> > > (One terrible way to do it with configuration only would be to configure
> > > only one of the indexers and use mimetype-filter to filter the matching
> > > type, and then reconfigure for the other indexer and change
> > > mimetype-filter.txt to the other mime type and index again...)
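
The override approach described above might be sketched as follows. This is only an illustration under stated assumptions: `BaseWriterSketch` is a hypothetical stand-in for an existing indexer plugin class, `MimeTypeGatedWriter` is a made-up subclass name, and documents are modeled as plain maps rather than Nutch's own document type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an existing indexer plugin; NOT a real Nutch class.
class BaseWriterSketch {
    final List<String> sent = new ArrayList<>();

    void write(Map<String, String> doc) {
        // A real plugin would send the document to its backend here.
        sent.add(doc.get("id"));
    }
}

// Subclass that forwards a document only when its MIME type matches.
class MimeTypeGatedWriter extends BaseWriterSketch {
    private final String acceptedMimeType;

    MimeTypeGatedWriter(String acceptedMimeType) {
        this.acceptedMimeType = acceptedMimeType;
    }

    @Override
    void write(Map<String, String> doc) {
        // Decide whether to call super.write based on the "mimetype" field.
        if (acceptedMimeType.equals(doc.get("mimetype"))) {
            super.write(doc);
        }
    }
}

class MimeTypeGatedWriterDemo {
    public static void main(String[] args) {
        MimeTypeGatedWriter solr = new MimeTypeGatedWriter("text/plain");
        MimeTypeGatedWriter rabbit = new MimeTypeGatedWriter("text/html");

        List<Map<String, String>> docs = List.of(
                Map.of("id", "a", "mimetype", "text/plain"),
                Map.of("id", "b", "mimetype", "text/html"));

        // Every writer sees every document; each one keeps only its own type.
        for (Map<String, String> doc : docs) {
            solr.write(doc);
            rabbit.write(doc);
        }

        System.out.println("solr=" + solr.sent + " rabbit=" + rabbit.sent);
        // prints: solr=[a] rabbit=[b]
    }
}
```
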
> > > 
> > > -----Original Message-----
> > > From: Roannel Fernández Hernández [mailto:[email protected]]
> > > Sent: 23 August 2017 18:05
> > > To: [email protected]
> > > Subject: Exchange documents in indexing job
> > > 
> > > Hi folks:
> > > 
> > > Is there some way in Nutch to send some documents to a particular index
> > > writer according to particular values of fields?
> > > 
> > > Let me explain myself better. I have a document with a field called "mimetype"
> > > and I want to send to Solr only the documents with value "text/plain" for
> > > this field and send to RabbitMQ the documents with value "text/html". How
> > > can I do that?
> > > 
> > > Regards
> > > 
> > > The @universidad_uci is Fidel. We young people will not fail.
> > > #HastaSiempreComandante
> > > #HastalaVictoriaSiempre
> > > 
> > > 
> > 
> 
> 
