Re: Is Boilerpipe usable through Solr ExtractingUpdateHandler or the DIH?

Lance Norskog Sun, 09 Sep 2012 18:52:47 -0700

Cool! I have since learned another method for handling the redundant templated
spew in HTML pages: crawl the mobile site instead.


----- Original Message -----
| From: "Markus Jelsma" <markus.jel...@openindex.io>
| To: solr-user@lucene.apache.org
| Sent: Friday, September 7, 2012 3:05:40 AM
| Subject: RE: Is Boilerpipe usable through Solr ExtractingUpdateHandler or the DIH?
| 
| It works indeed:
| https://issues.apache.org/jira/browse/SOLR-3808
|  
|  
| -----Original message-----
| > From:Markus Jelsma <markus.jel...@openindex.io>
| > Sent: Fri 07-Sep-2012 10:40
| > To: solr-user@lucene.apache.org
| > Subject: RE: Is Boilerpipe usable through Solr
| > ExtractingUpdateHandler or the DIH?
| > 
| > Hi,
| > 
| > It should not be so hard but it looks like the current
| > SolrContentHandler builds up the document via SAX-events. You
| > could pass a
| > BoilerpipeContentHandler((ContentHandler)parsingHandler,
| > BoilerpipeExtractor) to the parser in
| > ExtractingDocumentLoader.java. It should work.
| > 
| > Markus
| > 
| >  
| >  
| > -----Original message-----
| > > From:Lance Norskog <goks...@gmail.com>
| > > Sent: Thu 06-Sep-2012 05:51
| > > To: solr-user@lucene.apache.org
| > > Subject: Is Boilerpipe usable through Solr
| > > ExtractingUpdateHandler or the DIH?
| > > 
| > > Tika integrated Boilerpipe a few releases back. Is it possible to
| > > invoke it when using the ExtractingUpdateHandler (simple Tika)
| > > or the DataImportHandler?
| > > 
| > > http://code.google.com/p/boilerpipe/
| > > 
| > > 
| > > 
| > 
|
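The wrapping idea Markus describes (a content handler that sits in front of the handler that builds the document and sees the SAX events first) can be sketched with nothing but the JDK's SAX classes. This is only an illustration of the pattern, not Boilerpipe, Tika, or the SOLR-3808 patch: `SkippingHandler`, `TextCollector`, and the choice of `nav` and `footer` as "boilerplate" elements are all hypothetical stand-ins, and a real boilerplate extractor scores text blocks rather than matching tag names.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class WrappingHandlerSketch {

    /**
     * Hypothetical stand-in for a boilerplate filter: forwards SAX events to
     * the wrapped handler, except for anything inside a skipped element.
     */
    static class SkippingHandler extends DefaultHandler {
        private final ContentHandler delegate;
        private final Set<String> skipped;
        private int skipDepth = 0; // > 0 while inside a skipped element

        SkippingHandler(ContentHandler delegate, Set<String> skipped) {
            this.delegate = delegate;
            this.skipped = skipped;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            if (skipDepth > 0 || skipped.contains(qName)) {
                skipDepth++; // track nesting so the matching end tag is found
                return;
            }
            delegate.startElement(uri, localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            if (skipDepth > 0) {
                skipDepth--;
                return;
            }
            delegate.endElement(uri, localName, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            if (skipDepth == 0) {
                delegate.characters(ch, start, length);
            }
        }
    }

    /** Downstream handler that only collects text, standing in for the document builder. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses well-formed markup and returns the text that survives the filter. */
    public static String extract(String xml) throws Exception {
        TextCollector collector = new TextCollector();
        XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // The wrapper is what the parser sees; the collector only receives
        // the events the wrapper chooses to forward.
        reader.setContentHandler(
                new SkippingHandler(collector, Set.of("nav", "footer")));
        reader.parse(new InputSource(new StringReader(xml)));
        return collector.text.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><nav>Home | About</nav>"
                + "<p>Actual article text.</p>"
                + "<footer>Copyright</footer></html>";
        System.out.println(extract(page));
    }
}
```

Because the filter is just another handler, the code that drives the parser does not need to change beyond the one place where the handler is constructed, which is presumably why the suggestion above is to wrap the existing handler inside ExtractingDocumentLoader.java rather than touch the parsing logic.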
