There is a nice HTML stripper built into Solr:
solr.HTMLStripStandardTokenizerFactory.

-----Original Message-----
From: Ahmed Hammad [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 05, 2008 10:43 AM
To: solr-user@lucene.apache.org
Subject: Re: Regex Transformer Error

Hi,

It works with the attribute regex="<(.|\n)*?>"

Sorry for the disturbance.
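The pattern that worked, <(.|\n)*?>, can be tried outside Solr with java.util.regex. This is a plain-Java illustration only, not the RegexTransformer's own code, and the class and method names are hypothetical. The pattern matches a '<', then the shortest run of characters (any character other than a line terminator, or a newline via \n), then the next '>'. Replacing every match with an empty string removes the tags.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows what the thread's tag-stripping regex matches.
// It mimics a "replace every match with an empty string" step in plain Java.
class HtmlTagStripDemo {
    // '<', then the shortest run of (non-line-terminator char | newline), then '>'
    static final Pattern TAG = Pattern.compile("<(.|\\n)*?>");

    static String strip(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("<p>Hello <b>world</b></p>")); // Hello world
        System.out.println(strip("<a\nhref=\"x\">link</a>"));   // link
    }
}
```

One caveat, which comes from Java's regex semantics rather than from this thread: without the DOTALL flag, '.' does not match '\r' either, so a tag split across a Windows-style \r\n line break is not matched. A negated character class such as <[^>]*> avoids the alternation and matches across any line terminator.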

Regards,

ahmd


On Wed, Nov 5, 2008 at 8:18 PM, Ahmed Hammad <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am using the Solr 1.3 data import handler. One of my table fields
> contains HTML tags, and I want to strip them from the field text, so I
> need the RegexTransformer.
>
> I added a transformer="RegexTransformer" attribute to my entity and a
> new field:
>
> <field sourceColName="content" column="content" regex="English"
> replaceWith="XXXXX"/>
>
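The field definition above has the RegexTransformer read the content column, replace matches of regex with replaceWith, and write the result to content. As a rough model only, assuming the transformer applies a replace-all over the column's string value, the same step looks like this in plain Java. RegexReplaceSketch and applyReplace are hypothetical names, and a row is modelled as a Map from column name to value; this is not DataImportHandler code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical model of <field sourceColName="content" column="content"
// regex="English" replaceWith="XXXXX"/>; assumes replace-all semantics.
class RegexReplaceSketch {
    static void applyReplace(Map<String, Object> row, String sourceColName,
                             String column, String regex, String replaceWith) {
        Object value = row.get(sourceColName);
        if (value == null) {
            return; // no source value, leave the row unchanged
        }
        String replaced = Pattern.compile(regex)
                .matcher(value.toString())
                .replaceAll(replaceWith);
        row.put(column, replaced); // same column name here, so the original value is overwritten
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("content", "English text in English");
        applyReplace(row, "content", "content", "English", "XXXXX");
        System.out.println(row.get("content")); // XXXXX text in XXXXX
    }
}
```

Matcher.replaceAll treats '$' and '\' in the replacement string specially; Matcher.quoteReplacement escapes them when a literal replacement is wanted.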
> Everything works fine: the text is replaced without any problem. The
> problem is my regular expression for stripping HTML tags. I use
> regex="<(.|\n)*?>", but of course a literal '<' is not allowed in an XML
> attribute value. I tried regex="&lt;(.|\n)*?&gt;" and
> regex="&#3C;(.|\n)*?&#3E;", but I get the following error:
>
> The value of attribute "regex" associated with an element type "field"
> must not contain the '<' character.
>     at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
>     ...
>
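For reference on the escaping question: per the XML specification, a literal '<' may not appear in an attribute value, while the predefined entity &lt; and well-formed character references (&#60; decimal, &#x3C; hexadecimal) are decoded back to '<' by the parser. The &#3C; form tried above is not well-formed, because a decimal reference may contain only digits; the same applies to &#3E; (use &#62; or &#x3E;). A '>' may appear literally. The sketch below checks this with the JDK's built-in DOM parser (javax.xml.parsers) on a one-element document. The class and method names are hypothetical, and it does not load an actual data-config file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo: parse a one-element document and return the decoded "regex" attribute.
class XmlAttributeEscapeDemo {
    static String regexAttribute(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Element field = builder.parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return field.getAttribute("regex"); // entities are already decoded here
    }

    public static void main(String[] args) throws Exception {
        // Escaped forms: both decode to the regex <(.|\n)*?>
        System.out.println(regexAttribute("<field regex=\"&lt;(.|\\n)*?&gt;\"/>"));
        System.out.println(regexAttribute("<field regex=\"&#x3C;(.|\\n)*?&#x3E;\"/>"));
        // A literal '<' in the attribute value is a well-formedness error
        try {
            regexAttribute("<field regex=\"<(.|\\n)*?>\"/>");
        } catch (SAXParseException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The parser hands the application the decoded string, so whatever reads the attribute sees the plain regex <(.|\n)*?> rather than the entity text.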
> The full stack trace follows:
>
> FATAL: Could not create importer. DataImporter config invalid
> org.apache.solr.common.SolrException: FATAL: Could not create importer. DataImporter config invalid
>     at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:114)
>     at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:206)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>     at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:857)
>     at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:565)
>     at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1509)
>     at java.lang.Thread.run(Unknown Source)
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context Processing Document #
>     at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:176)
>     at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:93)
>     at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
>     ... 17 more
> Caused by: org.xml.sax.SAXParseException: The value of attribute "regex" associated with an element type "field" must not contain the '<' character.
>     at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
>     at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
>     at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:166)
>     ... 19 more
>
> I appreciate your help.
>
> Regards,
> ahmd
>
>
