Re: Cant get HTMLStripTransformer's stripHTML to work in DIH.

2009-01-21 Thread Fergus McMenemie
Shalin, downloaded the nightly for 21jan and tried DIH again. It's better but still broken. Dozens of embedded tags are now stripped from documents, but it fails every few documents for no reason I can see. Manually removing embedded tags causes a given problem document to be indexed, only to have a it

Re: Cant get HTMLStripTransformer's stripHTML to work in DIH.

2009-01-21 Thread Shalin Shekhar Mangar
Hi Fergus,

It seems a field it is expecting is missing from the XML.

    <field column="fileAbsPath" template="${jcurrent.fileAbsolutePath}" />
    <field column="fileWebPath" regex="/Volumes/spare/ts/(.*)" replaceWith="$1" sourceColName="fileAbsePath"/>

I guess fileAbsePath is a typo? Can you check if that is the
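For reference, a minimal sketch of the two field definitions with the suspected typo corrected. It assumes the enclosing entity lists TemplateTransformer before RegexTransformer, so that fileAbsPath already exists when the regex reads it. The entity name `record` is hypothetical; only the `jcurrent` reference and the field attributes come from the thread.

```xml
<!-- Hypothetical entity name "record"; other attributes omitted.
     TemplateTransformer is listed first so fileAbsPath is populated
     before RegexTransformer reads it. -->
<entity name="record" transformer="TemplateTransformer,RegexTransformer">
  <!-- Copy the outer file entity's absolute path into a new column -->
  <field column="fileAbsPath" template="${jcurrent.fileAbsolutePath}" />
  <!-- Read from fileAbsPath (not "fileAbsePath") and keep only the part
       after the /Volumes/spare/ts/ prefix -->
  <field column="fileWebPath" regex="/Volumes/spare/ts/(.*)"
         replaceWith="$1" sourceColName="fileAbsPath" />
</entity>
```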

Re: Cant get HTMLStripTransformer's stripHTML to work in DIH.

2009-01-21 Thread Fergus McMenemie
> Hi Fergus, It seems a field it is expecting is missing from the XML.

You mean there is some field in the document we are indexing that is missing?

> <field column="fileAbsPath" template="${jcurrent.fileAbsolutePath}" />
> <field column="fileWebPath" regex="/Volumes/spare/ts/(.*)" replaceWith="$1"

Cant get HTMLStripTransformer's stripHTML to work in DIH.

2009-01-19 Thread Fergus McMenemie
Hello all, I have the following DIH data-config.xml file. Adding HTMLStripTransformer and the associated stripHTML attribute on the para tag seems to have broken things. I am using a nightly build from 12-jan-2009. The /record/sect1/para element contains HTML sub-tags which need to be discarded. Is my use of
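A minimal sketch of how stripHTML is typically attached to a field read by XPathEntityProcessor. It assumes an outer FileListEntityProcessor entity named `jcurrent` (as the template elsewhere in this thread suggests); the entity name `record` and the `forEach` path are assumptions, not the poster's actual config.

```xml
<!-- Hypothetical inner entity: "record" and forEach="/record" are assumptions.
     The url assumes an outer FileListEntityProcessor entity named jcurrent. -->
<entity name="record" processor="XPathEntityProcessor"
        url="${jcurrent.fileAbsolutePath}" forEach="/record"
        transformer="HTMLStripTransformer">
  <!-- para can repeat under sect1, so this column may hold several values;
       stripHTML="true" asks HTMLStripTransformer to drop the embedded tags -->
  <field column="para" xpath="/record/sect1/para" stripHTML="true" />
</entity>
```

Because `para` can yield a list of values per record, this is the multi-valued case that needs careful handling in the transformer.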

Re: Cant get HTMLStripTransformer's stripHTML to work in DIH.

2009-01-19 Thread Fergus McMenemie
> This looks fine. Can you post the stack trace?

Yep, here is the juicy bit. Let me know if you need more.

    Jan 19, 2009 11:08:03 AM org.apache.catalina.startup.Catalina start
    INFO: Server startup in 2390 ms
    Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrCore execute
    INFO: [janesdocs]

Re: Cant get HTMLStripTransformer's stripHTML to work in DIH.

2009-01-19 Thread Fergus McMenemie
Hmmm, just to clarify: I retested the thing using the nightly as of today, 18-jan-2009. The problem is still there, and this traceback is from that nightly.

> > This looks fine. Can you post the stack trace?
>
> Yep, here is the juicy bit. Let me know if you need more.
>
> Jan 19, 2009 11:08:03 AM

Re: Cant get HTMLStripTransformer's stripHTML to work in DIH.

2009-01-19 Thread Shalin Shekhar Mangar
Ah, it needs a null check for multi-valued fields. I've committed a fix to trunk, and the next nightly build should have it. You can check out and build from trunk if you need this immediately.

On Mon, Jan 19, 2009 at 7:02 PM, Fergus McMenemie fer...@twig.me.uk wrote:
> Hmmm, Just to clarify I