Hmmm,

Just to clarify, I retested this using the nightly as of today,
18-jan-2009. The problem is still there, and the traceback below is
from that nightly.
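
Reading the top frames, the NullPointerException is raised inside the
java.io.StringReader constructor, called from stripHTML. My guess (an
assumption, I have not read the transformer source) is that "para" is
null for at least one record, for example a file with no
/record/sect1/para. Below is a minimal Java sketch of that failure and
of the kind of null guard I would expect. The class and method names
are made up for illustration and the tag remover is a crude stand-in,
not Solr's code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration only; this is NOT the HTMLStripTransformer source.
class NullSafeStripSketch {

    // Crude tag remover standing in for Solr's real HTML stripping logic.
    static String stripTags(Reader in) {
        StringBuilder out = new StringBuilder();
        boolean inTag = false;
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (c == '<') {
                    inTag = true;          // start skipping markup
                } else if (c == '>' && inTag) {
                    inTag = false;         // markup ended, resume copying
                } else if (!inTag) {
                    out.append((char) c);  // ordinary text
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Assumed guard: pass a missing column value through as null
    // instead of constructing a StringReader from it.
    static String stripHtmlOrNull(Object value) {
        if (value == null) {
            return null;
        }
        return stripTags(new StringReader(value.toString()));
    }

    // Reproduces the top frame of the trace: the StringReader
    // constructor itself throws when given null.
    static boolean readerRejectsNull() {
        try {
            new StringReader(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(readerRejectsNull());
        System.out.println(stripHtmlOrNull("<p>Hello <b>world</b></p>"));
        System.out.println(stripHtmlOrNull(null));
    }
}
```

If that guess is right, the import would only fail on files that lack
the para element, which would be easy to check against the data.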

>>This looks fine. Can you post the stack trace?
>>
>Yep, here is the juicy bit. Let me know if you need more.
>
>Jan 19, 2009 11:08:03 AM org.apache.catalina.startup.Catalina start
>INFO: Server startup in 2390 ms
>Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrCore execute
>INFO: [janesdocs] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=12
>Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
>INFO: Read dataimport.properties
>Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>INFO: Starting Full Import
>Jan 19, 2009 11:14:06 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
>INFO: [janesdocs] REMOVING ALL DOCUMENTS FROM INDEX
>Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrDeletionPolicy onInit
>INFO: SolrDeletionPolicy.onInit: commits:num=2
>       commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_1,version=1232363283058,generation=1,filenames=[segments_1]
>       commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_2,version=1232363283059,generation=2,filenames=[segments_2]
>Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrDeletionPolicy updateCommits
>INFO: last commit = 1232363283059
>Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.EntityProcessorBase applyTransformer
>WARNING: transformer threw error
>java.lang.NullPointerException
>       at java.io.StringReader.<init>(StringReader.java:33)
>       at org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:71)
>       at org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:54)
>       at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
>       at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
>       at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
>       at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
>       at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
>       at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
>       at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
>       at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
>       at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
>       at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
>Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
>SEVERE: Exception while processing: janescurrent document : null
>org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
>       at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
>       at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:203)
>       at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
>       at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
>       at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
>       at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
>       at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
>       at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
>       at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
>       at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
>       at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
>Caused by: java.lang.NullPointerException
>       at java.io.StringReader.<init>(StringReader.java:33)
>       at org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:71)
>       at org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:54)
>       at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
>       ... 9 more
>Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>SEVERE: Full Import failed
>org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
>       at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
>       at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:203)
>       at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
>       at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
>       at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
>       at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
>       at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
>       at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
>       at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
>       at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
>       at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
>Caused by: java.lang.NullPointerException
>       at java.io.StringReader.<init>(StringReader.java:33)
>       at org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:71)
>       at org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:54)
>       at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
>       ... 9 more
>Jan 19, 2009 11:14:06 AM org.apache.solr.update.DirectUpdateHandler2 rollback
>INFO: start rollback
>Jan 19, 2009 11:14:06 AM org.apache.solr.update.DirectUpdateHandler2 rollback
>INFO: end_rollback
>
>
>>On Mon, Jan 19, 2009 at 4:14 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
>>
>>> Hello all,
>>>
>>> I have the following DIH data-config.xml file. Adding
>>> HTMLStripTransformer and the associated stripHTML on the
>>> para tag seems to have broken things. I am using a nightly
>>> build from 12-jan-2009.
>>>
>>> The /record/sect1/para contains HTML sub tags which need
>>> to be discarded. Is my use of stripHTML correct?
>>>
>>> <dataConfig>
>>>  <dataSource name="myfilereader" type="FileDataSource"/>
>>>  <document>
>>>     <entity name="jcurrent"
>>>        processor="FileListEntityProcessor"
>>>        fileName=".*xml"
>>>        newerThan="'NOW-1000DAYS'"
>>>        recursive="true"
>>>        rootEntity="false"
>>>        dataSource="null"
>>>        baseDir="/Volumes/spare/ts/jxml/data/news/groups">
>>>
>>>        <entity name="x"
>>>           dataSource="myfilereader"
>>>           processor="XPathEntityProcessor"
>>>           url="${jcurrent.fileAbsolutePath}"
>>>           stream="false"
>>>           forEach="/record"
>>>           transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,HTMLStripTransformer">
>>>
>>>           <field column="fileAbsPath" template="${jcurrent.fileAbsolutePath}" />
>>>           <field column="fileWebPath" regex="/Volumes/spare/ts/(.*)" replaceWith="$1" sourceColName="fileAbsPath"/>
>>>           <field column="title"    xpath="/record/title" />
>>>           <field column="para"     xpath="/record/sect1/para" stripHTML="true" />
>>>           <field column="subject"  xpath="/record/metadata/subje...@qualifier='fullTitle']"   />
>>>           <field column="pubname"  xpath="/record/metadata/subje...@qualifier='publication']" />
>>>           <field column="pubdate"  xpath="/record/metadata/da...@qualifier='pubDate']" dateTimeFormat="yyyyMMdd"   />
>>>           </entity>
>>>        </entity>
>>>     </document>
>>>  </dataConfig>
>>>
>>> --
>>>
>>> ===============================================================
>>> Fergus McMenemie               Email:fer...@twig.me.uk
>>> Techmore Ltd                   Phone:(UK) 07721 376021
>>>
>>> Unix/Mac/Intranets             Analyst Programmer
>>> ===============================================================
>>>
>>
>>
>>
>>-- 
>>Regards,
>>Shalin Shekhar Mangar.
>

-- 

===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
