Hmmm, just to clarify: I retested using the nightly as of today, 18-jan-2009. The problem is still there, and this traceback is from that nightly.
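The deepest frame in the quoted traceback is java.io.StringReader.<init>, and the JDK's StringReader constructor throws a NullPointerException when it is given a null String. That suggests, though the trace alone does not prove it, that stripHTML was handed no value for the para column on at least one record, for example a record with no /record/sect1/para element. The standalone Java sketch below reproduces that failure and shows a null guard. StripNullCheck, stripTags and stripColumn are hypothetical names, the row is modelled as a Map<String, Object> by assumption, and the character loop that drops <...> tags is a crude stand-in, not Solr's HTML stripping code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of Solr or the DataImportHandler API.
public class StripNullCheck {

    // Crude stand-in for HTML stripping: reads the value through a StringReader
    // (as the trace shows stripHTML does) and drops everything between < and >.
    static String stripTags(String value) {
        // StringReader(null) throws NullPointerException, like the top frame of the trace.
        StringReader in = new StringReader(value);
        StringBuilder out = new StringBuilder();
        boolean inTag = false;
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (c == '<') {
                    inTag = true;
                } else if (c == '>') {
                    inTag = false;
                } else if (!inTag) {
                    out.append((char) c);
                }
            }
        } catch (IOException e) {
            // StringReader only throws IOException after close(), which never happens here.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Null-guarded variant: leaves the row untouched when the column is absent.
    static void stripColumn(Map<String, Object> row, String column) {
        Object value = row.get(column);
        if (value instanceof String) {
            row.put(column, stripTags((String) value));
        }
        // A null (missing) or non-String value is skipped instead of reaching StringReader.
    }

    public static void main(String[] args) {
        try {
            stripTags(null);
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException from StringReader(null)");
        }

        Map<String, Object> withPara = new HashMap<>();
        withPara.put("para", "Some <b>bold</b> text");
        stripColumn(withPara, "para");
        System.out.println(withPara.get("para")); // prints: Some bold text

        Map<String, Object> withoutPara = new HashMap<>(); // a record with no para element
        stripColumn(withoutPara, "para");
        System.out.println(withoutPara.containsKey("para")); // prints: false
    }
}
```

Skipping a missing column leaves the row as it was, so a record without a para element would still produce a document rather than aborting the whole import.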
>>This looks fine. Can you post the stack trace?
>>
>Yep, here is the juicy bit. Let me know if you need more.
>
>Jan 19, 2009 11:08:03 AM org.apache.catalina.startup.Catalina start
>INFO: Server startup in 2390 ms
>Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrCore execute
>INFO: [janesdocs] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=12
>Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
>INFO: Read dataimport.properties
>Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>INFO: Starting Full Import
>Jan 19, 2009 11:14:06 AM org.apache.solr.update.DirectUpdateHandler2 deleteAll
>INFO: [janesdocs] REMOVING ALL DOCUMENTS FROM INDEX
>Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrDeletionPolicy onInit
>INFO: SolrDeletionPolicy.onInit: commits:num=2
>  commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_1,version=1232363283058,generation=1,filenames=[segments_1]
>  commit{dir=/Volumes/spare/ts/solrnightlyjanes/data/index,segFN=segments_2,version=1232363283059,generation=2,filenames=[segments_2]
>Jan 19, 2009 11:14:06 AM org.apache.solr.core.SolrDeletionPolicy updateCommits
>INFO: last commit = 1232363283059
>Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.EntityProcessorBase applyTransformer
>WARNING: transformer threw error
>java.lang.NullPointerException
>    at java.io.StringReader.<init>(StringReader.java:33)
>    at org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:71)
>    at org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:54)
>    at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
>    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
>    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
>    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
>    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
>    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
>    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
>    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
>    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
>    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
>Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.DocBuilder buildDocument
>SEVERE: Exception while processing: janescurrent document : null
>org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
>    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
>    at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:203)
>    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
>    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
>    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
>    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
>    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
>    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
>    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
>    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
>    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
>Caused by: java.lang.NullPointerException
>    at java.io.StringReader.<init>(StringReader.java:33)
>    at org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:71)
>    at org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:54)
>    at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
>    ... 9 more
>Jan 19, 2009 11:14:06 AM org.apache.solr.handler.dataimport.DataImporter doFullImport
>SEVERE: Full Import failed
>org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NullPointerException
>    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
>    at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:203)
>    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:197)
>    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:160)
>    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:313)
>    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:339)
>    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:202)
>    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:147)
>    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:321)
>    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:381)
>    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:362)
>Caused by: java.lang.NullPointerException
>    at java.io.StringReader.<init>(StringReader.java:33)
>    at org.apache.solr.handler.dataimport.HTMLStripTransformer.stripHTML(HTMLStripTransformer.java:71)
>    at org.apache.solr.handler.dataimport.HTMLStripTransformer.transformRow(HTMLStripTransformer.java:54)
>    at org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransformer(EntityProcessorBase.java:187)
>    ... 9 more
>Jan 19, 2009 11:14:06 AM org.apache.solr.update.DirectUpdateHandler2 rollback
>INFO: start rollback
>Jan 19, 2009 11:14:06 AM org.apache.solr.update.DirectUpdateHandler2 rollback
>INFO: end_rollback
>
>
>>On Mon, Jan 19, 2009 at 4:14 PM, Fergus McMenemie <fer...@twig.me.uk> wrote:
>>
>>> Hello all,
>>>
>>> I have the following DIH data-config.xml file. Adding
>>> HTMLStripTransformer and the associated stripHTML on the
>>> para tag seems to have broken things. I am using a nightly
>>> build from 12-jan-2009.
>>>
>>> The /record/sect1/para contains HTML sub tags which need
>>> to be discarded. Is my use of stripHTML correct?
>>>
>>> <dataConfig>
>>>   <dataSource name="myfilereader" type="FileDataSource"/>
>>>   <document>
>>>     <entity name="jcurrent"
>>>             processor="FileListEntityProcessor"
>>>             fileName=".*xml"
>>>             newerThan="'NOW-1000DAYS'"
>>>             recursive="true"
>>>             rootEntity="false"
>>>             dataSource="null"
>>>             baseDir="/Volumes/spare/ts/jxml/data/news/groups">
>>>
>>>       <entity name="x"
>>>               dataSource="myfilereader"
>>>               processor="XPathEntityProcessor"
>>>               url="${jcurrent.fileAbsolutePath}"
>>>               stream="false"
>>>               forEach="/record"
>>>               transformer="DateFormatTransformer,TemplateTransformer,RegexTransformer,HTMLStripTransformer">
>>>
>>>         <field column="fileAbsPath" template="${jcurrent.fileAbsolutePath}" />
>>>         <field column="fileWebPath" regex="/Volumes/spare/ts/(.*)" replaceWith="$1" sourceColName="fileAbsePath"/>
>>>         <field column="title" xpath="/record/title" />
>>>         <field column="para" xpath="/record/sect1/para" stripHTML="true" />
>>>         <field column="subject" xpath="/record/metadata/subje...@qualifier='fullTitle']" />
>>>         <field column="pubname" xpath="/record/metadata/subje...@qualifier='publication']" />
>>>         <field column="pubdate" xpath="/record/metadata/da...@qualifier='pubDate']" dateTimeFormat="yyyyMMdd" />
>>>       </entity>
>>>     </entity>
>>>   </document>
>>> </dataConfig>
>>>
>>> --
>>>
>>> ===============================================================
>>> Fergus McMenemie               Email:fer...@twig.me.uk
>>> Techmore Ltd                   Phone:(UK) 07721 376021
>>>
>>> Unix/Mac/Intranets             Analyst Programmer
>>> ===============================================================
>>>
>>
>>
>>
>>--
>>Regards,
>>Shalin Shekhar Mangar.
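The fileWebPath field in the quoted config depends on RegexTransformer applying regex="/Volumes/spare/ts/(.*)" with replaceWith="$1" to the column named in sourceColName. Assuming that rewrite behaves like Java's String.replaceAll (an assumption, not checked against the transformer's source), the mapping can be sketched as follows; WebPathDemo, webPath and the example file name are hypothetical. Note that the config names the source column fileAbsePath, while the template fills fileAbsPath, so if the transformer looks its input up by sourceColName it would find nothing for this field.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the fileWebPath mapping; not Solr's RegexTransformer.
public class WebPathDemo {

    static final String REGEX = "/Volumes/spare/ts/(.*)";
    static final String REPLACE_WITH = "$1"; // $1 = text captured by (.*)

    // Returns the rewritten path, or null when the source column is missing from the row.
    static String webPath(Map<String, Object> row, String sourceColName) {
        Object value = row.get(sourceColName);
        if (value == null) {
            return null; // nothing to rewrite
        }
        return value.toString().replaceAll(REGEX, REPLACE_WITH);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        // Hypothetical file under the configured baseDir.
        row.put("fileAbsPath", "/Volumes/spare/ts/jxml/data/news/groups/example.xml");

        System.out.println(webPath(row, "fileAbsPath"));  // prints: jxml/data/news/groups/example.xml
        System.out.println(webPath(row, "fileAbsePath")); // prints: null (column name as spelled in the config)
    }
}
```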