Thanks Paul, I upgraded to Solr 1.4 and used the flatten attribute as you suggested. It works well.
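
In case it helps anyone searching the archive, here is a sketch of the dataConfig with that change applied. It assumes the same URL, entity, and field names as my original config quoted below; the only difference is flatten="true" on the textContent field, which makes XPathEntityProcessor collect the text under all the child tags of BODY (the <P> elements here) into the one field.

```xml
<dataConfig>
  <dataSource type="HttpDataSource" />
  <document>
    <entity name="archive" pk="id"
            url="http://localhost:9080/data/20090817070752.xml"
            processor="XPathEntityProcessor" forEach="/document/category"
            transformer="DateFormatTransformer" stream="true" dataSource="dataSource">
      <field column="id" xpath="/document/category/reference" />
      <!-- Assumption: unchanged from the original config apart from flatten="true",
           which pulls in the text of every tag nested under BODY. -->
      <field column="textContent" xpath="/document/category/BODY" flatten="true" />
      <field column="author" xpath="/document/category/author" />
    </entity>
  </document>
</dataConfig>
```

The schema did not need any changes for this; textContent stays a stored, indexed text field.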

> From: noble.p...@corp.aol.com
> Date: Wed, 19 Aug 2009 15:05:48 +0530
> Subject: Re: Problems importing HTML content contained within XML document
> To: solr-user@lucene.apache.org
>
> try this
> <field column="textContent" xpath="/document/category/BODY" flatten="true"/>
>
> this should slurp all the tags under body
>
> On Wed, Aug 19, 2009 at 1:44 PM, venn hardy<venn.ha...@hotmail.com> wrote:
> >
> > Hello,
> >
> > I have just started trying out SOLR to index some XML documents that I
> > receive. I am using SOLR 1.3 and its HttpDataSource in conjunction with
> > the XPathEntityProcessor.
> >
> > I am finding the data import really useful so far, but I am having a few
> > problems when I try to import HTML contained within one of the XML tags,
> > <BODY>. The data import just seems to ignore the textContent silently, but
> > it imports everything else.
> >
> > When I do a query through the SOLR admin interface, only the id and author
> > fields are displayed.
> >
> > Any ideas what I am doing wrong?
> >
> > Thanks
> >
> > This is what my dataConfig looks like:
> > <dataConfig>
> >   <dataSource type="HttpDataSource" />
> >   <document>
> >     <entity name="archive" pk="id"
> >             url="http://localhost:9080/data/20090817070752.xml"
> >             processor="XPathEntityProcessor" forEach="/document/category"
> >             transformer="DateFormatTransformer" stream="true" dataSource="dataSource">
> >       <field column="id" xpath="/document/category/reference" />
> >       <field column="textContent" xpath="/document/category/BODY" />
> >       <field column="author" xpath="/document/category/author" />
> >     </entity>
> >   </document>
> > </dataConfig>
> >
> > This is how I have specified my schema:
> > <fields>
> >   <field name="id" type="string" indexed="true" stored="true" required="true" />
> >   <field name="author" type="string" indexed="true" stored="true"/>
> >   <field name="textContent" type="text" indexed="true" stored="true" />
> > </fields>
> >
> > <uniqueKey>id</uniqueKey>
> > <defaultSearchField>id</defaultSearchField>
> >
> > And this is what my XML document looks like:
> >
> > <document>
> >   <category>
> >     <reference>123456</reference>
> >     <author>Authori name</author>
> >     <BODY>
> >       <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
> >       Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus
> >       varius varius felis ut vestibulum</P>
> >       <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem
> >       elit, lacinia ac blandit ac, tristique et ante. Phasellus varius varius
> >       felis ut vestibulum</P>
> >       <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem
> >       elit, lacinia ac blandit ac, tristique et ante. Phasellus varius varius
> >       felis ut vestibulum</P>
> >     </BODY>
> >   </category>
> > </document>
>
> --
> -----------------------------------------------------
> Noble Paul | Principal Engineer| AOL | http://aol.com