Thanks Paul,
I upgraded to Solr 1.4 and used the flatten attribute as you suggested. It
works well.
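
For anyone hitting the same problem, this is roughly what the config looks
like now. It is a sketch based on the dataConfig quoted further down; the only
intended change is the flatten attribute on the textContent field, which
needs Solr 1.4 or later. The URL and field names are the ones from my
original message, so adjust them for your own data.

```xml
<dataConfig>
  <dataSource type="HttpDataSource" />
  <document>
    <entity name="archive" pk="id"
            url="http://localhost:9080/data/20090817070752.xml"
            processor="XPathEntityProcessor"
            forEach="/document/category"
            transformer="DateFormatTransformer"
            stream="true"
            dataSource="dataSource">
      <field column="id" xpath="/document/category/reference" />
      <!-- flatten="true" gathers the text under all child tags of BODY
           (here, each P element) into this single field -->
      <field column="textContent" xpath="/document/category/BODY" flatten="true" />
      <field column="author" xpath="/document/category/author" />
    </entity>
  </document>
</dataConfig>
```

Without flatten, the BODY field came back empty for me, presumably because
the text sits inside the nested P tags rather than directly under BODY.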

> From: noble.p...@corp.aol.com
> Date: Wed, 19 Aug 2009 15:05:48 +0530
> Subject: Re: Problems importing HTML content contained within XML document
> To: solr-user@lucene.apache.org
> 
> try this
> <field column="textContent" xpath="/document/category/BODY" flatten="true"/>
> 
> this should slurp all the tags under body
> 
> On Wed, Aug 19, 2009 at 1:44 PM, venn hardy<venn.ha...@hotmail.com> wrote:
> >
> > Hello,
> >
> > I have just started trying out SOLR to index some XML documents that I receive. I am
> > using SOLR 1.3 and its HttpDataSource in conjunction with the XPathEntityProcessor.
> >
> >
> >
> > I am finding the data import really useful so far, but I am having a few problems when
> > I try to import HTML contained within one of the XML tags, <BODY>. The data import
> > seems to silently ignore the textContent field, but it imports everything else.
> >
> >
> >
> > When I do a query through the SOLR admin interface, only the id and author fields are displayed.
> >
> > Any ideas what I am doing wrong?
> >
> >
> >
> > Thanks
> >
> >
> >
> > This is what my dataConfig looks like:
> > <dataConfig>
> >  <dataSource type="HttpDataSource" />
> >  <document>
> >  <entity name="archive" pk="id"
> >          url="http://localhost:9080/data/20090817070752.xml"
> >          processor="XPathEntityProcessor" forEach="/document/category"
> >          transformer="DateFormatTransformer" stream="true" dataSource="dataSource">
> >         <field column="id" xpath="/document/category/reference" />
> >  <field column="textContent" xpath="/document/category/BODY" />
> >  <field column="author" xpath="/document/category/author" />
> >  </entity>
> >  </document>
> > </dataConfig>
> >
> >
> >
> > This is how I have specified my schema
> > <fields>
> >   <field name="id" type="string" indexed="true" stored="true" required="true" />
> >   <field name="author" type="string" indexed="true" stored="true"/>
> >   <field name="textContent" type="text" indexed="true" stored="true" />
> > </fields>
> >
> >  <uniqueKey>id</uniqueKey>
> >  <defaultSearchField>id</defaultSearchField>
> >
> >
> >
> > And this is what my XML document looks like:
> >
> > <document>
> >  <category>
> >  <reference>123456</reference>
> >  <author>Authori name</author>
> >  <BODY>
> >  <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
> >  Morbi lorem elit, lacinia ac blandit ac, tristique et ante. Phasellus 
> > varius varius felis ut vestibulum</P>
> >  <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem 
> > elit,
> >  lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut 
> > vestibulum</P>
> >  <P>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi lorem 
> > elit,
> >  lacinia ac blandit ac, tristique et ante. Phasellus varius varius felis ut 
> > vestibulum</P>
> >  </BODY>
> >  </category>
> > </document>
> >
> 
> 
> 
> -- 
> -----------------------------------------------------
> Noble Paul | Principal Engineer| AOL | http://aol.com
