Hi Fergus,

When I debugged in the development console 
http://localhost:9080/solr/admin/dataimport.jsp?handler=/dataimport

I had no problems. Each category/item seems to be only indexed once, and no 
parent fields are available (except the category name).

I am not entirely sure how the forEach statement works, but my interpretation 
of forEach="/document/category/item | /document/category" is something like 
this:

1. Whenever DIH encounters a /document/category it will extract the 
/document/category/name field as a common field.
2. Whenever DIH encounters a document/category/item it will extract all of the 
item fields.
3. When all fields have been encountered, save the document in solr and go to 
the next category/item
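
For reference, this is roughly the entity I have in mind when describing the steps above. It is only a sketch: it reuses the url, columns and xpaths from my original config further down, and I am assuming the only change needed is the forEach value.

```xml
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <!-- forEach lists both xpaths, separated by "|" -->
    <entity name="archive" pk="id"
            url="http://localhost:9080/data/20090817070752.xml"
            processor="XPathEntityProcessor"
            forEach="/document/category/item | /document/category"
            transformer="DateFormatTransformer" stream="true"
            dataSource="dataSource">
      <!-- commonField="true": once read, the value is copied into the records that follow -->
      <field column="category" xpath="/document/category/name" commonField="true" />
      <field column="id" xpath="/document/category/item/id" />
      <field column="author" xpath="/document/category/item/author" />
    </entity>
  </document>
</dataConfig>
```

With the sample XML from my first mail I would expect this to produce the four documents I listed there (Category 1 with ids 1 and 2, Category 2 with ids 3 and 4).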

 
> Date: Thu, 10 Sep 2009 14:19:31 +0100
> To: solr-user@lucene.apache.org
> From: fer...@twig.me.uk
> Subject: RE: Extract info from parent node during data import
> 
> >Hi Paul,
> >The forEach="/document/category/item | /document/category/name" didn't work 
> >(no categoryname was stored or indexed).
> >However forEach="/document/category/item | /document/category" seems to work 
> >well. I am not sure why category on its own works, but not category/name...
> >But thanks for the tip. It wasn't as painful as I thought it would be.
> >Venn
> 
> Hmmm, I had bother with this. Although each occurrence of 
> /document/category/item causes a new Solr document to be indexed, that 
> document contained all the fields from the parent element as well.
> 
> Did you see this?
> 
> >
> >> From: noble.p...@corp.aol.com
> >> Date: Thu, 10 Sep 2009 09:58:21 +0530
> >> Subject: Re: Extract info from parent node during data import
> >> To: solr-user@lucene.apache.org
> >> 
> >> try this
> >> 
> >> add two xpaths in your forEach
> >> 
> >> forEach="/document/category/item | /document/category/name"
> >> 
> >> and add a field as follows
> >> 
> >> <field column="categoryname" xpath="/document/category/name"
> >> commonField="true"/>
> >> 
> >> Please try it out and let me know.
> >> 
> >> On Thu, Sep 10, 2009 at 7:30 AM, venn hardy <venn.ha...@hotmail.com> wrote:
> >> >
> >> > Hello,
> >> >
> >> >
> >> >
> >> > I am using SOLR 1.4 (from a nightly build) and its URLDataSource in 
> >> > conjunction with the XPathEntityProcessor. I have successfully imported 
> >> > XML content, but I think I may have found a limitation when it comes to 
> >> > the commonField attribute in the DataImportHandler.
> >> >
> >> >
> >> >
> >> > Before writing my own parser to read in a whole XML document, I thought 
> >> > I'd post the question here (since I got some great advice last time).
> >> >
> >> >
> >> >
> >> > The bulk of my content is contained within each <item> tag. However, 
> >> > each item has a parent called <category> and each category has a name 
> >> > which I would like to import. In my forEach loop I specify the 
> >> > /document/category/item as the collection of items I am interested in. 
> >> > Is there any way to extract an element from underneath a parent node? To 
> >> > be more specific (see the example XML below), I would like to index the 
> >> > following:
> >> >
> >> > - category: Category 1; id: 1; author: Author 1
> >> >
> >> > - category: Category 1; id: 2; author: Author 2
> >> >
> >> > - category: Category 2; id: 3; author: Author 3
> >> >
> >> > - category: Category 2; id: 4; author: Author 4
> >> >
> >> >
> >> >
> >> > Any ideas on how I can get to a parent node from within a child during 
> >> > data import? If it can't be done, what do you suggest would be the best 
> >> > way so I can keep using the DataImportHandler... would XSLT be a good 
> >> > idea to 'flatten out' the structure a bit?
> >> >
> >> >
> >> >
> >> > Thanks
> >> >
> >> >
> >> >
> >> > This is what my XML document looks like:
> >> >
> >> > <document>
> >> > <category>
> >> > <name>Category 1</name>
> >> > <item>
> >> > <id>1</id>
> >> > <author>Author 1</author>
> >> > </item>
> >> > <item>
> >> > <id>2</id>
> >> > <author>Author 2</author>
> >> > </item>
> >> > </category>
> >> > <category>
> >> > <name>Category 2</name>
> >> > <item>
> >> > <id>3</id>
> >> > <author>Author 3</author>
> >> > </item>
> >> > <item>
> >> > <id>4</id>
> >> > <author>Author 4</author>
> >> > </item>
> >> > </category>
> >> > </document>
> >> >
> >> >
> >> >
> >> > And this is what my dataConfig looks like:
> >> > <dataConfig>
> >> > <dataSource type="URLDataSource" />
> >> > <document>
> >> > <entity name="archive" pk="id" 
> >> > url="http://localhost:9080/data/20090817070752.xml" 
> >> > processor="XPathEntityProcessor" forEach="/document/category/item" 
> >> > transformer="DateFormatTransformer" stream="true" 
> >> > dataSource="dataSource">
> >> > <field column="category" xpath="/document/category/name" 
> >> > commonField="true" />
> >> > <field column="id" xpath="/document/category/item/id" />
> >> > <field column="author" xpath="/document/category/item/author" />
> >> > </entity>
> >> > </document>
> >> > </dataConfig>
> >> >
> >> >
> >> >
> >> > This is how I have specified my schema
> >> > <fields>
> >> > <field name="id" type="string" indexed="true" stored="true" 
> >> > required="true" />
> >> > <field name="author" type="string" indexed="true" stored="true"/>
> >> > <field name="category" type="string" indexed="true" stored="true"/>
> >> > </fields>
> >> >
> >> > <uniqueKey>id</uniqueKey>
> >> > <defaultSearchField>id</defaultSearchField>
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> 
> >> 
> >> 
> >> -- 
> >> -----------------------------------------------------
> >> Noble Paul | Principal Engineer| AOL | http://aol.com
> >
> 
> -- 
> 
> ===============================================================
> Fergus McMenemie Email:fer...@twig.me.uk
> Techmore Ltd Phone:(UK) 07721 376021
> 
> Unix/Mac/Intranets Analyst Programmer
> ===============================================================
