Hi Fergus,

When I debugged in the development console
http://localhost:9080/solr/admin/dataimport.jsp?handler=/dataimport
I had no problems. Each category/item seems to be indexed only once, and no
parent fields are available (except the category name).

I am not entirely sure how the forEach statement works, but my interpretation
of forEach="/document/category/item | /document/category" is something like
this:

1. Whenever DIH encounters a /document/category, it extracts the
   /document/category/name field as a common field.
2. Whenever DIH encounters a /document/category/item, it extracts all of the
   item fields.
3. When all fields have been encountered, it saves the document in Solr and
   goes on to the next category/item.

> Date: Thu, 10 Sep 2009 14:19:31 +0100
> To: solr-user@lucene.apache.org
> From: fer...@twig.me.uk
> Subject: RE: Extract info from parent node during data import
>
> >Hi Paul,
> >The forEach="/document/category/item | /document/category/name" didn't work
> >(no categoryname was stored or indexed).
> >However forEach="/document/category/item | /document/category" seems to work
> >well. I am not sure why category on its own works, but not category/name...
> >But thanks for the tip. It wasn't as painful as I thought it would be.
> >Venn
>
> Hmmm, I had bother with this. Although each occurrence of
> /document/category/item causes a new Solr document to be indexed, that
> document contained all the fields from the parent element as well.
>
> Did you see this?
>
> >> From: noble.p...@corp.aol.com
> >> Date: Thu, 10 Sep 2009 09:58:21 +0530
> >> Subject: Re: Extract info from parent node during data import
> >> To: solr-user@lucene.apache.org
> >>
> >> try this
> >>
> >> add two xpaths in your forEach
> >>
> >> forEach="/document/category/item | /document/category/name"
> >>
> >> and add a field as follows
> >>
> >> <field column="catgoryname" xpath ="/document/category/name"
> >> commonField="true"/>
> >>
> >> Please try it out and let me know.
> >>
> >> On Thu, Sep 10, 2009 at 7:30 AM, venn hardy <venn.ha...@hotmail.com> wrote:
> >> >
> >> > Hello,
> >> >
> >> > I am using SOLR 1.4 (from a nightly build) and its URLDataSource in
> >> > conjunction with the XPathEntityProcessor. I have successfully imported
> >> > XML content, but I think I may have found a limitation when it comes to
> >> > the commonField attribute in the DataImportHandler.
> >> >
> >> > Before writing my own parser to read in a whole XML document, I thought
> >> > I'd post the question here (since I got some great advice last time).
> >> >
> >> > The bulk of my content is contained within each <item> tag. However,
> >> > each item has a parent called <category> and each category has a name
> >> > which I would like to import. In my forEach loop I specify
> >> > /document/category/item as the collection of items I am interested in.
> >> > Is there any way to extract an element from underneath a parent node? To
> >> > be more specific (see the example XML below), I would like to index the
> >> > following:
> >> >
> >> > - category: Category 1; id: 1; author: Author 1
> >> > - category: Category 1; id: 2; author: Author 2
> >> > - category: Category 2; id: 3; author: Author 3
> >> > - category: Category 2; id: 4; author: Author 4
> >> >
> >> > Any ideas on how I can get to a parent node from within a child during
> >> > data import? If it can't be done, what do you suggest would be the best
> >> > way so I can keep using the DataImportHandler... would XSLT be a good
> >> > idea to 'flatten out' the structure a bit?
> >> >
> >> > Thanks
> >> >
> >> > This is what my XML document looks like:
> >> >
> >> > <document>
> >> >   <category>
> >> >     <name>Category 1</name>
> >> >     <item>
> >> >       <id>1</id>
> >> >       <author>Author 1</author>
> >> >     </item>
> >> >     <item>
> >> >       <id>2</id>
> >> >       <author>Author 2</author>
> >> >     </item>
> >> >   </category>
> >> >   <category>
> >> >     <name>Category 2</name>
> >> >     <item>
> >> >       <id>3</id>
> >> >       <author>Author 3</author>
> >> >     </item>
> >> >     <item>
> >> >       <id>4</id>
> >> >       <author>Author 4</author>
> >> >     </item>
> >> >   </category>
> >> > </document>
> >> >
> >> > And this is what my dataConfig looks like:
> >> >
> >> > <dataConfig>
> >> >   <dataSource type="URLDataSource" />
> >> >   <document>
> >> >     <entity name="archive" pk="id"
> >> >             url="http://localhost:9080/data/20090817070752.xml"
> >> >             processor="XPathEntityProcessor" forEach="/document/category/item"
> >> >             transformer="DateFormatTransformer" stream="true"
> >> >             dataSource="dataSource">
> >> >       <field column="category" xpath="/document/category/name"
> >> >              commonField="true" />
> >> >       <field column="id" xpath="/document/category/item/id" />
> >> >       <field column="author" xpath="/document/category/item/author" />
> >> >     </entity>
> >> >   </document>
> >> > </dataConfig>
> >> >
> >> > This is how I have specified my schema:
> >> >
> >> > <fields>
> >> >   <field name="id" type="string" indexed="true" stored="true"
> >> >          required="true" />
> >> >   <field name="author" type="string" indexed="true" stored="true"/>
> >> >   <field name="category" type="string" indexed="true" stored="true"/>
> >> > </fields>
> >> >
> >> > <uniqueKey>id</uniqueKey>
> >> > <defaultSearchField>id</defaultSearchField>
> >>
> >> --
> >> -----------------------------------------------------
> >> Noble Paul | Principal Engineer| AOL | http://aol.com
>
> --
>
> ===============================================================
> Fergus McMenemie               Email:fer...@twig.me.uk
> Techmore Ltd                   Phone:(UK) 07721 376021
>
> Unix/Mac/Intranets             Analyst Programmer
> ===============================================================
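
For reference, here is a minimal sketch of the dataConfig with the forEach value that Venn reports as working. It is the configuration from the original post with only the forEach attribute changed. The thread never shows the final file, so the exact combination is an assumption, including keeping the column name "category" from the original post rather than the "catgoryname" suggested in the reply.

```xml
<dataConfig>
  <dataSource type="URLDataSource" />
  <document>
    <!-- Assumption: everything except forEach is copied unchanged from the
         original post. forEach now lists the item path and the category
         path, separated by a pipe, as reported working in the thread. -->
    <entity name="archive" pk="id"
            url="http://localhost:9080/data/20090817070752.xml"
            processor="XPathEntityProcessor"
            forEach="/document/category/item | /document/category"
            transformer="DateFormatTransformer" stream="true"
            dataSource="dataSource">
      <!-- commonField="true" asks DIH to copy this value into the records
           that follow once it has been encountered. -->
      <field column="category" xpath="/document/category/name"
             commonField="true" />
      <field column="id" xpath="/document/category/item/id" />
      <field column="author" xpath="/document/category/item/author" />
    </entity>
  </document>
</dataConfig>
```

With the sample XML from the original post, the intent is four Solr documents, each carrying id, author and the name of its enclosing category. Whether other parent fields leak into the item documents, as Fergus describes, is worth checking in the dataimport.jsp debug console before relying on this.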