I have an XML file I'd like to index. Its structure is similar to this:
<data>
  <user id="[id-num]">
    <message date="[date]">[message text]</message>
    ...
  </user>
  ...
</data>
I want each document in the index to correspond to one message in the XML
file, with the user's [id-num] value stored as a field in each of that user's
documents. I think this means I have to define an entity for message that
looks like this:
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="message"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/data/user/message/"
            url="message-data.xml"
            transformer="DateFormatTransformer">
      <field column="date" xpath="/data/user/message/@date"
             dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
      <field column="text" xpath="/data/user/message" />
    </entity>
  </document>
</dataConfig>
but I don't know where to put the field definition for the user id. It would
look like this:
<field column="id" xpath="/data/user/@id" />
I can't put it inside the message entity: that entity is defined with
forEach="/data/user/message/", and the id field's xpath lies outside the
entity's scope. Putting the id field definition there causes a
NullPointerException. I don't think I want to create a "user" entity with the
"message" entity nested inside it. Or is there a way to do that and still have
each index document correspond to a message from the file? Are there
attributes (or attribute values) I haven't come across in my searching that
would let me do this?
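One arrangement that may fit, sketched here untested: in Solr's DataImportHandler, XPathEntityProcessor accepts several paths in forEach separated by "|", and a field marked commonField="true" keeps its value for the records that follow it. Assuming that carries the id from each <user> element into the message records beneath it, the entity might look like this:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <!-- Untested sketch: forEach lists both the user path and the message path. -->
    <entity name="message"
            processor="XPathEntityProcessor"
            stream="true"
            forEach="/data/user | /data/user/message"
            url="message-data.xml"
            transformer="DateFormatTransformer">
      <!-- commonField="true": the id read from a <user> element is carried
           forward to the <message> records that follow it (assumed behavior). -->
      <field column="id" xpath="/data/user/@id" commonField="true" />
      <field column="date" xpath="/data/user/message/@date"
             dateTimeFormat="yyyy-MM-dd'T'HH:mm:ss" />
      <field column="text" xpath="/data/user/message" />
    </entity>
  </document>
</dataConfig>
```

Two caveats, both assumptions: the /data/user match may also produce a record of its own holding only the id, which might need to be filtered out (for example by a transformer that sets the $skipDoc flag); and if id is the schema's uniqueKey, all of one user's messages would share a key and overwrite each other, so the column may need a different, hypothetical name such as user_id.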
Thanks,
Mike