On Sat, Apr 5, 2008 at 5:16 AM, Werner Guttmann <[EMAIL PROTECTED]> wrote:
>
>  Given that I would like to know what your question really is about, let's
> see what your reply is.
>

Let me give a concrete, practical example.

I'd like to process the XML that comes from this URL:

http://www.worldcat.org/webservices/registry/search/Institutions?version=1.1&operation=searchRetrieve&recordSchema=info%3Arfa%2FrfaRegistry%2FschemaInfos%2FadminData&maximumRecords=10&startRecord=1&resultSetTTL=300&recordPacking=xml&query=local.oclcAccountName+all+%22Virginia+Polytechnic+Institute+and+State+University%22+or+local.institutionAlias+all+%22Virginia+Polytechnic+Institute+and+State+University%22+or+local.institutionName+all+%22Virginia+Polytechnic+Institute+and+State+University%22

This is an XML format that bundles a set of search results. Let's focus
on the first record retrieved. I extracted it and placed it here:
http://top.cs.vt.edu/~gback/srw-record.xml
The important excerpt looks like this:

<searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
..... elements elided ....
<recordData>
        <adminData:adminData>
                 <adminData:resourceID>info:rfa/localhost/Institutions/3064</adminData:resourceID>
                 <adminData:briefLabel>VIRGINIA POLYTECHNIC
                                       INSTITUTE AND
                                       STATE UNIVERSITY
                 </adminData:briefLabel>
                 ... details elided ....
        </adminData:adminData>
</recordData>
....

The Schema that describes this XML is here:
http://www.loc.gov/standards/sru/sru1-1archive/xml-files/srw-types.xsd

It describes <recordData> as:

<xsd:element name="recordData" type="stringOrXmlFragment" nillable="false"/>
<xsd:complexType name="stringOrXmlFragment" mixed="true">
    <xsd:sequence>
      <xsd:any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
</xsd:complexType>

My code to process this XML starts out clean (I'm giving an
abbreviated version here):

     SearchRetrieveResponse srr = (SearchRetrieveResponse) SearchRetrieveResponse.unmarshal(
             new InputStreamReader(urlconn.getInputStream()));
     for (RecordType r : srr.getRecords().getRecord()) {
         RecordData rdata = r.getRecordData();

and now the ugliness starts:

             for (Object o : rdata.getAnyObject()) {
                 AnyNode node = (AnyNode)o;

From here on, I have to work with "AnyNode" directly: getNamespacePrefix(),
getLocalName(), getFirstChild(), getNextSibling(), getNodeType(), etc.
I had to write a recursive function just to retrieve, for instance,
the value of the "adminData:briefLabel" child. That is a lot more
complicated, btw, than if I had used XOM and XPath.
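
For comparison, here is a minimal sketch of what the XPath route might
look like. It uses only the JDK's built-in DOM and XPath classes rather
than XOM or Castor, so it is an illustration of the idea, not the code
referred to above. The class name BriefLabelXPath, the method name, and
the namespace URI "urn:example:adminData" in main() are made up for this
example; the excerpt above does not show the real adminData namespace,
which is why the expression matches on local-name() instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Castor or of the original code.
class BriefLabelXPath {

    // Returns the whitespace-normalized text of the first briefLabel element
    // below recordData, or an empty string if no such element exists.
    static String briefLabel(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // local-name() compares element names without their prefix, so no
            // namespace URI has to be registered with the XPath object.
            String expr = "normalize-space("
                    + "//*[local-name()='recordData']"
                    + "//*[local-name()='briefLabel'])";
            return xpath.evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // The adminData namespace URI below is a placeholder (assumption).
        String xml =
              "<searchRetrieveResponse xmlns=\"http://www.loc.gov/zing/srw/\">"
            + "<recordData>"
            + "<adminData:adminData xmlns:adminData=\"urn:example:adminData\">"
            + "<adminData:briefLabel>VIRGINIA POLYTECHNIC\n  INSTITUTE AND\n"
            + "STATE UNIVERSITY\n</adminData:briefLabel>"
            + "</adminData:adminData>"
            + "</recordData>"
            + "</searchRetrieveResponse>";
        System.out.println(briefLabel(xml));
    }
}
```

The whole lookup is a single expression, whereas the AnyNode version
needs an explicit recursive walk over children and siblings. The price
is that nothing is type-checked against a schema.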

Obviously, what I would like is to be able to specify a schema that
describes the structure of the XML fragment within
<recordData></recordData>, then tell Castor that this XML can occur
there, generate code for the entire thing, and then call, for instance,
rdata.getAdminData().getBriefLabel()

Is it possible to combine schemas in this way?

Note that this should be a pretty common challenge for XML processing
- there are many formats where a schema describes the structure of an
outer carrier format but allows for XML on the inside which is
described separately elsewhere. (OAI-PMH and Atom are two other
examples.)

 - Godmar
