PS:
On Sat, Apr 5, 2008 at 9:31 AM, Godmar Back <[EMAIL PROTECTED]> wrote:
> On Sat, Apr 5, 2008 at 5:16 AM, Werner Guttmann <[EMAIL PROTECTED]> wrote:
> >
> > Given that I would like to know what your question really is about, let's
> > see what your reply is.
> >
>
> Let me give a concrete, practical example.
>
> I'd like to process the XML that comes from this URL:
>
>
> http://www.worldcat.org/webservices/registry/search/Institutions?version=1.1&operation=searchRetrieve&recordSchema=info%3Arfa%2FrfaRegistry%2FschemaInfos%2FadminData&maximumRecords=10&startRecord=1&resultSetTTL=300&recordPacking=xml&query=local.oclcAccountName+all+%22Virginia+Polytechnic+Institute+and+State+University%22+or+local.institutionAlias+all+%22Virginia+Polytechnic+Institute+and+State+University%22+or+local.institutionName+all+%22Virginia+Polytechnic+Institute+and+State+University%22
>
> This is an XML format that bundles a set of search results. Let's focus
> on the first record retrieved. I extracted it and placed it here:
> http://top.cs.vt.edu/~gback/srw-record.xml
> The important excerpt looks like this:
>
> <searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
> ..... elements elided ....
> <recordData>
> <adminData:adminData>
>
>
> <adminData:resourceID>info:rfa/localhost/Institutions/3064</adminData:resourceID>
> <adminData:briefLabel>VIRGINIA POLYTECHNIC
> INSTITUTE AND
> STATE UNIVERSITY
> </adminData:briefLabel>
> ... details elided ....
> </adminData:adminData>
> </recordData>
> ....
>
> The Schema that describes this XML is here:
> http://www.loc.gov/standards/sru/sru1-1archive/xml-files/srw-types.xsd
>
> It describes <recordData> as:
>
>
> <xsd:element name="recordData" type="stringOrXmlFragment" nillable="false"/>
>
> <xsd:complexType name="stringOrXmlFragment" mixed="true">
> <xsd:sequence>
> <xsd:any namespace="##any" processContents="lax" minOccurs="0"
> maxOccurs="unbounded"/>
> </xsd:sequence>
> </xsd:complexType>
>
> My code to process this XML starts out clean (I'm giving an
> abbreviated version here):
>
> SearchRetrieveResponse srr =
>     (SearchRetrieveResponse) SearchRetrieveResponse.unmarshal(
>         new InputStreamReader(urlconn.getInputStream()));
> for (RecordType r : srr.getRecords().getRecord()) {
>     RecordData rdata = r.getRecordData();
>
> and now the ugliness starts:
>
>     for (Object o : rdata.getAnyObject()) {
>         AnyNode node = (AnyNode) o;
>
> and subsequently, I need to work with "AnyNode", getNamespacePrefix(),
> getLocalName(), getFirstChild(), getNextSibling(), getNodeType() etc.
> I needed to write a recursive function in order to retrieve, for
> instance, the value of the "adminData:briefLabel" child. It's a lot
> more complicated, btw, than if I used XOM and XPath.
>
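To make the comparison above concrete, here is a minimal sketch that uses only the JDK's DOM and XPath classes, not Castor's AnyNode and not XOM. It looks up briefLabel twice: once with a hand-written recursive walk over getFirstChild()/getNextSibling(), and once with a single XPath expression. The class name, the method names, and the adminData namespace URI in the sample are placeholders made up for illustration; the real namespace declaration is not shown in the excerpt.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class BriefLabelLookup {
    // Trimmed-down record. "urn:example:adminData" is a placeholder URI,
    // NOT the namespace the real service uses.
    static final String SAMPLE =
        "<recordData xmlns:adminData=\"urn:example:adminData\">"
      + "<adminData:adminData>"
      + "<adminData:resourceID>info:rfa/localhost/Institutions/3064</adminData:resourceID>"
      + "<adminData:briefLabel>VIRGINIA POLYTECHNIC INSTITUTE AND STATE UNIVERSITY</adminData:briefLabel>"
      + "</adminData:adminData>"
      + "</recordData>";

    // Parse a string into a namespace-aware DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Depth-first search: returns the trimmed text of the first descendant
    // element with the given local name, or null if there is none.
    static String findRecursively(Node node, String localName) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip text, comments, etc.
            }
            if (localName.equals(c.getLocalName())) {
                return c.getTextContent().trim();
            }
            String hit = findRecursively(c, localName);
            if (hit != null) {
                return hit;
            }
        }
        return null;
    }

    // Same lookup as one XPath expression. Matching on local-name() avoids
    // having to know the namespace URI; returns "" if nothing matches.
    static String findWithXPath(Node node, String localName) {
        try {
            String expr = "string(.//*[local-name()='" + localName + "'])";
            return XPathFactory.newInstance().newXPath().evaluate(expr, node).trim();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(findRecursively(doc, "briefLabel"));
        System.out.println(findWithXPath(doc, "briefLabel"));
    }
}
```

Matching on local-name() is a shortcut for the sketch; with the real namespace URI in hand, a NamespaceContext and a prefixed path would be the stricter choice.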
I've since found the schema that expresses the subtree, and I can
marshal the 'node' element retrieved above into a String, then
unmarshal from that String, as in:
    for (Object o : rdata.getAnyObject()) {
        AnyNode node = (AnyNode) o;
        // Serialize the generic node to a string ...
        Writer w = new StringWriter();
        Marshaller.marshal(node, w);
        // ... then parse that string again with the generated class.
        AdminDataType adData = AdminData.unmarshal(
            new StringReader(w.toString()));
        System.out.println(">>>> " + adData.getBriefLabel());
    }
This is, obviously, inefficient. Is there a better way of doing that?
Can I tell Castor in some way that the XML that "AnyNode" represents
can really be unmarshaled using type "AdminData"?
- Godmar
> Obviously, what I would like is to be able to specify a schema that
> describes the structure of the XML fragment within
> <recordData></recordData>, then tell Castor that this XML can occur
> there, and generate code for the entire thing and call, for instance,
> rdata.getAdminData().getBriefLabel()
>
> Is it possible to combine schemas in this way?
>
> Note that this should be a pretty common challenge for XML processing
> - there are many formats where a schema describes the structure of an
> outer carrier format but allows for XML on the inside which is
> described separately elsewhere. (OAI-PMH and Atom are two other
> examples.)
>
> - Godmar
>