Re: [castor-user] can Castor handle XML "bundling" formats?

Werner Guttmann Wed, 16 Apr 2008 00:37:42 -0700

Godmar,

let me find some (good) time to address many of your questions. This is
really just a lack of time currently ...


Werner

Godmar Back wrote:
> PS:
> 
> On Sat, Apr 5, 2008 at 9:31 AM, Godmar Back <[EMAIL PROTECTED]> wrote:
>> On Sat, Apr 5, 2008 at 5:16 AM, Werner Guttmann <[EMAIL PROTECTED]> wrote:
>>  >
>>  >  Given that I would like to know what your question really is about, let's
>>  > see what your reply is.
>>  >
>>
>>  Let me give a concrete, practical example.
>>
>>  I'd like to process the XML that comes from this URL:
>>
>>  
>> http://www.worldcat.org/webservices/registry/search/Institutions?version=1.1&operation=searchRetrieve&recordSchema=info%3Arfa%2FrfaRegistry%2FschemaInfos%2FadminData&maximumRecords=10&startRecord=1&resultSetTTL=300&recordPacking=xml&query=local.oclcAccountName+all+%22Virginia+Polytechnic+Institute+and+State+University%22+or+local.institutionAlias+all+%22Virginia+Polytechnic+Institute+and+State+University%22+or+local.institutionName+all+%22Virginia+Polytechnic+Institute+and+State+University%22
>>
>>  This is an XML format that bundles a set of search results. Let's focus
>>  on the first record retrieved. I extracted it and placed it here:
>>  http://top.cs.vt.edu/~gback/srw-record.xml
>>  The important excerpt looks like this:
>>
>>  <searchRetrieveResponse xmlns="http://www.loc.gov/zing/srw/">
>>  ..... elements elided ....
>>  <recordData>
>>         <adminData:adminData>
>>                  <adminData:resourceID>info:rfa/localhost/Institutions/3064</adminData:resourceID>
>>                  <adminData:briefLabel>VIRGINIA POLYTECHNIC
>>                                                     INSTITUTE AND
>>  STATE UNIVERSITY
>>                  </adminData:briefLabel>
>>                   ... details elided ....
>>         </adminData:adminData>
>>  </recordData>
>>  ....
>>
>>  The Schema that describes this XML is here:
>>  http://www.loc.gov/standards/sru/sru1-1archive/xml-files/srw-types.xsd
>>
>>  It describes <recordData> as:
>>
>>
>>  <xsd:element name="recordData" type="stringOrXmlFragment" nillable="false"/>
>>
>>  <xsd:complexType name="stringOrXmlFragment" mixed="true">
>>    <xsd:sequence>
>>      <xsd:any namespace="##any" processContents="lax" minOccurs="0"
>>               maxOccurs="unbounded"/>
>>    </xsd:sequence>
>>  </xsd:complexType>
>>
>>  My code to process this XML starts out clean (I'm giving an
>>  abbreviated version here):
>>
>>      SearchRetrieveResponse srr = (SearchRetrieveResponse)
>>          SearchRetrieveResponse.unmarshal(
>>              new InputStreamReader(urlconn.getInputStream()));
>>      for (RecordType r : srr.getRecords().getRecord()) {
>>          RecordData rdata = r.getRecordData();
>>
>>  and now the ugliness starts:
>>
>>              for (Object o : rdata.getAnyObject()) {
>>                  AnyNode node = (AnyNode)o;
>>
>>  and subsequently, I need to work with "AnyNode", getNamespacePrefix(),
>>  getLocalName(), getFirstChild(), getNextSibling(), getNodeType() etc.
>>  I needed to write a recursive function in order to retrieve, for
>>  instance, the value of the "adminData:briefLabel" child. It's a lot
>>  more complicated, btw, than if I used XOM and XPath.
>>
> 
> I've since found the schema that expresses the subtree, and I can
> marshal the 'node' element retrieved above into a String, then
> unmarshal from that String, as in:
> 
>           for (Object o : rdata.getAnyObject()) {
>               AnyNode node = (AnyNode)o;
> 
>               Writer w = new StringWriter();
>               Marshaller.marshal(node, w);
>               AdminDataType adData = AdminData.unmarshal(
>                       new StringReader(w.toString()));
>               System.out.println(">>>> " + adData.getBriefLabel());
> 
> This is, obviously, inefficient. Is there a better way of doing that?
> Can I tell Castor in some way that the XML that "AnyNode" represents
> can really be unmarshaled using type "AdminData"?
> 
>  - Godmar
> 
>>  Obviously, what I would like is to be able to specify a schema that
>>  describes the structure of the XML fragment within
>>  <recordData></recordData>, then tell Castor that this XML can occur
>>  there, and generate code for the entire thing and call, for instance,
>>  rdata.getAdminData().getBriefLabel()
>>
>>  Is it possible to combine schemas in this way?
>>
>>  Note that this should be a pretty common challenge for XML processing
>>  - there are many formats where a schema describes the structure of an
>>  outer carrier format but allows for XML on the inside which is
>>  described separately elsewhere. (OAI-PMH and Atom are two other
>>  examples.)
>>
>>   - Godmar
>>
> 
> ---------------------------------------------------------------------
> To unsubscribe from this list, please visit:
> 
>     http://xircles.codehaus.org/manage_email
> 
> 
> 
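
For comparison with the recursive AnyNode traversal described in the quoted
message, here is a minimal sketch of the XPath-style lookup Godmar alludes to.
It uses only the JDK's DOM and XPath APIs rather than XOM or Castor, so it does
not answer the question of how Castor itself could bind the nested fragment.
The class name, the method name, and the adminData namespace URI are
assumptions made for illustration; the real namespace URI is not shown in the
thread, which is why the expression matches on local names only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class BriefLabelSketch {
    // Trimmed-down stand-in for the SRW response quoted above.
    // ASSUMPTION: "urn:example:adminData" is a placeholder namespace URI;
    // the actual URI bound to the adminData prefix is not given in the thread.
    static final String SAMPLE =
        "<searchRetrieveResponse xmlns=\"http://www.loc.gov/zing/srw/\"\n"
      + "    xmlns:adminData=\"urn:example:adminData\">\n"
      + "  <recordData>\n"
      + "    <adminData:adminData>\n"
      + "      <adminData:resourceID>info:rfa/localhost/Institutions/3064</adminData:resourceID>\n"
      + "      <adminData:briefLabel>VIRGINIA POLYTECHNIC\n"
      + "          INSTITUTE AND\n"
      + " STATE UNIVERSITY\n"
      + "      </adminData:briefLabel>\n"
      + "    </adminData:adminData>\n"
      + "  </recordData>\n"
      + "</searchRetrieveResponse>";

    // Returns the text of the first briefLabel element found below a
    // recordData element, with runs of whitespace collapsed to single spaces.
    static String briefLabel(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed so local-name() sees "briefLabel"
        Document doc = dbf.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Matching on local-name() sidesteps the unknown adminData namespace URI.
        return xpath.evaluate(
            "normalize-space(//*[local-name()='recordData']"
          + "//*[local-name()='briefLabel'])", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(">>>> " + briefLabel(SAMPLE));
    }
}
```

The trade-off is the one the quoted message already points at: a single XPath
expression replaces the hand-written recursion, but the result is an untyped
string rather than a generated class such as AdminDataType.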


