Xerces-C Tech Talk: Network based XML source

roddey 26 Jan 2000 01:17:02 -0000

Just to make sure everyone is up to date on the features that have been
added to make the next release friendlier to XML parsing from sources other
than local files...

There is an abstract base class named XMLNetAccessor in
util/XMLNetAccessor.hpp. When the platform-independent part of the platform
utils asks for one, each platform support file can create an object of a
class that implements this interface and return it. This is done in the
method makeNetAccessor(). An implementation will usually be very simple
and look like this:

XMLNetAccessor* XMLPlatformUtils::makeNetAccessor()
{
    return new MyKindaNetAccessor;
}

So it just has to create a net accessor object of the desired type and
return it. If your platform does not support any network access, just
return a null pointer (zero); this tells the parser system that such
support is not available.
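To make the null-pointer convention concrete, here is a minimal,
self-contained sketch. The BinInputStream and XMLNetAccessor classes are
simplified stand-ins for the real Xerces headers (a std::string replaces
XMLURL, and the position method is assumed to be named curPos()), and
installedAccessor() and openUrlSource() are hypothetical helpers showing how
a caller might treat a null accessor as "no network support".

```cpp
#include <memory>
#include <string>

using XMLByte = unsigned char;

// Simplified stand-ins for the Xerces interfaces (NOT the real headers).
class BinInputStream {
public:
    virtual ~BinInputStream() = default;
    virtual unsigned int curPos() const = 0;  // assumed name for the position API
    virtual unsigned int readBytes(XMLByte* const toFill,
                                   const unsigned int maxToRead) = 0;
};

class XMLNetAccessor {
public:
    virtual ~XMLNetAccessor() = default;
    // The real makeNew() takes an XMLURL; a plain string stands in for it here.
    virtual BinInputStream* makeNew(const std::string& urlText) = 0;
};

// Stand-in for XMLPlatformUtils::makeNetAccessor() on a platform with no
// network support: a null pointer means "not available".
XMLNetAccessor* makeNetAccessor() {
    return nullptr;
}

// Hypothetical: ask the platform once and keep the accessor for the life of
// the program, since streams may be created from it at any time.
XMLNetAccessor* installedAccessor() {
    static std::unique_ptr<XMLNetAccessor> acc(makeNetAccessor());
    return acc.get();
}

// Hypothetical parser-side helper: only ask for a stream if an accessor was
// installed; otherwise report (via null) that the URL cannot be opened.
std::unique_ptr<BinInputStream> openUrlSource(const std::string& urlText) {
    XMLNetAccessor* accessor = installedAccessor();
    if (accessor == nullptr)
        return nullptr;  // no network support on this platform
    return std::unique_ptr<BinInputStream>(accessor->makeNew(urlText));
}
```

On a platform whose makeNetAccessor() returns null, every URL-based source
simply comes back empty, and the caller can turn that into a proper error.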

The XMLNetAccessor API is also very simple. Its sole purpose is to act as a
factory for input streams that read data from URL-based sources. When a URL
is used to reference some XML and a net accessor is installed, the
accessor's makeNew() method is called and passed the URL from which the XML
should be retrieved. An implementation might look something like this:

// Only I need to know about my type of socket
#include "MySocketInputStream.hpp"

BinInputStream* MyKindaNetAccessor::makeNew(const XMLURL& urlSrc)
{
    // Call some sort of socket open function
    AHandle myHandle = FooBar::openSocket(urlSrc.getURLText());

    // And create a new stream of my type with this handle and return it
    return new MySocketInputStream(myHandle);
}

Since MySocketInputStream only needs to be known within the .cpp file of
your net accessor implementation, no one else needs to see its header. So
you can pass it any system-specific data you want. Alternatively, you could
just pass it the URL and let it do all the work. You could also implement
the stream class within the .cpp file itself, since it might be quite
simple and small.
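Here is a self-contained sketch of that arrangement, with the stream class
living only in the accessor's own .cpp file. Everything in it is
hypothetical: FakeHandle stands in for a real socket handle (its "response"
is a canned string built from the URL), a std::string replaces XMLURL, and
the interfaces are simplified stand-ins for the Xerces ones.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

using XMLByte = unsigned char;

// Simplified stand-ins for the Xerces interfaces (NOT the real headers).
class BinInputStream {
public:
    virtual ~BinInputStream() = default;
    virtual unsigned int curPos() const = 0;
    virtual unsigned int readBytes(XMLByte* const toFill,
                                   const unsigned int maxToRead) = 0;
};

class XMLNetAccessor {
public:
    virtual ~XMLNetAccessor() = default;
    virtual BinInputStream* makeNew(const std::string& urlText) = 0;
};

namespace {
// Hypothetical system handle: a canned response body in place of a socket.
struct FakeHandle {
    std::string body;
};

// Private to this .cpp file, so it can take a platform-specific handle type.
class MySocketInputStream : public BinInputStream {
public:
    explicit MySocketInputStream(FakeHandle handle) : handle_(std::move(handle)) {}

    unsigned int curPos() const override { return pos_; }

    unsigned int readBytes(XMLByte* const toFill,
                           const unsigned int maxToRead) override {
        const std::size_t left = handle_.body.size() - pos_;
        const auto n = static_cast<unsigned int>(std::min<std::size_t>(left, maxToRead));
        std::memcpy(toFill, handle_.body.data() + pos_, n);
        pos_ += n;
        return n;  // zero once the body is exhausted
    }

private:
    FakeHandle handle_;
    unsigned int pos_ = 0;
};
}  // namespace

class MyKindaNetAccessor : public XMLNetAccessor {
public:
    BinInputStream* makeNew(const std::string& urlText) override {
        // A real implementation would open a socket for urlText here.
        FakeHandle h{"<doc source=\"" + urlText + "\"/>"};
        return new MySocketInputStream(std::move(h));
    }
};
```

The anonymous namespace is what keeps MySocketInputStream invisible outside
this file; only the accessor and the BinInputStream interface are exposed.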

The stream returned must implement the BinInputStream interface from
(surprise) util/BinInputStream.hpp. Here again, the API is very simple. All
it has to do is fill a buffer with data upon request. When the parser needs
more raw data to parse, it calls the readBytes() method on the stream. This
method must read up to, but not more than, the number of bytes requested.
When no more data can be read, it should return zero bytes read. For
efficiency's sake, it should read as much as it can in one shot. The
declaration looks like this:

virtual unsigned int readBytes
(
            XMLByte* const      toFill
    , const unsigned int        maxToRead
) = 0;

The only other method on the stream reports the current position. This is
easily maintained with a counter member to which you add the number of
bytes read on each call.
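The readBytes() contract and the position counter can be sketched with a
memory-backed stream. MemoryInputStream and drain() are hypothetical;
drain() only mimics the parser's pattern of calling readBytes() with a
fixed-size buffer until it returns zero, and the position method is assumed
to be named curPos().

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

using XMLByte = unsigned char;

// Simplified stand-in for the Xerces interface (NOT the real header).
class BinInputStream {
public:
    virtual ~BinInputStream() = default;
    virtual unsigned int curPos() const = 0;
    virtual unsigned int readBytes(XMLByte* const toFill,
                                   const unsigned int maxToRead) = 0;
};

// Hypothetical memory-backed stream illustrating the readBytes() contract.
class MemoryInputStream : public BinInputStream {
public:
    explicit MemoryInputStream(std::string data) : data_(std::move(data)) {}

    // The position is just a running total of the bytes handed out so far.
    unsigned int curPos() const override { return pos_; }

    unsigned int readBytes(XMLByte* const toFill,
                           const unsigned int maxToRead) override {
        // Never more than maxToRead, but as much as is available in one shot.
        const std::size_t left = data_.size() - pos_;
        const auto n = static_cast<unsigned int>(std::min<std::size_t>(left, maxToRead));
        std::memcpy(toFill, data_.data() + pos_, n);
        pos_ += n;  // keep the position counter up to date
        return n;   // zero means "no more data"
    }

private:
    std::string data_;
    unsigned int pos_ = 0;
};

// Hypothetical helper mimicking how a parser pulls raw data.
std::vector<XMLByte> drain(BinInputStream& in, unsigned int chunk) {
    std::vector<XMLByte> out;
    if (chunk == 0)
        return out;
    std::vector<XMLByte> buf(chunk);
    for (;;) {
        const unsigned int got = in.readBytes(buf.data(), chunk);
        if (got == 0)
            break;  // end of data
        out.insert(out.end(), buf.begin(), buf.begin() + got);
    }
    return out;
}
```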

When the parser is done with the stream, it destroys it. At that point,
the stream can destroy the handle. The stream can either do this itself, or
you can provide methods on your net accessor implementation that the
stream calls for basic operations like opening, reading, and closing the
socket. Here again, your net accessor implementation only needs to be seen
by your platform utils file and your input stream implementation, so you
can use any system-specific APIs and parameters between them that you want.
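Here is a sketch of the second option, where the stream delegates open,
read, and close to helper methods on the net accessor and releases its
handle in its destructor. openHandle(), readHandle(), closeHandle(), and
openCount() are hypothetical names, and a map of canned responses stands in
for real sockets.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <map>
#include <string>

using XMLByte = unsigned char;

// Simplified stand-in for the Xerces interface (NOT the real header).
class BinInputStream {
public:
    virtual ~BinInputStream() = default;
    virtual unsigned int curPos() const = 0;
    virtual unsigned int readBytes(XMLByte* const toFill,
                                   const unsigned int maxToRead) = 0;
};

// Hypothetical accessor exposing low-level helpers used only by its stream.
class MyKindaNetAccessor {
public:
    using Handle = int;

    Handle openHandle(const std::string& url) {
        const Handle h = nextHandle_++;
        open_[h] = Conn{"payload for " + url, 0};  // pretend socket open
        return h;
    }

    unsigned int readHandle(Handle h, XMLByte* dst, unsigned int maxBytes) {
        Conn& c = open_.at(h);
        const std::size_t left = c.data.size() - c.offset;
        const auto n = static_cast<unsigned int>(std::min<std::size_t>(left, maxBytes));
        std::memcpy(dst, c.data.data() + c.offset, n);
        c.offset += n;
        return n;
    }

    void closeHandle(Handle h) { open_.erase(h); }
    std::size_t openCount() const { return open_.size(); }

private:
    struct Conn { std::string data; std::size_t offset; };
    std::map<Handle, Conn> open_;
    Handle nextHandle_ = 1;
};

// The stream owns its handle and gives it back when the parser deletes it.
class MySocketInputStream : public BinInputStream {
public:
    MySocketInputStream(MyKindaNetAccessor& owner, const std::string& url)
        : owner_(owner), handle_(owner.openHandle(url)) {}
    ~MySocketInputStream() override { owner_.closeHandle(handle_); }

    // Copying would close the same handle twice, so forbid it.
    MySocketInputStream(const MySocketInputStream&) = delete;
    MySocketInputStream& operator=(const MySocketInputStream&) = delete;

    unsigned int curPos() const override { return pos_; }

    unsigned int readBytes(XMLByte* const toFill,
                           const unsigned int maxToRead) override {
        const unsigned int n = owner_.readHandle(handle_, toFill, maxToRead);
        pos_ += n;
        return n;
    }

private:
    MyKindaNetAccessor& owner_;
    MyKindaNetAccessor::Handle handle_;
    unsigned int pos_ = 0;
};
```

Tying the close to the destructor means the handle is released whenever the
parser deletes the stream, error paths included, which is also why copying
is disabled.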

We will probably ship an experimental LibWWW-based implementation with the
system for people to play with. As far as we can tell at this point, it
will not be compiled into the actual binary drops, so you will need to
rebuild to try it. You can certainly write implementations based on your
own system services as well, or on other third-party support libraries. If
someone out there who is really familiar with WinInet under Win32 would
like to do an implementation based on that, we'd love to accept it and add
it to the code base. As we understand it, the amount of code required would
probably be pretty minimal, but none of us here are really familiar with
WinInet, and we are too busy with the release right now to get into it.

NOTE: At this time, the URL won't be completely normalized, so if you
require any normalized form, produce it yourself before you use the URL. You
are guaranteed that the URL was successfully parsed into its constituent
parts, so it's known to be correct to that degree. You can either call the
methods on the XMLURL object to get the individual parts, or just pass the
full text to something that will parse it as it sees fit. Don't assume that
what went into the URL is what will come out, because some normalization
does happen, such as weaving relative paths together with the base, faulting
in the default port, and faulting in the local host if no host is provided.
I'm sure we will make the URL class more full-service in subsequent
releases, but it will do the job well enough for now.

----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley