I am attempting to get my first custom processor to execute.

1. Using the URLGenerator processor as a base:

  a. I used IDEA's refactor/rename functionality on the class name to clone

         org.orbeon.oxf.processor.generator.URLGenerator

      as
         org.orbeon.oxf.processor.HttpPostProcessor.

  b. I defined my custom namespace by modifying URL_NAMESPACE_URI to

public static final String URL_NAMESPACE_URI = "com:wynnon:oxf:processors";

The new class compiled OK, and I added the resulting class tree to WEB-INF/classes/.

  I have not made any other changes to the code. At the moment
  I'm just trying to reproduce the URLGenerator functionality in working
  code that I can then modify to achieve the functionality I want.

2. I modified processors.xml in orbeon.jar, putting my processor into
a custom namespace (as per http://www.orbeon.com/ois/doc/home-changes, sec 5.2.5),
as follows:


  a. Changed <processors ...> to add the wynn ns prefix

     <processors xmlns:oxf="http://www.orbeon.com/oxf/processors"
           xmlns:wynn="com:wynnon:oxf:processors">

  b. Added the following processor definition:

  <processor name="wynn:http-post">
     <class name="org.orbeon.oxf.processor.HttpPostProcessor" />
  </processor>

3. I invoked the processor with the following pipeline:

  <p:config
       xmlns:p="http://www.orbeon.com/oxf/pipeline"
       xmlns:oxf="http://www.orbeon.com/oxf/processors"
       xmlns:wynn="com:wynnon:oxf:processors">

    <!-- Generates a response from the cnn website (Learning) -->
    <p:param type="input" name="instance"/>
    <p:param type="output" name="data"/>
    <p:processor name="wynn:http-post"
                 xmlns:p="http://www.orbeon.com/oxf/pipeline">
      <p:input name="config">
        <config>
          <url>http://www.cnn.com</url>
          <content-type>text/html</content-type>
        </config>
      </p:input>
      <p:output name="data" ref="data"/>
    </p:processor>

  </p:config>
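
For what it's worth, here is how I expect the name registered in step 2b to resolve. This is only a minimal sketch using the JDK's DOM parser, not Orbeon's actual loader. The class and method names (ProcessorNameCheck, expand, firstProcessorName) are made up for illustration, and I am assuming that processor names are matched by namespace URI plus local name rather than by prefix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ProcessorNameCheck {

    // Expands a prefixed name such as "wynn:http-post" into "{uri}local",
    // using the namespace declarations in scope on the given element.
    static String expand(Element element, String prefixedName) {
        int colon = prefixedName.indexOf(':');
        String prefix = colon < 0 ? null : prefixedName.substring(0, colon);
        String local = colon < 0 ? prefixedName : prefixedName.substring(colon + 1);
        String uri = element.lookupNamespaceURI(prefix);
        return uri == null ? local : "{" + uri + "}" + local;
    }

    // Parses a processors.xml-like fragment and expands the name of its first <processor>.
    static String firstProcessorName(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for prefix lookup to work
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element processor = (Element) doc.getElementsByTagName("processor").item(0);
        return expand(processor, processor.getAttribute("name"));
    }

    public static void main(String[] args) throws Exception {
        // Same declarations and definition as in step 2 above
        String xml = "<processors xmlns:oxf=\"http://www.orbeon.com/oxf/processors\""
                + " xmlns:wynn=\"com:wynnon:oxf:processors\">"
                + "<processor name=\"wynn:http-post\">"
                + "<class name=\"org.orbeon.oxf.processor.HttpPostProcessor\"/>"
                + "</processor></processors>";
        System.out.println(firstProcessorName(xml));
    }
}
```

If my assumption is right, the name I registered should be looked up as {com:wynnon:oxf:processors}http-post.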

Execution failed with the following error:

Cannot find processor factory with name "{http://www.orbeon.com/oxf/processors}http-post"
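
To make the comparison concrete, here is a small sketch with javax.xml.namespace.QName that sets the name from the error message against the name I registered. I am assuming the message prints the name in {namespace}local-name form, which is how it looks to me; QNameCompare and its constants are hypothetical names of my own.

```java
import javax.xml.namespace.QName;

public class QNameCompare {

    // The name I registered in processors.xml (wynn prefix bound to my namespace)
    static final QName REGISTERED = new QName("com:wynnon:oxf:processors", "http-post");

    // The name reported in the error message, parsed from its {uri}local form
    static final QName REPORTED =
            QName.valueOf("{http://www.orbeon.com/oxf/processors}http-post");

    // Two QNames are equal only if both the namespace URI and the local part match
    static boolean sameProcessor(QName a, QName b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println("registered: " + REGISTERED);
        System.out.println("reported:   " + REPORTED);
        System.out.println("same name:  " + sameProcessor(REGISTERED, REPORTED));
    }
}
```

The local parts agree but the namespace URIs do not, so if the lookup works the way I assume, it was attempted under the oxf namespace rather than under com:wynnon:oxf:processors. I do not know why.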

I checked the Pipeline API documentation and searched Google and the list archives,
but have not found anything about setting up processor factories.


I have attached my processor source code to this note.

Questions
---------
1. Sorry if I missed it, but can you direct me to documentation or Java code
  on setting up processor factories?

2. What can I do to get my custom processor, above, running?

3. Ideally, I would set my processor up so that it does not depend on a customized
orbeon.jar, which would allow me to simply replace orbeon.jar whenever a new
version comes out. At present, I modify processors.xml in the jar to create the
mapping of processor name to processor class. Is there a way to map processor
classes outside orbeon.jar?


--
Bill Winspur
Manager, Wynnon Systems Inc
Mobile: 403-519-5889

/**
 *  Copyright (C) 2004 Orbeon, Inc.
 *
 *  This program is free software; you can redistribute it and/or modify it under the terms of the
 *  GNU Lesser General Public License as published by the Free Software Foundation; either version
 *  2.1 of the License, or (at your option) any later version.
 *
 *  This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
 *  without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 *  See the GNU Lesser General Public License for more details.
 *
 *  The full text of the license is available at http://www.gnu.org/copyleft/lesser.html
 */
package org.orbeon.oxf.processor;

import org.apache.log4j.Logger;
import org.dom4j.Element;
import org.orbeon.oxf.cache.*;
import org.orbeon.oxf.common.OXFException;
import org.orbeon.oxf.common.ValidationException;
import org.orbeon.oxf.processor.*;
import org.orbeon.oxf.processor.generator.URLGenerator;
import org.orbeon.oxf.processor.generator.TidyConfig;
import org.orbeon.oxf.resources.ResourceManagerWrapper;
import org.orbeon.oxf.resources.URLFactory;
import org.orbeon.oxf.resources.oxf.Handler;
import org.orbeon.oxf.util.NetUtils;
import org.orbeon.oxf.xml.*;
import org.orbeon.oxf.xml.dom4j.LocationData;
import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;
import org.xml.sax.*;

import javax.xml.parsers.SAXParser;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Posts an HTTP form to a URL, then generates SAX
 * events from a document fetched from the URL.
 */
public class HttpPostProcessor extends ProcessorImpl {

    private static Logger logger = Logger.getLogger(HttpPostProcessor.class);

    private static final String DEFAULT_TEXT_ENCODING = "iso-8859-1";

    private static final boolean DEFAULT_VALIDATING = false;

    private static final boolean DEFAULT_FORCE_CONTENT_TYPE = false;

    private static final boolean DEFAULT_FORCE_ENCODING = false;

    private static final int CACHE_EXPIRATION_NO_CACHE = 0;
    private static final int CACHE_EXPIRATION_NO_EXPIRATION = -1;
    private static final int CACHE_EXPIRATION_LAST_MODIFIED = -2;

    private static final boolean DEFAULT_CACHE_USE_LOCAL_CACHE = true;
    private static final boolean DEFAULT_CACHE_ALWAYS_REVALIDATE = true;
    private static final int DEFAULT_CACHE_EXPIRATION = CACHE_EXPIRATION_LAST_MODIFIED;

    private static final String DEFAULT_TEXT_DOCUMENT_ELEMENT = "document";
    private static final String DEFAULT_TEXT_LINE_ELEMENT = "line";

    private static final String DEFAULT_BINARY_DOCUMENT_ELEMENT = "document";

    public static final String URL_NAMESPACE_URI = "com:wynnon:oxf:processors";
    public static final String VALIDATING_PROPERTY = "validating";

    private Config config;

    public HttpPostProcessor() {
        addInputInfo(new ProcessorInputOutputInfo(INPUT_CONFIG, URL_NAMESPACE_URI));
        addOutputInfo(new ProcessorInputOutputInfo(OUTPUT_DATA));
    }

    public HttpPostProcessor(String url) {
        try {
            this.config = new Config(URLFactory.createURL(url));
            addOutputInfo(new ProcessorInputOutputInfo(OUTPUT_DATA));
        } catch (MalformedURLException e) {
            throw new OXFException(e);
        }
    }

    public HttpPostProcessor(URL url) {
        this.config = new Config(url);
        addOutputInfo(new ProcessorInputOutputInfo(OUTPUT_DATA));
    }

    private static class Config {
        private URL url;
        private String contentType = ProcessorUtils.DEFAULT_CONTENT_TYPE;
        private boolean forceContentType = DEFAULT_FORCE_CONTENT_TYPE;
        private String encoding;
        private boolean forceEncoding = DEFAULT_FORCE_ENCODING;
        private boolean validating = DEFAULT_VALIDATING;
        private Map headers;

        private boolean cacheUseLocalCache = DEFAULT_CACHE_USE_LOCAL_CACHE;
        private boolean cacheAlwaysRevalidate = DEFAULT_CACHE_ALWAYS_REVALIDATE;
        private int cacheExpiration = DEFAULT_CACHE_EXPIRATION;

        private TidyConfig tidyConfig;

        public Config(URL url) {
            this.url = url;
        }

        public Config(URL url, String contentType, boolean forceContentType, String encoding, boolean forceEncoding,
                      boolean validating, Map headers, boolean cacheUseLocalCache, boolean cacheAlwaysRevalidate,
                      int cacheExpiration, TidyConfig tidyConfig) {
            this.url = url;
            this.contentType = contentType;
            this.forceContentType = forceContentType;
            this.encoding = encoding;
            this.forceEncoding = forceEncoding;
            this.validating = validating;
            this.headers = headers;

            this.cacheUseLocalCache = cacheUseLocalCache;
            this.cacheAlwaysRevalidate = cacheAlwaysRevalidate;
            this.cacheExpiration = cacheExpiration;

            this.tidyConfig = tidyConfig;
        }

        public URL getURL() {
            return url;
        }

        public String getContentType() {
            return contentType;
        }

        public boolean isForceContentType() {
            return forceContentType;
        }

        public String getEncoding() {
            return encoding;
        }

        public boolean isForceEncoding() {
            return forceEncoding;
        }

        public TidyConfig getTidyConfig() {
            return tidyConfig;
        }

        public boolean isValidating() {
            return validating;
        }

        public Map getHeaders() {
            return headers;
        }

        public boolean isCacheUseLocalCache() {
            return cacheUseLocalCache;
        }

//        public boolean isCacheAlwaysRevalidate() {
//            return cacheAlwaysRevalidate;
//        }

//        public int getCacheExpiration() {
//            return cacheExpiration;
//        }

        public String toString() {
            return "[" + getURL().toExternalForm() + "|" + getContentType() + "|" + isValidating() + "|" + tidyConfig + "]";
        }
    }

    public ProcessorOutput createOutput(String name) {
        ProcessorOutput output = new ProcessorOutputImpl(getClass(), name) {
            public void readImpl(org.orbeon.oxf.pipeline.api.PipelineContext context, ContentHandler contentHandler) {

                // Read config input into a URL, cache if possible
                Config config = HttpPostProcessor.this.config != null ? HttpPostProcessor.this.config :
                        (Config) readCacheInputAsObject(context, getInputByName(INPUT_CONFIG), new CacheableInputReader() {
                            public Object read(org.orbeon.oxf.pipeline.api.PipelineContext context, ProcessorInput input) {
                                Element configElement = readInputAsDOM4J(context, input).getRootElement();

                                // shortcut if the url is direct child of config
                                String url = configElement.getTextTrim();
                                if(url != null && !url.equals("") ) {
                                    try {
                                        return new Config(URLFactory.createURL(url));
                                    } catch (MalformedURLException e) {
                                        throw new OXFException(e);
                                    }
                                }

                                // We have the /config/url syntax
                                url = XPathUtils.selectStringValueNormalize(configElement, "/config/url");

                                // Get content-type
                                String contentType = XPathUtils.selectStringValueNormalize(configElement, "/config/content-type");
                                boolean forceContentType = ProcessorUtils.selectBooleanValue(configElement, "/config/force-content-type", DEFAULT_FORCE_CONTENT_TYPE);
                                if (forceContentType && (contentType == null || contentType.equals("")))
                                    throw new OXFException("The force-content-type element requires a content-type element.");

                                // Get encoding
                                String encoding = XPathUtils.selectStringValueNormalize(configElement, "/config/encoding");
                                boolean forceEncoding = ProcessorUtils.selectBooleanValue(configElement, "/config/force-encoding", DEFAULT_FORCE_ENCODING);
                                if (forceEncoding && (encoding == null || encoding.equals("")))
                                    throw new OXFException("The force-encoding element requires an encoding element.");

                                // Get headers
                                Map headers = new HashMap();
                                for (Iterator i = configElement.selectNodes("/config/header").iterator(); i.hasNext();) {
                                    Element headerElement = (Element) i.next();
                                    String name = headerElement.element("name").getStringValue();
                                    String value = headerElement.element("value").getStringValue();
                                    headers.put(name, value);
                                }

                                // Validation setting: local, then properties, then hard-coded default
                                boolean defaultValidating = getPropertySet().getBoolean(VALIDATING_PROPERTY, DEFAULT_VALIDATING).booleanValue();
                                boolean validating = ProcessorUtils.selectBooleanValue(configElement, "/config/validating", defaultValidating);

                                // Cache control
                                boolean cacheUseLocalCache = ProcessorUtils.selectBooleanValue(configElement, "/config/cache-control/use-local-cache", DEFAULT_CACHE_USE_LOCAL_CACHE);
                                boolean cacheAlwaysRevalidate = ProcessorUtils.selectBooleanValue(configElement, "/config/cache-control/always-revalidate", DEFAULT_CACHE_ALWAYS_REVALIDATE);
                                int cacheExpiration = ProcessorUtils.selectIntValue(configElement, "/config/cache-control/expiration", DEFAULT_CACHE_EXPIRATION);

                                // Get Tidy config (will only apply if content-type is text/html)
                                TidyConfig tidyConfig = new TidyConfig(XPathUtils.selectSingleNode(configElement, "/config/tidy-options"));

                                // Create configuration object
                                try {
                                    Config config = new Config(URLFactory.createURL(url), contentType, forceContentType, encoding, forceEncoding,
                                            validating, headers, cacheUseLocalCache, cacheAlwaysRevalidate, cacheExpiration, tidyConfig);
                                    if (logger.isDebugEnabled())
                                        logger.debug("Read configuration: " + config.toString());
                                    return config;
                                } catch (MalformedURLException e) {
                                    throw new OXFException(e);
                                }
                            }
                        });
                try {
                    // Never accept a null url
                    if (config.getURL() == null)
                        throw new OXFException("Missing configuration.");
                    // Create unique key and validity for the document
                    CacheKey key = new InternalCacheKey(HttpPostProcessor.this, "urlDocument", config.toString());

                    // Resource from cache
                    Object cachedResource = null;
                    // Check if we can directly serve the resource from cache
//                    if (config.cacheExpiration != CACHE_EXPIRATION_LAST_MODIFIED) {
//                        // We don't use the last-modified header, but instead we use an expiration value set by the user
//                        long cacheExpiration = (config.cacheExpiration < 0) ? config.cacheExpiration : config.cacheExpiration * 1000; // time is in ms
//                        cachedResource = ObjectCache.instance().findValidWithExpiration(context, key, cacheExpiration);
//                        if (cachedResource != null)
//                            ((SAXStore) cachedResource).replay(contentHandler);
//                    }

                    if (cachedResource == null) {
                        // We were unable to just replay from cache without accessing the resource

                        // Decide whether to read from the special oxf: handler or the generic URL handler
                        ResourceHandler handler = Handler.PROTOCOL.equals(config.getURL().getProtocol())
                                ? (ResourceHandler) new OXFResourceHandler(config)
                                : (ResourceHandler) new URLResourceHandler(config);
                        try {
                            Object validity = handler.getValidity();
                            cachedResource = ObjectCache.instance().findValid(context, key, validity);
                            if (cachedResource != null) {
                                // Just replay the cached resource
                                // NOTE: should we do this only with config.isCacheUseLocalCache() = true?
                                ((SAXStore) cachedResource).replay(contentHandler);
                            } else {
                                // We need to read the resource

                                // Find content-type to use. If the config says to
                                // force the content-type, we use the content-type
                                // provided by the user. Otherwise, we give the
                                // priority to the content-type provided by the
                                // connection, then the content-type provided by the
                                // user, then we use the default content-type (XML).
                                // The user will have to provide a content-type for
                                // example to read HTML documents with the file:
                                // protocol.
                                String contentType;
                                if (config.isForceContentType()) {
                                    contentType = config.getContentType();
                                } else {
                                    contentType = handler.getResourceContentType();
                                    if (contentType == null)
                                        contentType = config.getContentType();
                                    if (contentType == null)
                                        contentType = ProcessorUtils.DEFAULT_CONTENT_TYPE;
                                }

                                // Create store for caching if necessary
                                ContentHandler output = config.isCacheUseLocalCache() ? new SAXStore(contentHandler) : contentHandler;

                                // Read resource
                                if (ProcessorUtils.HTML_CONTENT_TYPE.equals(contentType)) {
                                    handler.readHTML(output);
                                } else if (ProcessorUtils.TEXT_CONTENT_TYPE.equals(contentType)) {
                                    handler.readText(output);
                                } else if (ProcessorUtils.XML_CONTENT_TYPE1.equals(contentType) || ProcessorUtils.XML_CONTENT_TYPE2.equals(contentType)) {
                                    handler.readXML(output);
                                } else {
                                    handler.readBinary(output);
                                }

                                // Cache the resource
                                if (config.isCacheUseLocalCache())
                                    ObjectCache.instance().add(context, key, validity, output);
                            }
                        } finally {
                            if (handler != null)
                                handler.destroy();
                        }
                    }
                } catch (SAXParseException spe) {
                    throw new ValidationException(spe.getMessage(), new LocationData(spe));
                } catch (ValidationException e) {
                    LocationData locationData = e.getLocationData();
                    // The system id may not be set
                    if (locationData == null || locationData.getSystemID() == null)
                        e.setLocationData(new LocationData(config.getURL().toExternalForm(), -1, -1));

                    throw e;

                } catch (OXFException e) {
                    throw e;
                } catch (Exception e) {
                    throw new ValidationException(e, new LocationData(config.getURL().toExternalForm(), -1, -1));
                }
            }

            private Config getConfig(org.orbeon.oxf.pipeline.api.PipelineContext context) {
                // Make sure the input is cacheable
                OutputCacheKey outputKey = getInputKey(context, getInputByName(INPUT_CONFIG));
                if (outputKey == null) return null;
                InputCacheKey key = new InputCacheKey(getInputByName(INPUT_CONFIG), outputKey);
                Object validity = getInputValidity(context, getInputByName(INPUT_CONFIG));
                if (validity == null) return null;
                // Try to find resource manager key in cache
                Config config = (Config) ObjectCache.instance().findValid(context, key, validity);
                if (logger.isDebugEnabled())
                    if (config != null)
                        logger.debug("Config found: " + config.toString());
                    else
                        logger.debug("Config not found");
                return config;
            }

            public OutputCacheKey getKeyImpl(org.orbeon.oxf.pipeline.api.PipelineContext context) {
                Config config = HttpPostProcessor.this.config != null ? HttpPostProcessor.this.config : getConfig(context);
                return (config != null) ? new OutputCacheKey(this, config.toString()) : null;
            }

            public Object getValidityImpl(org.orbeon.oxf.pipeline.api.PipelineContext context) {
                Config config = HttpPostProcessor.this.config != null ? HttpPostProcessor.this.config : getConfig(context);
                try {
                    // We need the config to do more
                    if (config == null || config.getURL() == null)
                        return null;

                    ResourceHandler handler = Handler.PROTOCOL.equals(config.getURL().getProtocol())
                            ? (ResourceHandler) new OXFResourceHandler(config)
                            : (ResourceHandler) new URLResourceHandler(config);

                    try {
                        // FIXME: this can potentially be very slow with some URLs
                        return handler.getValidity();
                    } finally {
                        if (handler != null)
                            handler.destroy();
                    }

                } catch (IOException e) {
                    return null;
                }
            }
        };
        addOutput(name, output);
        return output;
    }

    private interface ResourceHandler {
        public Object getValidity() throws IOException;
        public String getResourceContentType() throws IOException;
        public String getResourceEncoding() throws IOException;
        public void destroy() throws IOException;
        public void readHTML(ContentHandler output) throws IOException;
        public void readText(ContentHandler output) throws IOException;
        public void readXML(ContentHandler output) throws IOException;
        public void readBinary(ContentHandler output) throws IOException;
    }

    private static class OXFResourceHandler implements ResourceHandler {
        private Config config;
        private String resourceManagerKey;
        private InputStream inputStream;

        public OXFResourceHandler(Config config) {
            this.config = config;
        }

        public String getResourceContentType() throws IOException {
            // We generally don't know the "connection" content-type
            return null;
        }

        public String getResourceEncoding() throws IOException {
            // We generally don't know the "connection" encoding
            return null;
        }

        public Object getValidity() throws IOException {
            getKey();
            if (logger.isDebugEnabled())
                logger.debug("OXF Protocol: Using ResourceManager for key " + getKey());

            long result = ResourceManagerWrapper.instance().lastModified(getKey());
            // Zero and negative values often have a special meaning, make sure to normalize here
            return (result <= 0) ? null : new Long(result);
        }

        public void destroy() throws IOException {
            if (inputStream != null) {
                inputStream.close();
            }
        }

        private String getEncoding() throws IOException {
            if (config.isForceEncoding())
                return config.getEncoding();
            else
                return getResourceEncoding();
        }

        public void readHTML(ContentHandler output) throws IOException {
            inputStream = ResourceManagerWrapper.instance().getContentAsStream(getKey());
            URLResourceHandler.readHTML(inputStream, config.getTidyConfig(), getEncoding(), output);
        }

        public void readText(ContentHandler output) throws IOException {
            inputStream = ResourceManagerWrapper.instance().getContentAsStream(getKey());
            URLResourceHandler.readText(inputStream, getEncoding(), output);
        }

        public void readXML(ContentHandler output) throws IOException {
            if (config.isForceEncoding()) {
                // Special case, we force the encoding. We have to do the parsing ourselves
                // NOTE: Possibly, some resource managers may not support getContentAsStream()
                inputStream = ResourceManagerWrapper.instance().getContentAsStream(getKey());
                XMLUtils.readerToSAX(
                        new InputStreamReader(inputStream, config.getEncoding()),
                        config.getURL().toExternalForm(),
                        output, config.isValidating()
                );
            } else {
                // Regular case, the resource manager does the job and autodetects the encoding
                ResourceManagerWrapper.instance().getContentAsSAX(getKey(), output);
            }
        }

        public void readBinary(ContentHandler output) throws IOException {
            inputStream = ResourceManagerWrapper.instance().getContentAsStream(getKey());
            URLResourceHandler.readBinary(inputStream, output);
        }

        private String getKey() {
            if (resourceManagerKey == null)
                resourceManagerKey = config.getURL().getFile();
            return resourceManagerKey;
        }
    }

    private static class URLResourceHandler implements ResourceHandler {
        private Config config;
        private URLConnection urlConn;

        public URLResourceHandler(Config config) {
            this.config = config;
        }

        public String getResourceContentType() throws IOException {
            openConnection();
            return NetUtils.getContentTypeContentType(urlConn.getContentType());
        }

        public String getResourceEncoding() throws IOException {
            openConnection();
            return NetUtils.getContentTypeCharset(urlConn.getContentType());
        }

        public Object getValidity() throws IOException {
            openConnection();
            long lastModified = NetUtils.getLastModified(urlConn);
            // Zero and negative values often have a special meaning, make sure to normalize here
            return lastModified <= 0 ? null : new Long(lastModified);
        }

        public void destroy() throws IOException {
            // Make sure the connection is closed because when
            // getting the last modified date, the stream is
            // actually opened. When using the file: protocol, the
            // file can be locked on disk.
            if (urlConn != null) {
                urlConn.getInputStream().close();
            }
        }

        private void openConnection() throws IOException {
            if (urlConn == null) {
                urlConn = config.getURL().openConnection();
                Map headers = config.getHeaders();
                if (headers != null) {
                    for (Iterator i = headers.keySet().iterator(); i.hasNext();) {
                        String name = (String) i.next();
                        String value = (String) config.getHeaders().get(name);
                        urlConn.setRequestProperty(name, value);
                    }
                }
            }
        }

        private String getEncoding() throws IOException {
            if (config.isForceEncoding())
                return config.getEncoding();
            else
                return getResourceEncoding();
        }

        public void readHTML(ContentHandler output) throws IOException {
            openConnection();
            readHTML(urlConn.getInputStream(), config.getTidyConfig(), getEncoding(), output);
        }

        public void readText(ContentHandler output) throws IOException {
            openConnection();
            readText(urlConn.getInputStream(), getEncoding(), output);
        }

        public void readBinary(ContentHandler output) throws IOException {
            openConnection();
            readBinary(urlConn.getInputStream(), output);
        }

        public void readXML(ContentHandler output) throws IOException {
            openConnection();
            // Read the resource from the resource manager and parse it as XML
            try {
                SAXParser parser = XMLUtils.newSAXParser(config.isValidating());
                XMLReader reader = parser.getXMLReader();
                reader.setContentHandler(output);
                reader.setEntityResolver(XMLUtils.ENTITY_RESOLVER);
                reader.setErrorHandler(XMLUtils.ERROR_HANDLER);
                InputSource inputSource;
                if (config.isForceEncoding()) {
                    // This is a special case where the user wants to force an encoding on an XML file
                    // NOTE: We do not support the case where the connection encoding is used
                    inputSource = new InputSource(new InputStreamReader(urlConn.getInputStream(), config.getEncoding()));
                } else {
                    // This is the regular case where the XML parser autodetects the encoding
                    inputSource = new InputSource(urlConn.getInputStream());
                }
                inputSource.setSystemId(config.getURL().toExternalForm());
                reader.parse(inputSource);
            } catch (SAXException e) {
                throw new OXFException(e);
            }
        }

        public static void readHTML(InputStream is, TidyConfig tidyConfig, String encoding, ContentHandler output) throws IOException {
            Tidy tidy = new Tidy();
//          tidy.setOnlyErrors(false);
            tidy.setShowWarnings(tidyConfig.isShowWarnings());
            tidy.setQuiet(tidyConfig.isQuiet());

            // Set encoding
            // If the encoding is null, we get a default
            tidy.setCharEncoding(TidyConfig.getTidyEncoding(encoding));

            // Parse and output to SAXResult
            Document document = tidy.parseDOM(is, null);
            try {
                Transformer transformer = TransformerUtils.getIdentityTransformer();
                transformer.transform(new DOMSource(document), new SAXResult(output));
            } catch (TransformerException e) {
                throw new OXFException(e);
            }
        }

        public static void readText(InputStream is, String encoding, ContentHandler output) throws IOException {
            if (encoding == null)
                encoding = DEFAULT_TEXT_ENCODING;
            BufferedReader br = new BufferedReader(new InputStreamReader(is, encoding));

            // Parse the input and output elements
            ContentHandlerHelper helper = new ContentHandlerHelper(output);
            helper.startDocument();
            helper.startElement(DEFAULT_TEXT_DOCUMENT_ELEMENT);

            String line;
            while ((line = br.readLine()) != null) {
                helper.element(DEFAULT_TEXT_LINE_ELEMENT, line);
            }

            helper.endElement();
            helper.endDocument();
        }

        public static void readBinary(InputStream is, ContentHandler output) throws IOException {
            ContentHandlerHelper helper = new ContentHandlerHelper(output);
            helper.startDocument();
            helper.startElement(DEFAULT_BINARY_DOCUMENT_ELEMENT);

            XMLUtils.inputStreamToBase64Characters(new BufferedInputStream(is), output);

            helper.endElement();
            helper.endDocument();
        }
    }
}
