I am attempting to get my first custom processor to execute.
1. Using the URLGenerator processor as a base:
a. I used IDEA's refactor/rename functionality on the class name, to clone
org.orbeon.oxf.processor.generator.URLGenerator
as
org.orbeon.oxf.processor.HttpPostProcessor
b. I defined my custom namespace by modifying URL_NAMESPACE_URI to
public static final String URL_NAMESPACE_URI = "com:wynnon:oxf:processors";
The new class compiled OK, and I added the resulting class tree to WEB-INF/classes/.
I have not made any other changes to the code. At the moment I'm just trying to reproduce the URLGenerator functionality in working code that I can then modify to achieve the functionality I want.
2. I modified processors.xml in orbeon.jar, putting my processor into
a custom namespace (as per http://www.orbeon.com/ois/doc/home-changes, sec 5.2.5),
as follows:
a. Changed <processors ...> to add the wynn ns prefix
<processors xmlns:oxf="http://www.orbeon.com/oxf/processors" xmlns:wynn="com:wynnon:oxf:processors">
b. Added the following processor definition:
<processor name="wynn:http-post">
<class name="org.orbeon.oxf.processor.HttpPostProcessor" />
</processor>
3. I invoked the processor with the following code:
<p:config
xmlns:p="http://www.orbeon.com/oxf/pipeline"
xmlns:oxf="http://www.orbeon.com/oxf/processors"
xmlns:wynn="com:wynnon:oxf:processors">
<!-- Generates a response from the cnn website (Learning) -->
<p:param type="input" name="instance"/>
<p:param type="output" name="data"/>
<p:processor name="wynn:http-post"
xmlns:p="http://www.orbeon.com/oxf/pipeline">
<p:input name="config">
<config>
<url>http://www.cnn.com</url>
<content-type>text/html</content-type>
</config>
</p:input>
<p:output name="data" ref="data"/>
</p:processor>
</p:config>
Execution failed with the following error:
Cannot find processor factory with name "{http://www.orbeon.com/oxf/processors}http-post"
I checked the Pipeline API documentation and searched Google and the list archives,
but have not found anything about setting up processor factories.
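To make the failing lookup concrete, here is a minimal sketch of how I understand it, assuming the engine keeps a registry keyed by expanded name ({namespace}local-name). The class ProcessorRegistrySketch and its bind/create methods are hypothetical and are not Orbeon's API; only javax.xml.namespace.QName is a real JDK class. It illustrates that two names with the same local part but different namespaces never match, which would produce an error of the shape quoted above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;
import javax.xml.namespace.QName;

/** Hypothetical sketch of a name-to-factory registry; not Orbeon's implementation. */
public class ProcessorRegistrySketch {
    // Maps an expanded processor name to a factory that creates instances.
    private final Map<QName, Supplier<Object>> factories = new HashMap<>();

    /** Registers a factory under an expanded name. */
    public void bind(QName name, Supplier<Object> factory) {
        factories.put(name, factory);
    }

    /** Creates an instance, or fails if no factory is bound to that exact name. */
    public Object create(QName name) {
        Supplier<Object> factory = factories.get(name);
        if (factory == null) {
            // QName.toString() renders the name as {namespace}local-name.
            throw new IllegalStateException(
                "Cannot find processor factory with name \"" + name + "\"");
        }
        return factory.get();
    }

    public static void main(String[] args) {
        ProcessorRegistrySketch registry = new ProcessorRegistrySketch();
        QName registered = new QName("com:wynnon:oxf:processors", "http-post");
        registry.bind(registered, Object::new);

        // Same local name, different namespace: a different expanded name.
        QName requested = new QName("http://www.orbeon.com/oxf/processors", "http-post");
        try {
            registry.create(requested);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real lookup works along these lines, the namespace in the error message tells me which expanded name the engine actually searched for, and I can compare it with the name I bound in processors.xml.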
I have attached my processor source code to this note.
Questions
---------
1. Sorry if I missed it, but can you direct me to documentation or Java code on setting up processor factories?
2. What can I do to get my custom processor, above, running?
3. Ideally, I would set my processor up so that it is not dependent on a customized
orbeon jar, which would allow me to simply replace orbeon.jar whenever a new
version comes out. At present, I modify processors.xml in the jar to create the
mapping of processor name to processor class. Is there a way to map processor
classes outside orbeon.jar?
--
Bill Winspur
Manager, Wynnon Systems Inc
Mobile: 403-519-5889
/**
 * Copyright (C) 2004 Orbeon, Inc.
 *
 * This program is free software; you can redistribute it and/or modify it under the terms of the
 * GNU Lesser General Public License as published by the Free Software Foundation; either version
 * 2.1 of the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
 * without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 * See the GNU Lesser General Public License for more details.
 *
 * The full text of the license is available at http://www.gnu.org/copyleft/lesser.html
 */
package org.orbeon.oxf.processor;
import org.apache.log4j.Logger;
import org.dom4j.Element;
import org.orbeon.oxf.cache.*;
import org.orbeon.oxf.common.OXFException;
import org.orbeon.oxf.common.ValidationException;
import org.orbeon.oxf.processor.*;
import org.orbeon.oxf.processor.generator.URLGenerator;
import org.orbeon.oxf.processor.generator.TidyConfig;
import org.orbeon.oxf.resources.ResourceManagerWrapper;
import org.orbeon.oxf.resources.URLFactory;
import org.orbeon.oxf.resources.oxf.Handler;
import org.orbeon.oxf.util.NetUtils;
import org.orbeon.oxf.xml.*;
import org.orbeon.oxf.xml.dom4j.LocationData;
import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;
import org.xml.sax.*;
import javax.xml.parsers.SAXParser;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
/**
* Posts an HTTP form to a URL, then generates SAX
* events from a document fetched from the URL.
*/
public class HttpPostProcessor extends ProcessorImpl {
private static Logger logger = Logger.getLogger(HttpPostProcessor.class);
private static final String DEFAULT_TEXT_ENCODING = "iso-8859-1";
private static final boolean DEFAULT_VALIDATING = false;
private static final boolean DEFAULT_FORCE_CONTENT_TYPE = false;
private static final boolean DEFAULT_FORCE_ENCODING = false;
private static final int CACHE_EXPIRATION_NO_CACHE = 0;
private static final int CACHE_EXPIRATION_NO_EXPIRATION = -1;
private static final int CACHE_EXPIRATION_LAST_MODIFIED = -2;
private static final boolean DEFAULT_CACHE_USE_LOCAL_CACHE = true;
private static final boolean DEFAULT_CACHE_ALWAYS_REVALIDATE = true;
private static final int DEFAULT_CACHE_EXPIRATION = CACHE_EXPIRATION_LAST_MODIFIED;
private static final String DEFAULT_TEXT_DOCUMENT_ELEMENT = "document";
private static final String DEFAULT_TEXT_LINE_ELEMENT = "line";
private static final String DEFAULT_BINARY_DOCUMENT_ELEMENT = "document";
public static final String URL_NAMESPACE_URI = "com:wynnon:oxf:processors";
public static final String VALIDATING_PROPERTY = "validating";
private Config config;
public HttpPostProcessor() {
addInputInfo(new ProcessorInputOutputInfo(INPUT_CONFIG, URL_NAMESPACE_URI));
addOutputInfo(new ProcessorInputOutputInfo(OUTPUT_DATA));
}
public HttpPostProcessor(String url) {
try {
this.config = new Config(URLFactory.createURL(url));
addOutputInfo(new ProcessorInputOutputInfo(OUTPUT_DATA));
} catch (MalformedURLException e) {
throw new OXFException(e);
}
}
public HttpPostProcessor(URL url) {
this.config = new Config(url);
addOutputInfo(new ProcessorInputOutputInfo(OUTPUT_DATA));
}
private static class Config {
private URL url;
private String contentType = ProcessorUtils.DEFAULT_CONTENT_TYPE;
private boolean forceContentType = DEFAULT_FORCE_CONTENT_TYPE;
private String encoding;
private boolean forceEncoding = DEFAULT_FORCE_ENCODING;
private boolean validating = DEFAULT_VALIDATING;
private Map headers;
private boolean cacheUseLocalCache = DEFAULT_CACHE_USE_LOCAL_CACHE;
private boolean cacheAlwaysRevalidate = DEFAULT_CACHE_ALWAYS_REVALIDATE;
private int cacheExpiration = DEFAULT_CACHE_EXPIRATION;
private TidyConfig tidyConfig;
public Config(URL url) {
this.url = url;
}
public Config(URL url, String contentType, boolean forceContentType, String
encoding, boolean forceEncoding,
boolean validating, Map headers, boolean cacheUseLocalCache,
boolean cacheAlwaysRevalidate, int cacheExpiration, TidyConfig tidyConfig) {
this.url = url;
this.contentType = contentType;
this.forceContentType = forceContentType;
this.encoding = encoding;
this.forceEncoding = forceEncoding;
this.validating = validating;
this.headers = headers;
this.cacheUseLocalCache = cacheUseLocalCache;
this.cacheAlwaysRevalidate = cacheAlwaysRevalidate;
this.cacheExpiration = cacheExpiration;
this.tidyConfig = tidyConfig;
}
public URL getURL() {
return url;
}
public String getContentType() {
return contentType;
}
public boolean isForceContentType() {
return forceContentType;
}
public String getEncoding() {
return encoding;
}
public boolean isForceEncoding() {
return forceEncoding;
}
public TidyConfig getTidyConfig() {
return tidyConfig;
}
public boolean isValidating() {
return validating;
}
public Map getHeaders() {
return headers;
}
public boolean isCacheUseLocalCache() {
return cacheUseLocalCache;
}
// public boolean isCacheAlwaysRevalidate() {
// return cacheAlwaysRevalidate;
// }
// public int getCacheExpiration() {
// return cacheExpiration;
// }
public String toString() {
return "[" + getURL().toExternalForm() + "|" + getContentType() + "|" +
isValidating() + "|" + tidyConfig + "]";
}
}
public ProcessorOutput createOutput(String name) {
ProcessorOutput output = new ProcessorOutputImpl(getClass(), name) {
public void readImpl(org.orbeon.oxf.pipeline.api.PipelineContext context,
ContentHandler contentHandler) {
// Read config input into a URL, cache if possible
Config config = HttpPostProcessor.this.config != null ?
HttpPostProcessor.this.config :
(Config) readCacheInputAsObject(context,
getInputByName(INPUT_CONFIG), new CacheableInputReader() {
public Object
read(org.orbeon.oxf.pipeline.api.PipelineContext context, ProcessorInput input) {
Element configElement = readInputAsDOM4J(context,
input).getRootElement();
// shortcut if the url is direct child of config
String url = configElement.getTextTrim();
if(url != null && !url.equals("") ) {
try {
return new Config(URLFactory.createURL(url));
} catch (MalformedURLException e) {
throw new OXFException(e);
}
}
// We have the /config/url syntax
url =
XPathUtils.selectStringValueNormalize(configElement, "/config/url");
// Get content-type
String contentType =
XPathUtils.selectStringValueNormalize(configElement, "/config/content-type");
boolean forceContentType =
ProcessorUtils.selectBooleanValue(configElement, "/config/force-content-type",
DEFAULT_FORCE_CONTENT_TYPE);
if (forceContentType && (contentType == null ||
contentType.equals("")))
throw new OXFException("The force-content-type element requires a content-type element.");
// Get encoding
String encoding =
XPathUtils.selectStringValueNormalize(configElement, "/config/encoding");
boolean forceEncoding =
ProcessorUtils.selectBooleanValue(configElement, "/config/force-encoding",
DEFAULT_FORCE_ENCODING);
if (forceEncoding && (encoding == null ||
encoding.equals("")))
throw new OXFException("The force-encoding element requires an encoding element.");
// Get headers
Map headers = new HashMap();
for (Iterator i =
configElement.selectNodes("/config/header").iterator(); i.hasNext();) {
Element headerElement = (Element) i.next();
String name =
headerElement.element("name").getStringValue();
String value =
headerElement.element("value").getStringValue();
headers.put(name, value);
}
// Validation setting: local, then properties, then hard-coded default
boolean defaultValidating =
getPropertySet().getBoolean(VALIDATING_PROPERTY, DEFAULT_VALIDATING).booleanValue();
boolean validating =
ProcessorUtils.selectBooleanValue(configElement, "/config/validating",
defaultValidating);
// Cache control
boolean cacheUseLocalCache =
ProcessorUtils.selectBooleanValue(configElement,
"/config/cache-control/use-local-cache", DEFAULT_CACHE_USE_LOCAL_CACHE);
boolean cacheAlwaysRevalidate =
ProcessorUtils.selectBooleanValue(configElement,
"/config/cache-control/always-revalidate", DEFAULT_CACHE_ALWAYS_REVALIDATE);
int cacheExpiration =
ProcessorUtils.selectIntValue(configElement, "/config/cache-control/expiration",
DEFAULT_CACHE_EXPIRATION);
// Get Tidy config (will only apply if content-type is text/html)
TidyConfig tidyConfig = new
TidyConfig(XPathUtils.selectSingleNode(configElement, "/config/tidy-options"));
// Create configuration object
try {
Config config = new
Config(URLFactory.createURL(url), contentType, forceContentType, encoding,
forceEncoding,
validating, headers, cacheUseLocalCache,
cacheAlwaysRevalidate, cacheExpiration, tidyConfig);
if (logger.isDebugEnabled())
logger.debug("Read configuration: " +
config.toString());
return config;
} catch (MalformedURLException e) {
throw new OXFException(e);
}
}
});
try {
// Never accept a null url
if (config.getURL() == null)
throw new OXFException("Missing configuration.");
// Create unique key and validity for the document
CacheKey key = new InternalCacheKey(HttpPostProcessor.this,
"urlDocument", config.toString());
// Resource from cache
Object cachedResource = null;
// Check if we can directly serve the resource from cache
// if (config.cacheExpiration != CACHE_EXPIRATION_LAST_MODIFIED) {
// // We don't use the last-modified header, but instead we use an expiration value set by the user
// long cacheExpiration = (config.cacheExpiration < 0) ? config.cacheExpiration : config.cacheExpiration * 1000; // time is in ms
// cachedResource = ObjectCache.instance().findValidWithExpiration(context, key, cacheExpiration);
// if (cachedResource != null)
// ((SAXStore) cachedResource).replay(contentHandler);
// }
if (cachedResource == null) {
// We were unable to just replay from cache without accessing the resource
// Decide whether to read from the special oxf: handler or the generic URL handler
ResourceHandler handler =
Handler.PROTOCOL.equals(config.getURL().getProtocol())
? (ResourceHandler) new OXFResourceHandler(config)
: (ResourceHandler) new URLResourceHandler(config);
try {
Object validity = handler.getValidity();
cachedResource = ObjectCache.instance().findValid(context,
key, validity);
if (cachedResource != null) {
// Just replay the cached resource
// NOTE: should we do this only with config.isCacheUseLocalCache() = true?
((SAXStore) cachedResource).replay(contentHandler);
} else {
// We need to read the resource
// Find content-type to use. If the config says to
// force the content-type, we use the content-type
// provided by the user. Otherwise, we give the
// priority to the content-type provided by the
// connection, then the content-type provided by the
// user, then we use the default content-type (XML).
// The user will have to provide a content-type for
// example to read HTML documents with the file:
// protocol.
String contentType;
if (config.isForceContentType()) {
contentType = config.getContentType();
} else {
contentType = handler.getResourceContentType();
if (contentType == null)
contentType = config.getContentType();
if (contentType == null)
contentType =
ProcessorUtils.DEFAULT_CONTENT_TYPE;
}
// Create store for caching if necessary
ContentHandler output = config.isCacheUseLocalCache()
? new SAXStore(contentHandler) : contentHandler;
// Read resource
if
(ProcessorUtils.HTML_CONTENT_TYPE.equals(contentType)) {
handler.readHTML(output);
} else if
(ProcessorUtils.TEXT_CONTENT_TYPE.equals(contentType)) {
handler.readText(output);
} else if
(ProcessorUtils.XML_CONTENT_TYPE1.equals(contentType) ||
ProcessorUtils.XML_CONTENT_TYPE2.equals(contentType)) {
handler.readXML(output);
} else {
handler.readBinary(output);
}
// Cache the resource
if (config.isCacheUseLocalCache())
ObjectCache.instance().add(context, key, validity,
output);
}
} finally {
if (handler != null)
handler.destroy();
}
}
} catch (SAXParseException spe) {
throw new ValidationException(spe.getMessage(), new
LocationData(spe));
} catch (ValidationException e) {
LocationData locationData = e.getLocationData();
// The system id may not be set
if (locationData == null || locationData.getSystemID() == null)
e.setLocationData(new
LocationData(config.getURL().toExternalForm(), -1, -1));
throw e;
} catch (OXFException e) {
throw e;
} catch (Exception e) {
throw new ValidationException(e, new
LocationData(config.getURL().toExternalForm(), -1, -1));
}
}
private Config getConfig(org.orbeon.oxf.pipeline.api.PipelineContext
context) {
// Make sure the input is cacheable
OutputCacheKey outputKey = getInputKey(context,
getInputByName(INPUT_CONFIG));
if (outputKey == null) return null;
InputCacheKey key = new InputCacheKey(getInputByName(INPUT_CONFIG),
outputKey);
Object validity = getInputValidity(context,
getInputByName(INPUT_CONFIG));
if (validity == null) return null;
// Try to find resource manager key in cache
Config config = (Config) ObjectCache.instance().findValid(context,
key, validity);
if (logger.isDebugEnabled())
if (config != null)
logger.debug("Config found: " + config.toString());
else
logger.debug("Config not found");
return config;
}
public OutputCacheKey
getKeyImpl(org.orbeon.oxf.pipeline.api.PipelineContext context) {
Config config = HttpPostProcessor.this.config != null ?
HttpPostProcessor.this.config : getConfig(context);
return (config != null) ? new OutputCacheKey(this, config.toString())
: null;
}
public Object getValidityImpl(org.orbeon.oxf.pipeline.api.PipelineContext
context) {
Config config = HttpPostProcessor.this.config != null ?
HttpPostProcessor.this.config : getConfig(context);
try {
// We need the config to do more
if (config == null || config.getURL() == null)
return null;
ResourceHandler handler =
Handler.PROTOCOL.equals(config.getURL().getProtocol())
? (ResourceHandler) new OXFResourceHandler(config)
: (ResourceHandler) new URLResourceHandler(config);
try {
// FIXME: this can potentially be very slow with some URLs
return handler.getValidity();
} finally {
if (handler != null)
handler.destroy();
}
} catch (IOException e) {
return null;
}
}
};
addOutput(name, output);
return output;
}
private interface ResourceHandler {
public Object getValidity() throws IOException;
public String getResourceContentType() throws IOException;
public String getResourceEncoding() throws IOException;
public void destroy() throws IOException;
public void readHTML(ContentHandler output) throws IOException;
public void readText(ContentHandler output) throws IOException;
public void readXML(ContentHandler output) throws IOException;
public void readBinary(ContentHandler output) throws IOException;
}
private static class OXFResourceHandler implements ResourceHandler {
private Config config;
private String resourceManagerKey;
private InputStream inputStream;
public OXFResourceHandler(Config config) {
this.config = config;
}
public String getResourceContentType() throws IOException {
// We generally don't know the "connection" content-type
return null;
}
public String getResourceEncoding() throws IOException {
// We generally don't know the "connection" encoding
return null;
}
public Object getValidity() throws IOException {
getKey();
if (logger.isDebugEnabled())
logger.debug("OXF Protocol: Using ResourceManager for key " +
getKey());
long result = ResourceManagerWrapper.instance().lastModified(getKey());
// Zero and negative values often have a special meaning, make sure to normalize here
return (result <= 0) ? null : new Long(result);
}
public void destroy() throws IOException {
if (inputStream != null) {
inputStream.close();
}
}
private String getEncoding() throws IOException {
if (config.isForceEncoding())
return config.getEncoding();
else
return getResourceEncoding();
}
public void readHTML(ContentHandler output) throws IOException {
inputStream =
ResourceManagerWrapper.instance().getContentAsStream(getKey());
URLResourceHandler.readHTML(inputStream, config.getTidyConfig(),
getEncoding(), output);
}
public void readText(ContentHandler output) throws IOException {
inputStream =
ResourceManagerWrapper.instance().getContentAsStream(getKey());
URLResourceHandler.readText(inputStream, getEncoding(), output);
}
public void readXML(ContentHandler output) throws IOException {
if (config.isForceEncoding()) {
// Special case, we force the encoding. We have to do the parsing ourselves
// NOTE: Possibly, some resource managers may not support getContentAsStream()
inputStream =
ResourceManagerWrapper.instance().getContentAsStream(getKey());
XMLUtils.readerToSAX(
new InputStreamReader(inputStream, config.getEncoding()),
config.getURL().toExternalForm(),
output, config.isValidating()
);
} else {
// Regular case, the resource manager does the job and autodetects the encoding
ResourceManagerWrapper.instance().getContentAsSAX(getKey(), output);
}
}
public void readBinary(ContentHandler output) throws IOException {
inputStream =
ResourceManagerWrapper.instance().getContentAsStream(getKey());
URLResourceHandler.readBinary(inputStream, output);
}
private String getKey() {
if (resourceManagerKey == null)
resourceManagerKey = config.getURL().getFile();
return resourceManagerKey;
}
}
private static class URLResourceHandler implements ResourceHandler {
private Config config;
private URLConnection urlConn;
public URLResourceHandler(Config config) {
this.config = config;
}
public String getResourceContentType() throws IOException {
openConnection();
return NetUtils.getContentTypeContentType(urlConn.getContentType());
}
public String getResourceEncoding() throws IOException {
openConnection();
return NetUtils.getContentTypeCharset(urlConn.getContentType());
}
public Object getValidity() throws IOException {
openConnection();
long lastModified = NetUtils.getLastModified(urlConn);
// Zero and negative values often have a special meaning, make sure to normalize here
return lastModified <= 0 ? null : new Long(lastModified);
}
public void destroy() throws IOException {
// Make sure the connection is closed because when
// getting the last modified date, the stream is
// actually opened. When using the file: protocol, the
// file can be locked on disk.
if (urlConn != null) {
urlConn.getInputStream().close();
}
}
private void openConnection() throws IOException {
if (urlConn == null) {
urlConn = config.getURL().openConnection();
Map headers = config.getHeaders();
if (headers != null) {
for (Iterator i = headers.keySet().iterator(); i.hasNext();) {
String name = (String) i.next();
String value = (String) config.getHeaders().get(name);
urlConn.setRequestProperty(name, value);
}
}
}
}
private String getEncoding() throws IOException {
if (config.isForceEncoding())
return config.getEncoding();
else
return getResourceEncoding();
}
public void readHTML(ContentHandler output) throws IOException {
openConnection();
readHTML(urlConn.getInputStream(), config.getTidyConfig(), getEncoding(),
output);
}
public void readText(ContentHandler output) throws IOException {
openConnection();
readText(urlConn.getInputStream(), getEncoding(), output);
}
public void readBinary(ContentHandler output) throws IOException {
openConnection();
readBinary(urlConn.getInputStream(), output);
}
public void readXML(ContentHandler output) throws IOException {
openConnection();
// Read the resource from the resource manager and parse it as XML
try {
SAXParser parser = XMLUtils.newSAXParser(config.isValidating());
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(output);
reader.setEntityResolver(XMLUtils.ENTITY_RESOLVER);
reader.setErrorHandler(XMLUtils.ERROR_HANDLER);
InputSource inputSource;
if (config.isForceEncoding()) {
// This is a special case where the user wants to force an encoding on an XML file
// NOTE: We do not support the case where the connection encoding is used
inputSource = new InputSource(new
InputStreamReader(urlConn.getInputStream(), config.getEncoding()));
} else {
// This is the regular case where the XML parser autodetects the encoding
inputSource = new InputSource(urlConn.getInputStream());
}
inputSource.setSystemId(config.getURL().toExternalForm());
reader.parse(inputSource);
} catch (SAXException e) {
throw new OXFException(e);
}
}
public static void readHTML(InputStream is, TidyConfig tidyConfig, String
encoding, ContentHandler output) throws IOException {
Tidy tidy = new Tidy();
// tidy.setOnlyErrors(false);
tidy.setShowWarnings(tidyConfig.isShowWarnings());
tidy.setQuiet(tidyConfig.isQuiet());
// Set encoding
// If the encoding is null, we get a default
tidy.setCharEncoding(TidyConfig.getTidyEncoding(encoding));
// Parse and output to SAXResult
Document document = tidy.parseDOM(is, null);
try {
Transformer transformer = TransformerUtils.getIdentityTransformer();
transformer.transform(new DOMSource(document), new SAXResult(output));
} catch (TransformerException e) {
throw new OXFException(e);
}
}
public static void readText(InputStream is, String encoding, ContentHandler
output) throws IOException {
if (encoding == null)
encoding = DEFAULT_TEXT_ENCODING;
BufferedReader br = new BufferedReader(new InputStreamReader(is,
encoding));
// Parse the input and output elements
ContentHandlerHelper helper = new ContentHandlerHelper(output);
helper.startDocument();
helper.startElement(DEFAULT_TEXT_DOCUMENT_ELEMENT);
String line;
while ((line = br.readLine()) != null) {
helper.element(DEFAULT_TEXT_LINE_ELEMENT, line);
}
helper.endElement();
helper.endDocument();
}
public static void readBinary(InputStream is, ContentHandler output) throws
IOException {
ContentHandlerHelper helper = new ContentHandlerHelper(output);
helper.startDocument();
helper.startElement(DEFAULT_BINARY_DOCUMENT_ELEMENT);
XMLUtils.inputStreamToBase64Characters(new BufferedInputStream(is),
output);
helper.endElement();
helper.endDocument();
}
}
}
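
The content-type precedence described in the comments of readImpl above (forced user value first; otherwise the connection's value, then the user's value, then the XML default) can be isolated as a small standalone function. This is only a sketch: the class ContentTypeSketch and the method resolve are hypothetical names, and the "application/xml" default is an assumed stand-in for ProcessorUtils.DEFAULT_CONTENT_TYPE, whose actual value is not shown in this source.

```java
/** Hypothetical sketch of the content-type precedence used in readImpl. */
public class ContentTypeSketch {
    // Assumption: stands in for ProcessorUtils.DEFAULT_CONTENT_TYPE (described as XML).
    static final String DEFAULT_CONTENT_TYPE = "application/xml";

    /**
     * @param force          true if the config says to force the user's content-type
     * @param configured     content-type from the config, may be null
     * @param fromConnection content-type reported by the connection, may be null
     */
    static String resolve(boolean force, String configured, String fromConnection) {
        if (force) {
            // Forced: the user's value wins regardless of what the connection says.
            return configured;
        }
        if (fromConnection != null) {
            return fromConnection;
        }
        if (configured != null) {
            return configured;
        }
        return DEFAULT_CONTENT_TYPE;
    }

    public static void main(String[] args) {
        // A file: URL typically reports no content-type, so the configured value is used.
        System.out.println(resolve(false, "text/html", null));
        // Without forcing, the connection's value takes priority over the config.
        System.out.println(resolve(false, "text/html", "text/plain"));
    }
}
```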
