I've been thinking more about an API for the distiller that would
allow it to be embedded in a Python or Java program and called
directly. Because Python can be linked as a library, this could also
serve as a C API for a C or C++ program that wants to call the
distiller directly. Here's what I've got so far (it's already working
in my code, called from Python):
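// File: org/plkr/distiller/API/Invocation.java
// (Java allows only one public top-level type per source file.)
package org.plkr.distiller.API;

public interface Invocation {
    int invoke(java.lang.String[] arguments,
               java.io.OutputStream optional_output_channel,
               java.lang.String optional_input_String,
               java.util.Hashtable optional_config_parameters,   // String keys and values
               org.plkr.distiller.API.Callback optional_status_callback);
}

// File: org/plkr/distiller/API/Callback.java
package org.plkr.distiller.API;

public interface Callback {
    int update(int number_collected, int number_in_queue);
}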
"optional_output_channel" could be a real open file, or an in-memory
construct like ByteArrayOutputStream (or in Python, StringIO).
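To make the in-memory case concrete, here's a minimal sketch of a Java caller that hands the distiller a ByteArrayOutputStream and gets the bytes back afterwards. FakeDistiller, InMemoryCapture and distillToMemory are hypothetical names: FakeDistiller stands in for a real Invocation implementation and only writes a placeholder, and treating a return value of 0 as success is an assumption, since the proposal doesn't define the return code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a distiller; only the OutputStream handling
// mirrors the proposed optional_output_channel parameter.
class FakeDistiller {
    int invoke(String[] arguments, OutputStream optional_output_channel) {
        if (optional_output_channel == null) {
            return 1; // nowhere to write in this sketch
        }
        try {
            // A real distiller would write the converted document here.
            optional_output_channel.write("distilled".getBytes(StandardCharsets.US_ASCII));
            optional_output_channel.flush();
            return 0; // assumed: 0 means success
        } catch (IOException e) {
            return 1;
        }
    }
}

class InMemoryCapture {
    // Runs the (hypothetical) distiller against an in-memory stream and
    // returns whatever it wrote, instead of creating a file on disk.
    static byte[] distillToMemory(String[] arguments) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int status = new FakeDistiller().invoke(arguments, buffer);
        if (status != 0) {
            throw new IllegalStateException("distiller returned " + status);
        }
        return buffer.toByteArray();
    }
}
```

The caller never touches the filesystem; in Python the same role would be played by a StringIO object.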
"optional_input_String" would be an alternative to passing the home
page as a file: the page's contents could be passed in as one big string.
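One way an implementation might honor this is sketched below, assuming the inline string simply takes priority over a home-page file path when both are available. HomePageSource, open and readAll are hypothetical names, and the file fallback uses the platform default encoding for brevity.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class HomePageSource {
    // Hypothetical helper: prefer the in-memory home page when one was
    // passed, otherwise fall back to reading the named file.
    static Reader open(String optional_input_String, String homePagePath)
            throws IOException {
        if (optional_input_String != null) {
            return new StringReader(optional_input_String);
        }
        return new FileReader(homePagePath); // throws if the file is missing
    }

    // Reads the whole home page into a String, whichever source it came from.
    static String readAll(String optional_input_String, String homePagePath)
            throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader in = new BufferedReader(open(optional_input_String, homePagePath))) {
            char[] chunk = new char[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                text.append(chunk, 0, n);
            }
        }
        return text.toString();
    }
}
```

Because both paths yield a Reader, the rest of the parser wouldn't need to know where the page came from.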
"optional_config_parameters" would be a hashtable of string key/value
pairs, which would override any values read from config files (and
also override values read from environment variables?).
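The precedence question can be made concrete with a small layering sketch. It assumes environment variables already override config files, takes environment values that have already been mapped to config keys (how that mapping works isn't specified), and leaves the open question as a flag. ConfigMerge, effectiveConfig and parametersOverrideEnvironment are hypothetical names.

```java
import java.util.Hashtable;
import java.util.Map;

class ConfigMerge {
    // Builds the effective configuration from three sources. Assumption:
    // environment variables already override config files. Whether
    // optional_config_parameters should also override the environment is
    // still open, so that ordering is a flag here.
    static Hashtable<String, String> effectiveConfig(
            Map<String, String> fromConfigFiles,
            Map<String, String> fromEnvironment,
            Map<String, String> optional_config_parameters,
            boolean parametersOverrideEnvironment) {
        // Lowest precedence: whatever the config files said.
        Hashtable<String, String> result = new Hashtable<>(fromConfigFiles);
        if (parametersOverrideEnvironment) {
            result.putAll(fromEnvironment);                       // env beats files
            putAllIfPresent(result, optional_config_parameters);  // params beat both
        } else {
            putAllIfPresent(result, optional_config_parameters);  // params beat files
            result.putAll(fromEnvironment);                       // env beats params
        }
        return result;
    }

    // The parameter table is optional, so tolerate null.
    private static void putAllIfPresent(Hashtable<String, String> target,
                                        Map<String, String> source) {
        if (source != null) {
            target.putAll(source);
        }
    }
}
```

Since later putAll calls overwrite earlier ones, the precedence is just the order of the calls.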
"optional_status_callback" would be an object implementing Callback;
its "update" method would be called whenever the distiller was about
to collect a new URL, and again after it had finished collecting it.
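Here's a rough sketch of how those two calls per URL might look from inside a crawl loop. The Callback interface is copied from the proposal (package omitted); RecordingCallback and CrawlLoopSketch are hypothetical, the loop does no real fetching, and because the proposal doesn't say what update's return value means, this sketch returns 0 and ignores it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Mirrors the proposed org.plkr.distiller.API.Callback (package omitted).
interface Callback {
    int update(int number_collected, int number_in_queue);
}

// Example callback that just records each (collected, queued) pair.
class RecordingCallback implements Callback {
    final List<String> events = new ArrayList<>();

    public int update(int number_collected, int number_in_queue) {
        events.add(number_collected + "/" + number_in_queue);
        return 0; // meaning of the return value is not yet specified
    }
}

class CrawlLoopSketch {
    // Hypothetical crawl loop: calls update() just before collecting each
    // URL and again once it is finished, as described above.
    static int crawl(List<String> startUrls, Callback optional_status_callback) {
        Deque<String> queue = new ArrayDeque<>(startUrls);
        int collected = 0;
        while (!queue.isEmpty()) {
            if (optional_status_callback != null) {
                optional_status_callback.update(collected, queue.size());
            }
            queue.poll(); // a real distiller would fetch and parse this URL here
            collected++;
            if (optional_status_callback != null) {
                optional_status_callback.update(collected, queue.size());
            }
        }
        return collected;
    }
}
```

With two start URLs the callback sees 0/2, 1/1, 1/1 and 2/0, so a GUI could drive a progress bar from it.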
In Java, the class org.plkr.distiller.InvokePluckerFromJava would
provide an implementation of org.plkr.distiller.API.Invocation. In
Python, the class PyPlucker.Spider would provide an implementation of
org.plkr.distiller.API.Invocation.
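From the Java side, a caller might look roughly like the sketch below. The interfaces are local copies of the proposal (package omitted); StubInvocation is a hypothetical placeholder, not InvokePluckerFromJava, and only reports what it was given; CallerSketch and hypothetical_key are made up, and 0 as the success code is assumed. Since Callback has a single method, Java 8+ callers can pass a lambda.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Hashtable;

interface Callback {
    int update(int number_collected, int number_in_queue);
}

interface Invocation {
    int invoke(String[] arguments,
               OutputStream optional_output_channel,
               String optional_input_String,
               Hashtable optional_config_parameters,
               Callback optional_status_callback);
}

// Hypothetical placeholder: does no crawling, it only shows how the
// optional arguments might be checked for null and used.
class StubInvocation implements Invocation {
    public int invoke(String[] arguments,
                      OutputStream optional_output_channel,
                      String optional_input_String,
                      Hashtable optional_config_parameters,
                      Callback optional_status_callback) {
        if (optional_status_callback != null) {
            optional_status_callback.update(0, 1); // about to collect the home page
        }
        String summary = arguments.length + " args, "
                + (optional_config_parameters == null ? 0 : optional_config_parameters.size())
                + " overrides, home page "
                + (optional_input_String == null ? "from file" : "inline");
        if (optional_status_callback != null) {
            optional_status_callback.update(1, 0); // finished the home page
        }
        if (optional_output_channel != null) {
            try {
                optional_output_channel.write(summary.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                return 1;
            }
        }
        return 0; // assumed: 0 means success
    }
}

class CallerSketch {
    static String run() {
        Hashtable<String, String> overrides = new Hashtable<>();
        overrides.put("hypothetical_key", "value"); // key name is made up
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int[] lastCollected = {-1};
        // Callback has a single abstract method, so a lambda works here.
        Callback progress = (collected, queued) -> {
            lastCollected[0] = collected;
            return 0;
        };
        Invocation distiller = new StubInvocation();
        int status = distiller.invoke(new String[0], out,
                "<html><body>Home</body></html>", overrides, progress);
        return status + ":" + lastCollected[0] + ":"
                + new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Swapping StubInvocation for a real implementation wouldn't change the caller, which is the point of coding against the interface.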
Comments welcome!
Bill
_______________________________________________
plucker-dev mailing list
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev