I've had a few thoughts about the whole "move the code to the data"
concept (or "move the code to the service node") for some time.
Considering it a low priority, I kept quiet about it until the topic
came up in a recent email discussion.
Current practice for River applications is to move code and data around
together in the form of marshalled objects. Two groups of objects are of
particular interest: those that are process or code intensive, where
methods do the processing and create the returned results, and those
that are data intensive, where there is little processing to do beyond
minor copies or transformations of existing state.
I think the River platform addresses these object groups quite
effectively when the processing is known at compile time or when the
service requirements are clear. However, there are occasions when it
would be less network intensive, or simpler, to submit the distributed
equivalent of a ScheduledTask or Runnable to consume an existing data
intensive service at the origin of that service, and make the desired
result available via a temporary service or some other mechanism or
protocol. This also applies in cases where the class files and libraries
required to perform the processing are available at the service node but
unavailable at the client, because of a legacy Java environment, no
ability to load remote class files, or a constrained environment that
cannot provide enough memory for the processing required. The result of
the uploaded Runnable can be transformed into a class that is locally
available at, or compatible with, the client.
The Runnable code might be uploaded to the service node by the
client or by a third party mediator. Any suggestions for what the
mechanism should be would be useful. I'm thinking that a signed
OSGi bundle containing a set of permissions would be a good model to
start from, considering that OSGi already has many of the security
mechanisms that would make such a thing possible.
In essence the DistributedScheduledTask is a piece of client code
that is executed remotely in the service node. I'm wondering just what a
DistributedExecutorService should provide, and whether anyone else has
had similar thoughts.
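To make the question more concrete, here is a minimal sketch of one shape such an interface could take. Everything in it is an assumption for discussion only: the names DistributedTask, runAt and LocalNodeExecutor are hypothetical, nothing here is existing River API, and the in-process executor merely stands in for a real service node (no marshalling, signing or permission checks).

```java
import java.io.Serializable;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical: client code that runs at the service node, next to the data. */
interface DistributedTask<S, R> extends Serializable {
    R runAt(S localService) throws Exception;
}

/** Hypothetical: what a service node might export to its clients. */
interface DistributedExecutorService<S> {
    <R> Future<R> submit(DistributedTask<S, R> task);
}

/** In-process stand-in for a service node; a real one would verify and unmarshal the task. */
final class LocalNodeExecutor<S> implements DistributedExecutorService<S> {
    private final S service;
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    LocalNodeExecutor(S service) {
        this.service = service;
    }

    @Override
    public <R> Future<R> submit(DistributedTask<S, R> task) {
        // The task is handed the node's local service, so only the result travels back.
        Callable<R> call = () -> task.runAt(service);
        return pool.submit(call);
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

A client holding a LocalNodeExecutor over a List could submit `list -> list.size()` and read the answer from the returned Future. Whether the result should come back through a Future, a temporary service or some other protocol is exactly the open question.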
For instance, a Reporting Node in a cluster might send out the same
DistributedScheduledTask to all available services of a particular type,
to perform some intensive data processing or filtering remotely at each
node, and then retrieve the results from each after processing. The
Reporting Node might have changing reporting requirements, much like
ad hoc queries.
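The Reporting Node case can be sketched as a simple fan-out. This is only an illustration under stated assumptions: the ReportingNode class and its broadcast method are hypothetical names, each "node" is simulated by a map entry holding that node's local data, and a thread pool stands in for the network.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical reporting node: sends one task to every node, collects one result per node. */
final class ReportingNode {

    static <D, R> Map<String, R> broadcast(Map<String, D> nodes, Function<D, R> task)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, nodes.size()));
        try {
            // Submit the same task against each node's local data.
            Map<String, Future<R>> pending = new TreeMap<>();
            for (Map.Entry<String, D> node : nodes.entrySet()) {
                D localData = node.getValue();
                Callable<R> call = () -> task.apply(localData);
                pending.put(node.getKey(), pool.submit(call));
            }
            // Retrieve the (small) results once each node has finished processing.
            Map<String, R> results = new TreeMap<>();
            for (Map.Entry<String, Future<R>> entry : pending.entrySet()) {
                results.put(entry.getKey(), entry.getValue().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With nodes holding log records, a call such as `ReportingNode.broadcast(nodes, recs -> recs.stream().filter(s -> s.startsWith("ERROR")).count())` would return one count per node, and a changed reporting requirement is just a different lambda rather than a new service interface.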
Cheers,
Peter.
- Distributed ExecutorService Peter Firmstone