I've been experimenting with some changes to the way the Osmosis pipeline executes.

*Existing Operation*

Currently, the typical interaction between a source task and its sink is as follows:

- Zero or more calls to process(xxxx).
- One call to complete() if processing is successful.
- One call to release() regardless of success or failure.

This works well enough in most cases. The main disadvantage of the current functionality is that *many* classes have to implement lazy initialisation and initialise on the first call to process.

*New Operation*

However, there's a new feature I'd like to introduce. I'd like "header" information to be able to be passed through the pipeline. This will take the form of a Map<String, Object> and provide a generic way to pass additional metadata through the pipeline.

The task interaction would now look like:

- *One call to initialize(Map<String, Object>) at the start of processing. If startup fails it doesn't have to be called.*
- Zero or more calls to process(xxxx).
- One call to complete() if processing is successful.
- One call to release() regardless of success or failure.

*Reasons*

This may be used for something as simple as passing additional information such as replication timestamps, but may also be used by closely related tasks to exchange more complex objects.

My main driver for doing this right now is to allow me to decompose the current monolithic tasks used for replication into smaller tasks. For example, I can separate the apidb schema specific code, which extracts data by tracking PostgreSQL specific transaction ids, from the code that writes the data into change and state files. This allows the apidb code to then feed changes into other tasks (e.g. constant updates streaming over HTTP).

Along with the metatags now able to be attached to all entities, it is now possible to pass all kinds of additional data through the pipeline without extending the core.
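
As a rough illustration of the call sequence listed under New Operation, here is a minimal Java sketch. It is not the code from the branch: SketchSink, RecordingSink, SketchSource and the "example.timestamp" key are all hypothetical names made up for this example, and a plain String stands in for whatever entity type process(xxxx) really takes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a sink task; NOT the real Osmosis interface.
interface SketchSink {
    void initialize(Map<String, Object> metaData); // once, at the start of processing
    void process(String entity);                   // zero or more times
    void complete();                               // once, only on success
    void release();                                // once, on success or failure
}

// Hypothetical sink that records the order in which it is called.
class RecordingSink implements SketchSink {
    final List<String> calls = new ArrayList<>();
    Map<String, Object> metaData;

    @Override
    public void initialize(Map<String, Object> metaData) {
        // Resources can be set up here instead of lazily on the first process() call.
        this.metaData = metaData;
        calls.add("initialize");
    }

    @Override
    public void process(String entity) {
        calls.add("process:" + entity);
    }

    @Override
    public void complete() {
        calls.add("complete");
    }

    @Override
    public void release() {
        calls.add("release");
    }
}

// Hypothetical source that drives a sink through the lifecycle.
class SketchSource {
    static void run(SketchSink sink, List<String> entities) {
        try {
            Map<String, Object> metaData = new HashMap<>();
            // Assumed key and value, purely for illustration.
            metaData.put("example.timestamp", "placeholder-value");

            sink.initialize(metaData);
            for (String entity : entities) {
                sink.process(entity);
            }
            sink.complete();
        } finally {
            // Runs whether or not the steps above threw an exception.
            sink.release();
        }
    }
}
```

Calling SketchSource.run(new RecordingSink(), entities) produces the sequence initialize, process (once per entity), complete, release. If initialize or process throws, complete is skipped but release still runs.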
The XML tasks already support writing the recently added entity metatags as additional entity attributes, and I'd like them to support this new global metadata as well by adding new XML attributes to the main <osm> or <osmChange> elements.

Longer term I'd like to replace the existing Bound class with something like this. Bound is currently treated as a normal Entity like nodes, ways and relations, but it is awkward and involves a number of hacks. Passing all bound information during pipeline startup would be much cleaner (I think). But that isn't a trivial task, so it will have to wait for another day.

*Code Changes*

The pipeline design hasn't changed much since it was introduced, so this is a fairly significant change. The code is already implemented and at least compiles and passes unit tests.

https://github.com/brettch/osmosis/tree/init

All tasks have been updated where necessary to support the new initialize method, but I haven't updated tasks to take full advantage of it (e.g. eliminate lazy initialization logic).

Unless I hear any major objections, I'll merge it into the master branch, at least on my repository and probably the main openstreetmap/osmosis repository, within the next few days.

Let me know if you have any thoughts or suggestions.

Brett
