Hi All,

This email is just musings at this point. I'm not sure if I'll be able to implement anything anytime soon, but I'd be interested in people's thoughts on this.
Until now I've intentionally kept the core data classes in Osmosis as simple as possible to simplify maintenance and ensure consistency across all tasks. I've only added attributes that are required to support basic OSM data and have kept extensions from creeping in. However, it can be quite limiting when there is no way of passing additional data through the pipeline.

Examples of additional data might be:
- A "mutated" flag of some kind to mark when a particular entity has been changed and shouldn't be uploaded to the main API. An example is when ways are clipped at bounding box boundaries.
- A "visible" flag. I hesitate to include this one because Osmosis supports this via change streams, not optional visible attributes.
- Header information to be attached to the Bound element such as replication timestamp information, source URLs, etc.
- Custom data exchanged between specialised tasks. For example, a polygon processing task might add full geometric information to a way.

To add some flexibility I'm thinking along the following lines:
- Add a new collection to entities that can be optionally populated with String/Object pairs. Conceptually similar to a Map<String, Object> but possibly stored like existing Tag objects in a simple Collection (currently implemented as an ArrayList) for efficiency.
- The collection may be null when no data is required, to minimise overhead in the common case. Consumers would need to explicitly check for null, which is a tad ugly but I think warranted here.
- Modify key tasks such as XML tasks to support serialising these additional values as attributes on the entities themselves (eg. <node id="1" version="1" ... mutated="true" />). Alternatively represent them as sub-elements (eg. a metatag stored as <node id="1" ...><mtag k="mymtag" v="myvalue"/></node>). The object would simply have the toString method called on it to get a string representation. Reading from XML would result in a String object.
- Tasks not caring about the data would simply pass the objects on without modification.
- Some Sink tasks such as PostgreSQL database tasks would ignore the additional data.
- Some tasks such as --bounding-box could add a flag such as "mutated".
- Rename the existing Bound entity to something more generic like Header to allow more file attributes to be persisted.

I think this approach would allow additional data to be attached to entities in a generic fashion without Osmosis itself having to add special support for it. It would keep the pipeline generic but allow specialised tasks to exchange their own custom data.

I think representing the value part of the data as an Object rather than a String makes more sense because it allows custom tasks to exchange complete objects instead of forcing serialisation to and from String.

The additional data could in theory be represented as Tags without changing the pipeline at all, but it gets messy mixing real data with metadata.

I'm not sure if it makes sense to add support for this to the Bound object, or to simply allow Tag objects to be added instead. Perhaps tags make more sense here? The whole Bound concept has always fitted awkwardly in Osmosis, so I'm not sure how to tackle this one.

Hmm, a somewhat rambling email :-) Any thoughts?

Cheers
Brett
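To make the collection idea above a little more concrete, a rough Java sketch follows. It is only an illustration under assumptions: the names MetaTag, SketchEntity, addMetaTag, getMetaTags and metaTagsAsAttributes are hypothetical and are not existing Osmosis classes or methods. It shows a list of String/Object pairs that stays null until first used, and the toString-based attribute rendering mentioned for the XML tasks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical String/Object pair; the value is an Object so tasks can pass whole objects. */
final class MetaTag {
    final String key;
    final Object value;

    MetaTag(String key, Object value) {
        this.key = key;
        this.value = value;
    }
}

/** Hypothetical stand-in for an entity; NOT the real Osmosis entity class. */
class SketchEntity {
    final long id;

    // Left null until the first pair is added, so entities without
    // extra data only pay for one unused reference.
    private List<MetaTag> metaTags;

    SketchEntity(long id) {
        this.id = id;
    }

    /** May return null; consumers have to check explicitly. */
    List<MetaTag> getMetaTags() {
        return metaTags;
    }

    /** Lazily creates the backing list on first use. */
    void addMetaTag(String key, Object value) {
        if (metaTags == null) {
            metaTags = new ArrayList<>();
        }
        metaTags.add(new MetaTag(key, value));
    }

    /** Linear scan over the list; returns null when the key is absent. */
    Object getMetaTag(String key) {
        if (metaTags == null) {
            return null;
        }
        for (MetaTag tag : metaTags) {
            if (tag.key.equals(key)) {
                return tag.value;
            }
        }
        return null;
    }

    /**
     * Renders the pairs as XML-style attributes, using each value's
     * toString(). No XML escaping is done in this sketch.
     */
    String metaTagsAsAttributes() {
        if (metaTags == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (MetaTag tag : metaTags) {
            sb.append(' ').append(tag.key).append("=\"").append(tag.value).append('"');
        }
        return sb.toString();
    }
}

class MetaTagSketch {
    public static void main(String[] args) {
        SketchEntity plain = new SketchEntity(1);
        SketchEntity clipped = new SketchEntity(2);
        // What a task like --bounding-box might do after clipping a way.
        clipped.addMetaTag("mutated", Boolean.TRUE);

        System.out.println(plain.getMetaTags() == null);
        System.out.println(clipped.metaTagsAsAttributes());
    }
}
```

A list with a linear scan is used here instead of a Map on the assumption that most entities would carry zero or very few pairs; a real implementation might reasonably choose differently.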
_______________________________________________
osmosis-dev mailing list
http://lists.openstreetmap.org/listinfo/osmosis-dev
