Re: [Tagging] Data metamodels

2015-06-06 Thread Colin Smale
 

On 2015-06-06 15:55, Daniel Koć wrote: 

 On 06.06.2015 11:29, Colin Smale wrote:
 
 Time to work towards an updated metamodel, with:
 
 * Multiple values (lists of values - sorting out the semicolon
 business?)
 
 Sure!
 
 * Complex values (data structures - formalising the namespace syntax?)
 
 Any example? I don't know what you are talking about.

Take addresses as an example. An address is composed of a set of fields,
like house number, postcode etc. These are mapped into the OSM tags
addr:street, addr:postcode and so on. You can consider an address
to be a reusable definition, which can be used in many contexts. The
current OSM syntax using the colon says that *this* use of street is
in the role of a part of an addr, and is semantically distinct from a
street used as part of some other collection of values. All the data
fields which are part of an addr are grouped together by the common
prefix addr:. But this usage of the colon to separate the namespace ID
from the field is not actually part of the data metamodel. The key
addr:housenumber is just a string and the colon is nothing special at
the moment. It all hangs together with a sort of unwritten gentleman's
agreement. 
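
To make the point concrete, here is a minimal sketch of what software
has to do today to recover the grouping. It assumes the convention of
splitting a key on its first colon; the function name and the sample
values are made up for illustration and are not part of any OSM tool
or of the data model itself.

```python
from collections import defaultdict


def group_by_namespace(tags, sep=":"):
    """Nest flat keys by the prefix before the first separator.

    This only encodes the informal convention: the data model itself
    treats "addr:street" as one opaque string.
    """
    grouped = defaultdict(dict)
    plain = {}
    for key, value in tags.items():
        prefix, found, field = key.partition(sep)
        if found and prefix and field:
            grouped[prefix][field] = value
        else:
            # No separator (or an empty side): leave the key untouched.
            plain[key] = value
    return dict(grouped), plain


# Hypothetical tag set, for illustration only.
tags = {
    "addr:street": "Example Street",
    "addr:housenumber": "12",
    "addr:postcode": "1234 AB",
    "amenity": "pub",
}
structured, plain = group_by_namespace(tags)
```

Nothing in the data itself tells a consumer that this split is valid.
Every tool has to carry the rule separately, which is the gap a formal
metamodel would close.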

 * Simple Polygon as a basic type (under construction without any
 tangible progress for years)
 
 What do you mean? This issue maybe?:
 
 http://wiki.openstreetmap.org/wiki/The_Future_of_Areas [1]

That is exactly what I mean. That article was created in 2011 and has
essentially gone nowhere since then. 

 These are all real-life things that cause a lot of energy to be
 expended in OSM, simply because we don't have a way of representing
 them in the metamodel.

 You're right. I also argue we need a better category system, exactly
 because we lose a lot of energy trying to fit real-life objects into
 too narrow and fixed a categorization model.
 
 Time to take things to the next level!
 Any practical hints how to do it?

This is where it gets problematic. Any attempt in this direction will
necessarily restrict the freedom of mappers, by saying there is a right
way and a wrong way to do something. The theoretical side of creating an
information metamodel is the easy bit. Getting the community to buy in
to something that will need support from every stakeholder in the OSM
ecosystem is a challenge that is better picked up by someone else with
more diplomacy and patience than me... It's part of what I do for a
living, and I try to pick my battles carefully.

//colin 
 

Links:
--
[1] http://wiki.openstreetmap.org/wiki/The_Future_of_Areas
___
Tagging mailing list
Tagging@openstreetmap.org
https://lists.openstreetmap.org/listinfo/tagging


Re: [Tagging] Data metamodels

2015-06-06 Thread pmailkeey .
On 6 June 2015 at 15:36, Colin Smale colin.sm...@xs4all.nl wrote:

  On 2015-06-06 15:55, Daniel Koć wrote:

 On 06.06.2015 11:29, Colin Smale wrote:

 Time to work towards an updated metamodel, with:

 * Multiple values (lists of values - sorting out the semicolon
 business?)


 Sure!

 * Complex values (data structures - formalising the namespace syntax?)


 Any example? I don't know what you are talking about.


 Take addresses as an example. An address is composed of a set of fields,
 like house number, postcode etc. These are mapped into the OSM tags
 addr:street, addr:postcode and so on. You can consider an address to
 be a reusable definition, which can be used in many contexts. The current
 OSM syntax using the colon says that *this* use of street is in the role
 of a part of an addr, and is semantically distinct from a street used
 as part of some other collection of values. All the data fields which are
 part of an addr are grouped together by the common prefix addr:. But
 this usage of the colon to separate the namespace ID from the field is not
 actually part of the data metamodel. The key addr:housenumber is just a
 string and the colon is nothing special at the moment. It all hangs
 together with a sort of unwritten gentleman's agreement.


My view on addresses is that an address is 'one value' composed of
several components. Whether any component is numerical or alphanumerical
is probably insignificant: an address is a name (string), not a number.
I see little need for splitting 'address' into separate tags (addr:number
etc.) and would prefer to see a single address value with its components
separated by commas. The number of components should not be limited,
although it's unlikely that any address needs more than ten. It is
probably logical to treat the components as we do with numbers (and
hopefully dates), listing them in order of significance with the most
significant first:

address=usa,20500,DC,Washington,Pennsylvania Avenue NW,1600,The White House

As addresses are mostly connected with routing, they should be on the
routing map rather than the topographical map. That is, addresses should
be complete and not rely on being in any defined area (state, county,
zip, settlement, street), so there is no reliance on reference to other
data. The exception would be if address data could be reliably generated
from the physical location of an address by reading the topo map, which
would actually be a better way of holding the data in the db.

Current address components are too few - forcing components to share keys.
The above suggestion would eliminate this and allow for standardisation.

It would appear that allowing keys to have multi-component values would
solve many problems, such as the one currently being discussed:
shop=camera,video,frames

Semicolons are a peculiar choice of delimiter; commas are much more common
(CSV). Because the component values are 'floating' (ideally most important
first), no component is ever left empty, so a double comma (,,) never
occurs between components. That leaves it free to represent a literal
comma inside a string.
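
A rough sketch of how such a value could be read and written, assuming a
single comma separates components and a double comma stands for a literal
comma. The function names are made up for illustration; this is not an
existing OSM convention or tool.

```python
def split_components(value):
    """Split on single commas; treat ',,' as one literal comma."""
    parts = []
    current = []
    i = 0
    while i < len(value):
        if value[i] == ",":
            if i + 1 < len(value) and value[i + 1] == ",":
                # Doubled comma: keep one comma inside the component.
                current.append(",")
                i += 2
                continue
            # Single comma: end of the current component.
            parts.append("".join(current))
            current = []
        else:
            current.append(value[i])
        i += 1
    parts.append("".join(current))
    return parts


def join_components(parts):
    """Inverse of split_components for non-empty components."""
    return ",".join(part.replace(",", ",,") for part in parts)


shop = split_components("camera,video,frames")
address = split_components(
    "usa,20500,DC,Washington,Pennsylvania Avenue NW,1600,The White House"
)
```

The scheme only holds while components are never empty and never begin
with a comma; otherwise a run of commas becomes ambiguous.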




 You're right. I also argue we need a better category system, exactly
 because we lose a lot of energy trying to fit real-life objects into too
 narrow and fixed a categorization model.

 Time to take things to the next level!
 Any practical hints how to do it?



 This is where it gets problematic. Any attempt in this direction will
 necessarily restrict the freedom of mappers, by saying there is a right
 way and a wrong way to do something.


If that is true, is it a good thing or a bad thing? It appears we're
already creating restrictions, and I don't think that can be helped if
there's going to be any hope of consistency at all. To the mapper, is the
category important? It's a pub, and I don't care what category it is in;
to me and many others, it's not important. Even factual stuff is not
important, such as whether the pub is a building or a beer tent. What
matters is that a member of the public can expect to be able to purchase
the typical things found in pubs.




 The theoretical side of creating an information metamodel is the easy bit.
 Getting the community to buy in to
 something that will need support from every stakeholder in the OSM
 ecosystem is a challenge that is better picked up by someone else with more
 diplomacy and patience than me... It's part of what I do for a living, and
 I try to pick my battles carefully.



In fact, Colin, that is not the problem. The problem is OSM needs a MUCH
better decision-making engine. There is a lack of decisions - that is clear
from apparently dead proposals and it is clear that past decisions have
been made on incomplete data available at the time - such that things would
work better if they were changed. The other problem with OSM is getting to
the heart of it to progress anything - as surely, OSM must be the biggest
organisation based on the number of premises it uses taking into account
all the ad-hoc mappers!

There are times when 'tweaks' are great and