Martin, if it walks like a duck and quacks like a duck, then it had better be a duck... What I mean by this is that if the grammar is so English-like that people are tempted to use constructions which are not (or not quite) supported by the grammar, or if the way it works is contrary to how the English language would interpret it, then "errors will occur". Add to that the fact that the majority of users will not have English as their first language, and that we have to make this accessible to the man-in-the-street and not allow it to become so obfuscated that only PhDs need apply.

Bottom line: I question whether making it kind of pseudo-English-like is the right goal to aim at. A simple grammar which is (mis)understood equally over the whole world might be better. Your post below is full of examples supporting my point. The grammar should be derived from what you are trying to model, just as a (descriptive) grammar for English is reverse-engineered from the way the language is used. If you start with the premise that the answer must be expressed in ANTLR and shouldn't include brackets, that's putting the cart before the horse. Please feel free to carry on with your experimentation to see if you can make a grammar on this basis, but remember that humans have to read and write this stuff (which does not detract from my earlier assertion that machine-readability is a slightly higher priority than human-readability) and they often need clear boundaries to make the most of their creativity. If you put a child in the middle of an "infinitely large" field with no boundaries obvious to the child, they won't move far from where you put them. If you put the same child in a large fenced enclosure, they will explore every inch. Give a child a paintbrush in front of a huge blank wall and you will get a small picture. Mark out a "frame" on the wall and say "paint in this" and it will all get used. Give a mapper no limits on tagging, and many things will not get tagged (because of inner doubts about how to do it). Give the same mapper a menu of 100 tags which can be used, and he will use many more of them.

> Human language is sadly not very precise: "except taxis AND bicycles" does not mean, you must be in a vehicle that is both (it means if not taxis AND if not bicycles),

The human language here is extremely precise to any fluent English-speaker. It means what it says. It's the IT-based interpretation of the word AND which leads to the grammar misinterpreting the intention. Think of the expression:

    a * b + c * d

To the "untrained eye" this may appear ambiguous or be interpreted differently to how a compiler will interpret it. Nonetheless it's valid code and no compiler will complain. However, style-wise there is a school of thought that such constructions are unsafe because a "bug" caused by precedence problems is difficult to find by a quick inspection. Mandating the use of parentheses to make the programmer's intentions clear to a code-reviewer helps to detect bugs early, and has the desirable side-effect of making the programmer think just a little bit harder about how it's going to work out. Prevention is better than cure: anything making it less likely that "coding errors" make it into the database is most definitely a Good Thing To Have. Grammars which allow "just about everything" are a pain because they frustrate this error checking and delay error detection considerably, often relying on a user to report an anomaly, triggering all kinds of incident management and problem management processes and costing thousands of times more to fix than if the input validation had stopped the error occurring in the first place.
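
A small sketch of the precedence point, in Python. The numeric values are made up purely for illustration; only the expression itself comes from the text above.

```python
# Illustrative values only; a, b, c, d stand for the operands in the expression above.
a, b, c, d = 2, 3, 4, 5

implicit = a * b + c * d        # relies on precedence: * binds tighter than +
explicit = (a * b) + (c * d)    # same grouping, but stated outright
misread = a * (b + c) * d       # what a strictly left-to-right reader might assume

print(implicit, explicit, misread)
```

The first two lines compute the same value; the third shows how far the result drifts if a reader (or a differently designed grammar) groups the terms another way.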

"If I were king" I would be looking for a system that:
* makes common cases easy
* makes complex cases possible
* makes each rule as standalone as possible (one sign -> one rule)
* does not rely on the user's fluency in English grammar (knowledge of a set of specific words, e.g. tags and functions, is fine)
* uses grammatical constructions which are familiar to most people, or easily learnt
* has a grammar allowing for a user-extensible function repertoire
* allows user-defined functions to be stored in an external file (accessible at entry and run time)
* fits comfortably with the key-value-pair paradigm of OSM
* can produce a result in a variety of data types including at least boolean, number, string
* can use the value of other tags as variables
* can use other variables supplied at run time (e.g. weather, time, vehicle type)
* supports the usual comparison and numeric operations
* supports string concatenation operation
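
To make the wish list a little more concrete, here is a minimal sketch of such an evaluator. It is an assumption-laden illustration, not a proposal for the actual syntax: it borrows Python's own expression syntax as a stand-in grammar, and the names `FUNCTIONS`, `evaluate` and `is_motor_vehicle` (with its vehicle list) are hypothetical.

```python
import ast
import operator

# Hypothetical function repertoire; users could extend this dict,
# for example by loading definitions from an external file.
FUNCTIONS = {
    "is_motor_vehicle": lambda vehicle_type: vehicle_type in {"car", "hgv", "motorcycle"},
}

BINARY_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
              ast.Mult: operator.mul, ast.Div: operator.truediv}
COMPARE_OPS = {ast.Lt: operator.lt, ast.LtE: operator.le, ast.Gt: operator.gt,
               ast.GtE: operator.ge, ast.Eq: operator.eq, ast.NotEq: operator.ne}


def evaluate(expression, variables):
    """Evaluate a small expression; variables hold tag values and run-time inputs."""
    def walk(node):
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            # operator.add also concatenates strings
            return BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if (isinstance(node, ast.Compare) and len(node.ops) == 1
                and type(node.ops[0]) in COMPARE_OPS):
            return COMPARE_OPS[type(node.ops[0])](walk(node.left), walk(node.comparators[0]))
        if isinstance(node, ast.BoolOp):
            values = [walk(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not walk(node.operand)
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in FUNCTIONS):
            return FUNCTIONS[node.func.id](*[walk(arg) for arg in node.args])
        # Anything outside the whitelist is rejected at entry time.
        raise ValueError(f"unsupported construct: {ast.dump(node)}")
    return walk(ast.parse(expression, mode="eval").body)


variables = {"maxspeed": 100, "weather": "wet", "vehicle_type": "hgv"}
print(evaluate("maxspeed - 20", variables))                                         # a number
print(evaluate("weather == 'wet' and is_motor_vehicle(vehicle_type)", variables))  # a boolean
print(evaluate("'max ' + weather", variables))                                      # a string
```

The point of the whitelist is the same as the point about input validation above: anything the grammar does not explicitly support raises an error when the rule is entered, rather than surfacing later as an anomaly.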

Colin

On 14/06/2012 22:19, martinq wrote:
Hi,

sadly this discussion was restarted before I could establish a reference implementation for a less technical way of tagging conditional values (for those who are interested: it is an ANTLR grammar, hopefully with built-in evaluation code). The reference implementation is for me a key for acceptance, because the less technical the tagging, the more difficult it is to parse. And we all agree that it should be possible to parse it - but not necessarily easy.

Objective of my proposal: As few rules as possible - as close as possible to the sign-posted information.

The proposal page does not contain a lot of information, because I adapt the "grammar" based on what is feasible. Sadly, I cannot spend a lot of time continuing with my reference implementation at the moment.


I will comment based on what I have already figured out:

> - "Or" operators. "Maxspeed is 80 if it's wet or Sunday" can be
> rephrased as "Maxspeed is 80 if it's wet. Maxspeed is 80 if it's
> Sunday." IOW, these can be modelled by using two tags instead of one.

This is in fact the biggest challenge in the current state of my parser (in combination with fuzzy & context related precedence). Human language is sadly not very precise: "except taxis AND bicycles" does not mean you must be in a vehicle that is both (it means if not taxis AND if not bicycles), whereas "except HGV AND weight>7.5t" is interpreted differently by most humans (if not (HGV AND weight>7.5t)). There are a lot of other examples of ambiguities, e.g. "except Fr, 10:00-18:00" does not mean the whole of Friday and the other days, etc.
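
The two readings of "AND" described above can be written out as follows. This is only a sketch; the function names and the dictionary keys (`taxi`, `bicycle`, `hgv`, `weight_t`) are hypothetical.

```python
def exempt_list_reading(vehicle):
    """'except taxis AND bicycles': the sign lists two exempt classes,
    so belonging to either one is enough (a logical OR over the list)."""
    return vehicle.get("taxi", False) or vehicle.get("bicycle", False)


def exempt_conjunction_reading(vehicle):
    """'except HGV AND weight>7.5t': both properties must hold for the same vehicle."""
    return vehicle.get("hgv", False) and vehicle.get("weight_t", 0) > 7.5


taxi = {"taxi": True}
light_hgv = {"hgv": True, "weight_t": 3.5}
heavy_hgv = {"hgv": True, "weight_t": 12.0}

print(exempt_list_reading(taxi))              # a taxi alone is exempt
print(exempt_conjunction_reading(light_hgv))  # an HGV under the weight limit is not
print(exempt_conjunction_reading(heavy_hgv))  # a heavy HGV is
```

The same word on the sign maps to `or` in one case and `and` in the other, which is exactly what a parser has to decide from context.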

However, in this case the preliminary parser has no problem understanding the following different expressions:

maxspeed:cond = 80 if wet or Sunday

Easy tagging, isn't it?

But the grammar is flexible. Instead of 'if' I also support 'when' and '[' ']' (I am not sure about the brackets yet - they are clearly not as intuitive as 'if' or 'when').
maxspeed:cond = 80 [wet]
maxspeed:cond2 = 80 [Su]

Also understood by the parser:
maxspeed:cond = 80 when weekday is Sunday and condition is wet
or
maxspeed:cond = 80 weekday=Su or condition=wet
or
maxspeed:cond = 80 [wet]; 80 on Sundays

and many other variants. It is almost impossible to tag it wrong.
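
The quoted suggestion that "or" can be modelled by two tags can be checked with a small sketch. Rules are represented here as (value, condition) pairs; the helper `resolve`, the context keys and the default of 100 are assumptions made for illustration, not part of any proposal.

```python
from itertools import product


def resolve(rules, context, default):
    """Return the value of the first rule whose condition holds, else the default."""
    for value, condition in rules:
        if condition(context):
            return value
    return default


# One tag with an "or" condition ...
one_tag = [(80, lambda c: c["wet"] or c["weekday"] == "Su")]
# ... versus two tags with one simple condition each.
two_tags = [
    (80, lambda c: c["wet"]),
    (80, lambda c: c["weekday"] == "Su"),
]

# Both formulations agree for every combination of inputs.
for wet, weekday in product([True, False], ["Su", "Mo"]):
    context = {"wet": wet, "weekday": weekday}
    assert resolve(one_tag, context, 100) == resolve(two_tags, context, 100)
```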


> - Brackets for expressions. If we limit ourselves to just "and"
> operators, evaluation order doesn't matter.


This is something I really want to avoid. Brackets for precedence purposes are a purely technical artefact and I have not seen them on the signs with the information we want to tag...

However, the *precedence* is the major problem in the current parser.
Thus I don't think I can write a parser without any rule helping the parser and restricting the mappers. But brackets will only be introduced if I have no other option.


>> Pseudo-Javascript: (!is_motor_vehicle(vehicle_type)) ||
>> ((vehicle_type='hgv')&&  (time<'10:00' || time>'20:00')&&
>> intention='loading')

== side note ==
I assume this is access related.
The current access tags are IMO just abbreviations; nowadays we would write, instead of

hgv=yes --> access:hgv=yes.

With the conditional value proposal it could be tagged as

access=yes if hgv

maxweight=x is an example of (vehicle) access = no if weight>x, even though it can also be a non-conditional property of an object (e.g. a bridge may have an intrinsic maximum acceptable weight, but we don't have to go into details now).
== side note end ==


I interpreted your example code as

access=yes
access:cond1 = no if motorized
  or "no if motor vehicle" or "no if vehicle is motorized"
access:cond2 = delivery if hgv from 10:00-20:00
  or "delivery if vehicle is a hgv and time is 10:00-20:00"
  or many variants more...
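
For comparison, the quoted pseudo-JavaScript condition can be transcribed as a plain predicate. This is a sketch under assumptions: the contents of `MOTOR_VEHICLES` and the function name `access_allowed` are hypothetical, and times are passed as `datetime.time` values rather than strings.

```python
from datetime import time

MOTOR_VEHICLES = {"car", "hgv", "motorcycle"}  # hypothetical classification


def is_motor_vehicle(vehicle_type):
    return vehicle_type in MOTOR_VEHICLES


def access_allowed(vehicle_type, now, intention):
    """Literal transcription of the quoted pseudo-JavaScript expression."""
    return (not is_motor_vehicle(vehicle_type)) or (
        vehicle_type == "hgv"
        and (now < time(10, 0) or now > time(20, 0))
        and intention == "loading"
    )


print(access_allowed("bicycle", time(12, 0), "transit"))  # non-motorized: allowed
print(access_allowed("hgv", time(21, 0), "loading"))      # HGV loading after 20:00: allowed
print(access_allowed("hgv", time(12, 0), "loading"))      # HGV loading at midday: not allowed
```

Note that, read literally, the quoted expression admits HGVs before 10:00 or after 20:00, so a tag reading that matches it would need the interval 20:00-10:00.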

If the evaluation part of my parser works (I am still working on the grammar), I may also be able to create a kind of "normalized" JavaScript expression out of the "fuzzy" human tags [but I have not implemented a normalized attributed tree yet].


> - defining a syntax for time intervals (opening_hours)

By using on/off, this is already the first tag which moved the condition into the value. By using off/on, it reads like

off if ...
on if ...

However, the author struggled with the same basic problem, e.g. there is a (non-intuitive) difference between using ',' and ';' now.

Also, except for basic time restrictions, these expressions are impossible to tag and also difficult to read. Clearly powerful, but too compressed for casual mappers. Can you read this?

Dec 25-Su-28 days - Mar Su[-1]: 22:00-23:00

In this case I would stick to human readable expressions like "last Sunday in March" and put the burden of evaluating it on the parser/application.
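
Evaluating a phrase like "last Sunday in March" is cheap for an application once it has been recognised. A minimal sketch using only the Python standard library; the function name `last_weekday_of_month` is hypothetical.

```python
import calendar
from datetime import date


def last_weekday_of_month(year, month, weekday):
    """Return the date of the last given weekday (e.g. calendar.SUNDAY) in a month."""
    last_day = calendar.monthrange(year, month)[1]   # number of days in the month
    end = date(year, month, last_day)
    offset = (end.weekday() - weekday) % 7           # days to step back from month end
    return end.replace(day=last_day - offset)


print(last_weekday_of_month(2012, 3, calendar.SUNDAY))
```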


> - defining a subset hierarchy of vehicle categories (such as
> "motor_vehicle" including "hgv" as a subset)

Applications must know which vehicle you drive to evaluate certain conditional values (mainly access, speed limits and also parking conditions). Unlike the current access tags we don't need a hierarchy.

motorized is an attribute of the vehicle, the application must know about it. Not every taxi is motorized [rickshaws or bicycle taxis], the attributes for the circumstances of use should be interdependently determined by the application. The application may use a tree internally, but I don't think it's the mapper's job.

For a standard taxi an application may work with the following attributes:
taxi = yes
weight = 2t
motorized = yes
width = 2m
length = 5m
height = ...
permits = A38, A39
public service = yes
[however, if the taxi is empty, this attribute may be no]
passengers = 2
sex = male
age = 55
engine power = 22PS
vehicle maxspeed = 180
colour = black
etc.
[I hope the concept is clear]

Independently of that, the application also requires knowledge about date & time for certain conditions - and the application may need to know the weather, holidays, purpose of travel (or intention), destination of travel (especially in my area several access restrictions depend on the area you are driving to).

The mappers no longer have to worry about interdependencies between such attributes, e.g. is a rickshaw a motor vehicle or a bicycle or just a pedestrian with a trailer? (It can be all).

Now, whenever some properties are not known, the application simply cannot evaluate all conditional values. Either the application accepts this and continues with the safety value (e.g. no for access, lowest speed for maxspeed, etc.) or it may have to ask the user (e.g. the user can select whether the bicycle map or the motor vehicle map should be shown).
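
The attribute-based approach with a safety value could look roughly like this. It is a sketch only: the `UNKNOWN` marker, the helper names and the example rule (access = no if weight > 7.5) are assumptions made for illustration.

```python
UNKNOWN = object()  # marker for "cannot be evaluated with the attributes we have"


def evaluate_condition(required, predicate, attributes):
    """Return True/False, or UNKNOWN if a required attribute is missing."""
    if any(name not in attributes for name in required):
        return UNKNOWN
    return predicate(attributes)


def access_value(attributes):
    """Hypothetical rule: access = no if weight > 7.5 (tonnes), otherwise yes.

    Returns (value, certain); certain is False when the safety value was used.
    """
    result = evaluate_condition(["weight"], lambda a: a["weight"] > 7.5, attributes)
    if result is UNKNOWN:
        return "no", False   # safety value; the application could ask the user instead
    return ("no" if result else "yes"), True


taxi = {"taxi": True, "weight": 2.0, "motorized": True}
print(access_value(taxi))            # all required attributes known
print(access_value({"taxi": True}))  # weight unknown: falls back to the safety value
```

Returning the `certain` flag alongside the value lets the application decide whether to accept the safety value silently or to prompt the user.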

> So how do the existing proposals fit in here? Well, the primary
> difference between the example above and "extended conditions" is that
> the latter aims for short, manually editable strings by letting
> literals for vehicle classes, weather conditions etc., as well as time
> intervals, stand for themselves - based on the assumption that a parser
> will be able to unambiguously identify them.

1) I dislike proposals which try to solve only the situation for access.
access is a tag like any other - and we shouldn't re-invent the wheel:
access:cond = yes if time is 10:00-18:00  (or simply yes [10:00-18:00])
maxspeed:cond = 80 if time is 10:00-18:00
parking:lane:left:cond = parallel time is 10:00-18:00
open:cond = yes time is 10:00-18:00

or - simply to emphasize the concept -
access:cond = yes if female
maxspeed:cond = 80 if female
parking:lane:left:cond = parallel if female
open:cond = yes if female

The extended access proposal can be used for any tags, thus no issue here:
access:hgv:(20:00-10:00) = ...
maxspeed:hgv:(20:00-10:00) = ...
parking:lane:left:hgv:(20:00-10:00) = ...
...


2) Value vs. key
I think value side conditions would be more intuitive, because the value depends on the condition.

Also, it is easier to filter the things in the database, especially if left/right & forward/backward is also mixed into the conditions (or should we simply go the next step and see them as conditions too?).

Disadvantage is that values can contain any characters. This makes it hard to identify the start of the condition in a parser.
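
The difficulty of finding the start of the condition can be seen in a small sketch. It assumes only three surface forms taken from the examples in this mail ("value if condition", "value when condition", "value [condition]"); the regular expression and the function name `split_conditional` are hypothetical and much cruder than a real grammar.

```python
import re

# Assumed surface forms: "<value> if <cond>", "<value> when <cond>", "<value> [<cond>]"
PATTERN = re.compile(
    r"^(?P<value>.+?)"
    r"(?:\s+(?:if|when)\s+(?P<cond>.+)|\s*\[(?P<bracket>[^\]]*)\])\s*$"
)


def split_conditional(tag_value):
    """Split a tag value into (value, condition); condition is None if absent."""
    match = PATTERN.match(tag_value)
    if not match:
        return tag_value.strip(), None
    condition = match.group("cond") or match.group("bracket")
    return match.group("value").strip(), condition.strip()


print(split_conditional("80 if wet or Sunday"))
print(split_conditional("yes [10:00-18:00]"))
print(split_conditional("parallel"))
# A free-text value that happens to contain " if " is split in the wrong place:
print(split_conditional("What if cafe"))
```

The last call shows the disadvantage mentioned above: as long as values may contain any characters, a keyword-based split cannot tell a condition from ordinary text.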

3) The extended access proposal:

> motor_vehicle = no
> hgv:(20:00-10:00):loading = yes

Normal form is nice to parse - but do you think everybody can map it?
Non-normal form is not so nice for machines - thus I cannot promise that I will manage to parse it - and the discussion is theoretical until I can prove it (with a reference implementation).

I also see no reason why an application may not be able to treat this as equivalent:
hgv = yes    (shortcut for:)
access:hgv = yes (which is a valid expression also in the proposed extension)
access:cond = yes [hgv]

Then backward compatibility, extended condition tags or value side conditions could be used. If applications need a parser anyway...
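
Treating the three spellings as equivalent could amount to normalising them to one internal form. A minimal sketch, assuming a hypothetical set of vehicle classes and a hypothetical (key, value, condition) triple as the internal representation; only the bracket form of the value side condition is handled.

```python
VEHICLE_CLASSES = {"hgv", "taxi", "bicycle"}  # hypothetical list of known classes


def normalize(key, value):
    """Map the three spellings above to one (key, value, condition) triple."""
    parts = key.split(":")
    if len(parts) == 1 and parts[0] in VEHICLE_CLASSES:
        return ("access", value, parts[0])            # hgv = yes
    if len(parts) == 2 and parts[0] == "access" and parts[1] in VEHICLE_CLASSES:
        return ("access", value, parts[1])            # access:hgv = yes
    if key == "access:cond" and "[" in value and value.endswith("]"):
        base, _, condition = value[:-1].partition("[")
        return ("access", base.strip(), condition.strip())  # access:cond = yes [hgv]
    return (key, value, None)                         # anything else is left alone


print(normalize("hgv", "yes"))
print(normalize("access:hgv", "yes"))
print(normalize("access:cond", "yes [hgv]"))
```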

martinq

_______________________________________________
Tagging mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/tagging


