Re: Labels separate from localnames (Was: Best Practice for Renaming OWL Vocabulary Elements)

Martin Hepp Fri, 22 Apr 2011 04:41:22 -0700

See replies inline ;-)
> Sorry to say this, but I think you are making a mistake. To say that the 
> rdfs:label has to look like a variable name because it is for Web developers 
> sounds to me like you are saying that the javadoc of a method should look 
> like a piece of code because it is addressed to programmers. I refuse to 
> believe that Web developers understand better pseudo code than natural 
> language.


I will finally give in and use English spacing and capitalization for 
rdfs:labels in GoodRelations, e.g.

   "Business entity"@en for gr:BusinessEntity etc.

But I will keep the cardinality recommendation in the rdfs:label of properties, 
e.g.

    serial number (0..*) for gr:serialNumber

and the class type information in ontological individuals, as in

    By bank transfer in advance (payment method) for gr:ByBankTransferInAdvance

The latter should definitely not irritate human consumers, for it provides 
context; the former is, in my judgment, the best way of indicating cardinality 
recommendations in OWL, since the OWL cardinality constructs don't cover what 
is needed, yet I have to be able to tell modelers the intended cardinality. 
Contrary to what you state, it is not nonsensical, as many users of GR have 
confirmed.
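
As an illustration of the labeling convention described above, here is a 
minimal sketch that derives such labels from local names. It is not part of 
GoodRelations or any GoodRelations tooling; the function names spaced_label, 
property_label and individual_label are hypothetical, and the splitting rule 
(break before each capital letter, lowercase everything except a leading 
capital) is an assumption that happens to reproduce the three examples given.

```python
import re


def spaced_label(local_name: str) -> str:
    """Turn a camel-case local name into a label with English spacing."""
    # Insert a space at every lowercase/digit-to-uppercase boundary.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", local_name).lower()
    # Keep a leading capital only if the local name had one (classes, individuals).
    if local_name[:1].isupper():
        spaced = spaced[0].upper() + spaced[1:]
    return spaced


def property_label(local_name: str, cardinality: str) -> str:
    """Label for a property, with the recommended cardinality appended."""
    return f"{spaced_label(local_name)} ({cardinality})"


def individual_label(local_name: str, class_local_name: str) -> str:
    """Label for an ontological individual, with its class as context."""
    return f"{spaced_label(local_name)} ({spaced_label(class_local_name).lower()})"


if __name__ == "__main__":
    print(spaced_label("BusinessEntity"))
    print(property_label("serialNumber", "0..*"))
    print(individual_label("ByBankTransferInAdvance", "PaymentMethod"))
```

The cardinality and the class name are passed in explicitly because neither 
can be derived from the local name alone.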

> Moreover, Web clients most of the time display raw data (in a nice way) 
> extracted from databases. For instance, a Wikipedia article displays a nice 
> readable title, which is exactly the raw data that is found in a column of a 
> database. Of course, you can decide that you won't use rdfs:label for human 
> readable text and reserve another property for that (eg, dc:title), but you 
> cannot decide how others will use your data and they may have a preference 
> for the rdfs:label. As a matter of fact, rdfs:label is commonly used for 
> showing people a nice readable piece of text in natural language.

I was stressing that SW apps that aim at real people will have to use 
sophisticated methods for choosing the proper label for data elements anyway; 
using the raw rdfs:label will not work for non-geeks in most cases. Most 
ordinary people cannot process data, just information.
> 
> Now, let's imagine I have a "product browser" which aggregates information 
> about products found on the Web, leveraging the GoodRelations vocabulary and 
> possibly other vocabularies. It may display the products in a table and have 
> a column for "product type", which displays the class of the product. There 
> are chances that the client will display the rdfs:label of the class as the 
> "product type", which in the case of GoodRelations would look sibylline to a 
> casual reader, with camel-toed text and nonsensical information about arity.

Nobody except for very specialized analysts will ever want to use a product 
browser that presents raw RDF data.

> 
> Moreover, with such practice, how can you provide labels in multiple 
> languages? Paymentmethod is not even an English word!

The choice of labels for information consumers cannot be solved by the creator 
of the vocabulary, because that depends on the context (e.g. audience) in which 
the results will be displayed.
This is independent of the question of translations. A good ontology makes 
good (context-independent, lasting, cross-cultural) choices regarding the 
categories of things. The linguistic representation of these categories in 
a specific context is a completely different story.


>> But since this class is so frequently used, I want to change it to
>> simply gr:Location while remaining as much of backward compatibility
>> as possible; that is the background of the pattern I suggested.
> 
> Ouch! I'm afraid amateur Linked Data producers who are searching for terms in 
> a SemWeb search engine will find gr:Location very appropriate for *any* 
> location. As a consequence, it will be inferred that all locations recorded 
> in geonames are selling something! The Semantic Web will break and bring in 
> its downfall the World Wide Web and the Internet, then the end of the world...
> 

First, it does no harm if they use gr:Location for that purpose - 
there is no contradiction; any place or area in the universe can be said to be 
an instance of gr:Location.
Second, I cannot solve the problem of 
- amateur linked data producers in general and
- the unsatisfying state of search technology for ontologies and ontology 
elements.

The most important audience to cater for nowadays is Web developers who want 
to add RDFa to existing sites. Learn from Facebook and their findings re OGP.

>> Well, in my case that would mean I cannot change a)
>> gr:LocationOfSalesOrServiceProvisioning to gr:Location b)
>> gr:ProductOrServicesSomeInstancesPlaceholder to gr:SomeItems and c)
>> gr:ActualProductOrServiceInstance to gr:Individual
> 
> Those names are horribly long but they have the merit of being little 
> ambiguous, as opposed to gr:Individual. In FOAF, the names are very short, 
> which certainly helps getting the vocabulary adopted but creates a 
> considerable amount of misuses (foaf:img, foaf:mbox, ...).  Moreover, these 
> long names are easier to discover in keyword-based search engines because 
> there is more contextual information to properly index and relate the words 
> in the name.
> 

I would put it differently: The initial long names were important for me to 
develop a clean conceptual model, because other terms would have been much less 
generic and much more industry-specific. The fact that you can use 
GoodRelations across industries (jobs, restaurants, transportation, cars, 
books, consulting, disposal, ...) is because I did not use the quick, 
context-bound words for conceptual elements.

But in the three modifications I am planning, I think the gain in brevity is 
much more relevant than the risk of wrong usage. Keep in mind that even long 
names do not prevent wrong usage.

Basically, I am evaluating only three changes (not yet confirmed with important 
stakeholders):

gr:ActualProductOrServiceInstance --> gr:Individual
gr:ProductOrServicesSomeInstancesPlaceholder --> gr:SomeItems
gr:LocationOfSalesOrServiceProvisioning --> gr:Location

The first two are always used as additional classes, so their IDs will always 
be in context:

foo:myHammer a <http://www.productontology.org/id/Hammer>, gr:Individual.
foo:someHammers a <http://www.productontology.org/id/Hammer>, gr:SomeItems.

Even I have to look up the GoodRelations Reference for the correct syntax from 
time to time, so there is a real need for improvement.
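
For consumers of existing data, the three renames under evaluation could be 
handled with a simple lookup table. The following is only a sketch under 
stated assumptions: the renames are not yet confirmed, the names RENAMED and 
modernize are hypothetical, and this is a client-side normalization, not the 
backward-compatibility pattern suggested earlier in the thread.

```python
# GoodRelations namespace IRI.
GR = "http://purl.org/goodrelations/v1#"

# Old local name -> proposed shorter local name (proposal only, not confirmed).
RENAMED = {
    "ActualProductOrServiceInstance": "Individual",
    "ProductOrServicesSomeInstancesPlaceholder": "SomeItems",
    "LocationOfSalesOrServiceProvisioning": "Location",
}


def modernize(iri: str) -> str:
    """Map an old GoodRelations class IRI to its proposed short form.

    IRIs outside the GoodRelations namespace, and GoodRelations terms that
    are not being renamed, are returned unchanged.
    """
    if not iri.startswith(GR):
        return iri
    local_name = iri[len(GR):]
    return GR + RENAMED.get(local_name, local_name)


if __name__ == "__main__":
    print(modernize(GR + "LocationOfSalesOrServiceProvisioning"))
    print(modernize(GR + "BusinessEntity"))
```

Applying such a table when data is loaded would let old and new identifiers be 
treated alike in application code, whatever mechanism the vocabulary itself 
ends up using.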

>> As said, I am considering to change the formatting from camel word to
>> non-camel style but keep the cardinality and class membership info
>> for developers. The issue of several languages is, in theory, a nice
>> feature, but extremely difficult to implement in six-sigma quality
>> due to the differences in connotations and semantic granularity of
>> natural languages. Having second-class translations would do more
>> harm than good, in my opinion. The only reliable translations I could
>> provide easily would be German, but that would really not increase
>> adoption significantly - most German Web developers speak English.
> 
> You do not need to make the translations yourself. Find fluent translators or 
> expert linguists.

I do not know whether you have ever tried to get sufficiently precise 
translations for rather abstract ideas.
You would need to get at least two independent translations for each language 
and then evaluate the differences.

BTW, I am not saying there is no need for translations, but before the 
translations could be part of the official spec, they would have to be 
extremely reliable.

It's no problem if someone on the Web publishes an RDF graph of French labels 
for GoodRelations, even if they are not 100% accurate.

Have a look at 30 years of terminology research (e.g. http://www.termnet.org/) 
or google for Eugen Wuester.

>> Snippets or Yahoo SearchMonkey will never see the vocabulary labels,
>> only the person configuring the generation of data.
> 
> Google Rich Snippets don't show the labels because it is specifically tuned 
> for GoodRelations. But a generic tool which aggregates information from 
> various sources using various vocabularies has to make a generic assumption 
> on what to display. rdfs:label is what is often chosen by generic tools to be 
> shown to people.

I doubt the interaction with RDF data on a Web scale will be a simple 
modification of the browser paradigm of HTML content. Pivot-style approaches 
are IMO pointing in the right direction, but again, you will need a hard-coded 
or pretty intelligent additional layer in between the human and the data, and 
selecting the proper name for a piece of data will be among the challenges. A 
simple regex on the labels from the vocabulary will be the least of the 
obstacles.
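
To illustrate the "simple regex" point, here is a minimal sketch of how a 
generic client might separate the developer-oriented note from a label such as 
"serial number (0..*)" before display. The names split_label and 
is_cardinality are hypothetical, and the assumption that the note is always a 
single trailing parenthetical is mine, not something the GoodRelations 
specification guarantees.

```python
import re
from typing import Optional, Tuple

# A label optionally followed by one trailing parenthetical note.
TRAILING_NOTE = re.compile(r"^(?P<text>.*?)\s*\((?P<note>[^()]*)\)\s*$")
# Cardinality notes such as 0..1, 1..* or 0..*.
CARDINALITY = re.compile(r"^\d+\.\.(?:\d+|\*)$")


def split_label(label: str) -> Tuple[str, Optional[str]]:
    """Return (display text, trailing note or None) for a vocabulary label."""
    match = TRAILING_NOTE.match(label)
    if match is None:
        return label.strip(), None
    return match.group("text"), match.group("note")


def is_cardinality(note: Optional[str]) -> bool:
    """True if the note looks like a cardinality recommendation."""
    return note is not None and CARDINALITY.match(note) is not None


if __name__ == "__main__":
    for label in (
        "serial number (0..*)",
        "By bank transfer in advance (payment method)",
        "Business entity",
    ):
        text, note = split_label(label)
        print(text, "|", note, "|", is_cardinality(note))
```

A client could show only the display text to end users and keep the note for 
tooltips or developer views.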

I don't think that we as LOD / SW researchers already know how to implement 
the larger vision, but it will for sure require a lot more sweat, more 
creativity, and more cross-discipline effort than many seem to assume.

Best

Martin
