enhancer: EnhancementStructureOverview.png stanbolenhancementstructure.mdtext

rwesten Fri, 23 Sep 2011 04:08:40 -0700

Author: rwesten
Date: Fri Sep 23 11:08:13 2011
New Revision: 1174655

URL: http://svn.apache.org/viewvc?rev=1174655&view=rev
Log:
Some updates to the Stanbol Enhancement Structure.


I am currently in the process of updating this document based on the comments 
in the discussion thread on the stanbol-dev list [1].
This is ongoing work that will need some more time (and discussions). In the 
meantime I would like to keep this on the staging server.


[1] http://markmail.org/message/upzgzn5ew7cqa6ou

Added:
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/EnhancementStructureOverview.png
   (with props)
Modified:
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/EnhancementStructureOverview.png
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/EnhancementStructureOverview.png?rev=1174655&view=auto
==============================================================================
Binary file - no diff available.

Propchange: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/EnhancementStructureOverview.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext?rev=1174655&r1=1174654&r2=1174655&view=diff
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext
 (original)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/stanbolenhancementstructure.mdtext
 Fri Sep 23 11:08:13 2011
@@ -8,28 +8,33 @@ This describe the schema (ontology) used
 
 The Stanbol Enhancement Structure is built around the following main concepts. 
Each of these concepts covers a specific aspect related to the enhancement 
process of content.
 
+
 The following list gives an overview of the concepts used by the Stanbol 
Enhancement Structure:
 
+![Overview of the Stanbol Enhancement 
Structure](/EnhancementStructureOverview.png "Overview of the Stanbol 
Enhancement Structure")
+
 * **ContentItem:** This is the resource representing the parsed content. The 
URI of this resource depends on how the content was parsed to the Stanbol 
Enhancer. If an absolute URI is provided by the request, then this URI is 
used. In all other cases the Stanbol Enhancer creates a URI based on the 
configured prefix or the URL of the service. The documentation of the RESTful 
service should provide more information about that.
 
-* **Content:** Several content model distinguish between Content (data) and 
the ContentItem (Interpretation of the Data). The Enhancement Structure 
currently only defines ContentItem, because there is no need to describe the 
data for the purpose of the enhancement process. Other components (such as the 
/store endpoint) might need to formally describe the data. For such use cases 
the sic:content property will be used to refer from the ContentItem to the 
Content. The URI representing the Content will be the same to be used to 
retrieve its data via a RESTful service. 
+* **sb:Content:** Several content models distinguish between Content (data) and 
the ContentItem (interpretation of the data). The Enhancement Structure 
currently only defines ContentItem, because there is no need to describe the 
data for the purpose of the enhancement process. Other components (such as the 
/store endpoint) might need to formally describe the data. For such use cases 
the sic:content property will be used to refer from the ContentItem to the 
Content. The URI representing the Content will be the same one used to 
retrieve its data via a RESTful service. 
+
+* **sb:Enhancement:** This provides metadata about extractions created by 
EnhancementEngines or present within the content. This includes the creator 
(usually an EnhancementEngine), the creation time, as well as relations to other 
enhancements. Users of the Stanbol Enhancer will typically not care about such 
data because from their perspective it represents meta-metadata (metadata 
about the metadata). Every feature, suggestion or other piece of 
information extracted by any EnhancementEngine needs to attach the metadata 
defined for this concept.
+
+* **sb:Annotation:** An annotation describes some piece of knowledge extracted 
from the parsed content and/or the metadata of the content. Information 
provided by Annotations includes the label, the type and the confidence. In 
addition, Annotations need to link to at least one Occurrence and may have one 
or more Suggestions. Annotations can also be related to or dependent on other 
Annotations. The EnhancementStructure defines only a small set of different 
Annotation types. Implementors of EnhancementEngines that extract specific kinds 
of things (e.g. coreferences, events, …) may need to define their own 
Annotation types. Such extensions should be called "**Annotation" and be 
defined as rdfs:subClassOf any Annotation type defined by this Enhancement 
Structure.
 
-* **Enhancement:** This provides metadata about extractions created by 
EnhancementEngines or present within the content. This includes the creator 
(usually a EnhancementEngine), the creation time, as well as relations to other 
enhancements. Users of the Stanbol Enhancer will typically not care about such 
data because out of the their perspective they represent Meta-Meta-Data (meta 
data about the metadata).
+* **sb:Suggestion:** A suggestion describes a Resource (Entity, Topic, 
Category, …) that an EnhancementEngine suggests as a possible match for an 
Annotation. Suggestions are typically created by Engines that further process 
Annotations (semantic lifting). However, EnhancementEngines might also create 
both the Annotation and the Suggestions. Suggestions are always linked to a 
single Annotation (functional property). They define the label, the ID 
(typically the URI of the Resource), the type(s) of the suggested Resource and 
the confidence of the suggestion.
 
-* **Annotation:** An annotation describe a feature present within the parsed 
content. Such feature can have three sources. (1) the can originate form 
metadata present in the parsed content, (2) the can be extracted by analyzing 
the content itself and (3) they can be based on further processing Annotations 
of type (1) and (2). The Annotation provides the label, the type (e.g. Person, 
Organization, Location ) the role (e.g. Tag, Category, Keyword), the confidence 
and (if available) the link to the entity representing the extracted feature. 
It is the central concept for users that need to present all the things 
extracted from the parsed content.
+* **sb:Occurrence:** An Occurrence describes the actual location of an 
extracted feature within the content. This location may be within the content 
or within parsed metadata. Occurrences are always linked to a single Annotation 
(functional property). Based on the type of the content there will be different 
types of Occurrences. This EnhancementStructure currently focuses on two types 
of Occurrences: (1) TextOccurrence and (2) MetadataOccurrence. For details on 
the model of these Occurrence types see the corresponding sections. 
EnhancementEngines that support the extraction of Features from content types 
that are not covered by this specification (e.g. pictures, sound, video) need to 
define their own Occurrence types. Such types should use the name 
"***Occurrence" and be defined as rdfs:subClassOf any of the Occurrence types 
defined in this specification.
 
-* **Occurrence:** An Occurrence describes the actual location of the feature 
within the content or the metadata. Based on the type of the content there will 
be different types of Occurrences. A "text occurrence" will contain information 
such as the selected-text, the start/end position of the selection and the 
surrounding text to provide some context. An "image accurrence" will provide 
the top left and the bottom right position of the selected rectangle. A 
"metadata occurrence" will describe the property used for the annotation (e.g. 
dc:creator) the used standard (e.g. DCterms) and the value.
+Enhancements encoded based on this specification need to conform to the 
following rules:
 
-When using the Enhancement Structure one need usually need to combine several 
of the above concepts to create meaningful statement.
-As an example take a natural language processing engine that needs to express 
the the word "Paris" found within an sentence like "I will travel to Prais next 
week" portably refers to a location.
-To express that it will need to combine the concepts 
+* sb:Annotation and sb:Suggestion MUST also be of type sb:Enhancement and 
include the required metadata defined by sb:Enhancement.
+* sb:Occurrences, sb:Annotations and sb:Suggestions MUST include rdf:type 
information for all parent types. E.g. when adding an sb:TextOccurrence, the 
rdf:type MUST include sb:TextOccurrence AND sb:Occurrence. Consumers are 
expected NOT to use any kind of reasoner; therefore adding such additional 
information is the only way to ensure that queries for occurrences, annotations 
or suggestions provide the expected results.
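
The second rule above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the `PARENT_TYPES` table and the helper names are hypothetical, and the assumption that sb:EntityAnnotation is a kind of sb:Annotation is inferred from the example further down rather than stated here.

```python
# Hypothetical parent-type table (an assumption for illustration only).
# The type names are taken from the text; the table itself is not part of the spec.
PARENT_TYPES = {
    "sb:TextOccurrence": ["sb:Occurrence"],
    "sb:MetadataOccurrence": ["sb:Occurrence"],
    "sb:EntityAnnotation": ["sb:Annotation"],  # assumed subclass relation
    "sb:Annotation": ["sb:Enhancement"],
    "sb:Suggestion": ["sb:Enhancement"],
}


def expand_types(rdf_type):
    """Return rdf_type followed by all of its ancestors, most specific first."""
    result = []
    pending = [rdf_type]
    while pending:
        current = pending.pop(0)
        if current not in result:
            result.append(current)
            pending.extend(PARENT_TYPES.get(current, []))
    return result


def type_triples(subject, rdf_type):
    """Build one explicit rdf:type triple per type, so no reasoner is needed."""
    return [(subject, "rdf:type", t) for t in expand_types(rdf_type)]
```

With this table, `type_triples("<o1>", "sb:TextOccurrence")` yields one triple for sb:TextOccurrence and one for sb:Occurrence, so a consumer that queries for sb:Occurrence finds the resource without any inference.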
 
-* Enhancement: to express that this feature was extracted by the Natural 
Language Processing Engine at a given time ...
-* Annotation: to express that "Paris" represents a "Location" and has the role 
"Tag"
-* Occurrence: to express where the selected text "Paris" is located within the 
analyzed content
+---
 
-The same is true for consuming Enhancements. A client interested in presenting 
Tags, Categories and Keywords needs only information provided by the Annotation 
concept. To be able to highlight the actual location of detected features 
within the content on needs to also process information provided by the 
Occurrence concept.
+The parts below are currently being reworked
 
+---
 
 ## Specification
 
@@ -148,6 +153,7 @@ The following properties are defined for
 * **sb:entity**: In case an annotation describes an Entity, this property 
provides the URI for the entity
 * **sb:entity-type**: In case an annotation describes an Entity, this property 
provides the rdf:types of the linked entity
 * **sb:suggestion**: Links to another annotation that provides a suggestion 
for this one. This indicates that the Stanbol Enhancer requests the client to 
decide between the provided options - e.g. by some user interaction.
+* **sb:occurrence**: Optionally links to one or more sb:Occurrences of this 
annotation within the parsed Content. Note that there are several types of 
Occurrences (TextOccurrence, ImageOccurrence, MetadataOccurrence, …) defined. 
If this property is missing, then the Annotation is assumed to be about the 
whole content (as referred to by the sb:extracted-from property).
 
 **Annotations Type** describe the type of the annotated feature based on a 
terminology standardized by Stanbol. Current types include
 
@@ -165,32 +171,147 @@ This list should only contain some types
 * sb:Tag: The feature can be suggested as tag for the parsed content.
 * sb:Category: The feature provides a categorization for the parsed content.
 * sb:Keyword: The feature describes a keyword within the parsed content TODO: 
describe the difference between keywords and tags
-* sb:Suggestion: The feature is a suggestion for an other Annotations. 
 
 *NOTE*: Such roles should make it easier to support additional Annotation 
roles as suggested by 
[STANBOL-48](https://issues.apache.org/jira/browse/STANBOL-48) and 
[STANBOL-12](https://issues.apache.org/jira/browse/STANBOL-12) that includes 
[STANBOL-28](https://issues.apache.org/jira/browse/STANBOL-28) and 
[STANBOL-29](https://issues.apache.org/jira/browse/STANBOL-29).
 
-For **Suggestions** there are some additional constraints as defined by the 
following code block
+### sb:Suggestion
 
-    <a> rdf:type sb:Annotation
-    <a> dc:role !sb:Suggestion
-    <a> sb:suggestion <a1>
-        <a1> rdf:type sb:Annotation
-        <a1> dc:role sb:Suggestion
-        <a1> sb:confidence ordering^^xsd:float 
-
-This means:
-
-* an Annotation may only define suggestion if it does not have the dc:role 
sb:Suggestion. This prohibits nested suggestions
-* an Annotation lined by sb:suggestion con considered to be of the dc:role 
sb:Suggestion - even that it does not define this role explicitly.
-* Annotations used as suggestions MUST define some way to allow clients to 
show them in the right order (
-* the confidence value of annotations used as suggestions should be used to 
order suggestions when presented to the user. However Applications need to 
consider that such values are on an ordinal scale meaning that a value of "4" 
does NOT mean that it is twice as likely than a suggestion with an confidence 
of "2"!
+Suggestions are used by the Stanbol Enhancer to suggest possible values for 
the resolution of features extracted from the parsed content. 
+Currently there are two different use cases defined for Suggestions:
+
+* *(1) Entity Resolution:* Suggests entities for a Feature extracted from the 
content. Typically such suggestions are calculated based on the name of the 
feature found within the content (e.g. the selected text of a 
sb:TextOccurrence).
+* *(2) Field Value Suggestion:* Suggests a value for a specific property. This 
kind of suggestion is useful if a relation between two extracted features is 
detected. A typical example would be a person "Steve Jobs" with the role "CEO" 
of the company "Apple Inc". Such relations can be detected by NLP tools. 
However, suggestions like this are also central for semantic lifting of RDFa 
annotations, as shown in the example below.
+
+sb:Suggestion uses the following properties
+
+* **sb:entity**: The id of the suggested Entity
+* **sb:entity-type**: The type(s) of the suggested Entity
+* **sb:confidence**: Needed to sort in case of multiple suggestions
+* **sb:field**: Defines the property for which this suggestion should become 
the value if accepted by the user
+
+In addition, all sb:Suggestions are also of type sb:Enhancement to allow 
EnhancementEngines to provide enhancement metadata for them.
+
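How a client might use sb:confidence to order several suggestions can be sketched as follows. This is a minimal illustration under stated assumptions: suggestions are modelled as plain dictionaries keyed by the property names listed above, confidence values are assumed to be floats, and a higher value is assumed to mean a better match (the text does not say so explicitly). The function name is hypothetical.

```python
def rank_suggestions(suggestions):
    """Return the suggestions ordered by descending sb:confidence.

    Assumption: a higher confidence value means a better match.
    The values are only used for ordering, not as probabilities.
    """
    return sorted(suggestions, key=lambda s: s["sb:confidence"], reverse=True)


# Hypothetical input: two competing suggestions for one annotation.
candidates = [
    {"sb:entity": "http://www.example.com/person/A", "sb:confidence": 12.5},
    {"sb:entity": "http://www.example.com/person/B", "sb:confidence": 40.0},
]
ranked = rank_suggestions(candidates)
```

Only the order of the values is used here; nothing in the sketch treats a confidence of 40.0 as "more than three times as likely" as 12.5.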
+
+For details on how they are used, please see the following example.
+
+#### Example
+
+As an example, let's assume that the following RDFa-annotated content is 
parsed to the Stanbol Enhancer:
+
+   <span typeof="cal:Vevent">
+       <h3 property="dc:title"> Stanbol Teleconference </h3>
+       <span property="cal:summary">
+           <p> Agenda: </p>
+           <ul>
+               <li> ... </li>
+           </ul>
+           <p> Participants: </p>
+           <ul>
+               <li typeof="foaf:Person" property="foaf:name">Rupert 
Westenthaler</li>
+               <li typeof="foaf:Person" property="foaf:name">Olivier 
Grisel</li>
+               <li> ... </li>
+           </ul>
+       </span>
+   </span>
+
+(1) Suggest the Entities for Rupert and Olivier
+(2) Suggest to link Rupert and Olivier as values for "cal:attendee"
+
+Both for Rupert Westenthaler and Olivier Grisel an EntityAnnotation would be 
present - in that case created by the RDFa extractor, but in principle this 
could also work if the RDFa markup is missing. In such cases the 
EntityAnnotations could be created by an NLPEnhancementEngine.
+
+   <a1> rdf:type sb:EntityAnnotation
+   <a1> dc:title "Rupert Westenthaler"
+   <a1> sb:entity-type foaf:Person
+   <a1> sb:hasOccurrence <o1>
+   <a1> sb:hasSuggestion <s1>
+
+   <a2> rdf:type sb:EntityAnnotation
+   <a2> dc:title "Olivier Grisel"
+   <a2> sb:entity-type foaf:Person
+   <a2> sb:hasOccurrence <o2>
+   <a2> sb:hasSuggestion <s2>
+
+Let's ignore the occurrences - because how to create Occurrences for RDFa 
markup is a whole different story that needs to be specified - and concentrate 
on the suggestions.
+
+   <s1> rdf:type sb:Suggestion
+   <s1> sb:entity <http://www.example.com/person/Rupert_Westenthaler>
+   <s1> sb:entity-type foaf:Person, vCard:vCard, dbpedia-ont:Person
+   <s1> sb:confidence 123,456
+
+   <s2> rdf:type sb:Suggestion
+   <s2> sb:entity <http://www.example.com/person/Olivier_Grisel>
+   <s2> sb:entity-type foaf:Person, vCard:vCard, dbpedia-ont:Person
+   <s2> sb:confidence 234,567
+
+If the suggestion is accepted by the client the RDFa markup could be updated 
like this
+
+   <li about="http://www.example.com/person/Rupert_Westenthaler"
+       typeof="foaf:Person" property="foaf:name">Rupert Westenthaler</li>
+   <li about="http://www.example.com/person/Olivier_Grisel"
+       typeof="foaf:Person" property="foaf:name">Olivier Grisel</li>
+
+Now let's have a detailed look at the suggestions to add Rupert and Olivier as 
a "cal:attendee" to the meeting.
+First we need an EntityAnnotation for the meeting, which would be 
created by the RDFa extractor
+
+   <a> rdf:type sb:EntityAnnotation
+   <a> dc:title "Stanbol Teleconference"
+   <a> sb:entity-type cal:Vevent
+   <a> sb:hasOccurrence <o>
+   <a> sb:hasSuggestion <s3>
+   <a> sb:hasSuggestion <s4>
+
+Again, let's skip the occurrence and look at the two suggestions. What I want 
to do here is to suggest using the Annotations for Rupert (<a1>) and Olivier 
(<a2>) as values for the property "cal:attendee".
+
+It is important to suggest here the annotations <a1> and <a2> as values and 
NOT the suggested entities (e.g. 
<http://www.example.com/person/Rupert_Westenthaler> in case of <a1>) because 
the Stanbol Enhancer cannot assume that the user will accept the suggestions 
<s1> for <a1> and <s2> for <a2>.
+
+The following suggestions also use the sb:field property to tell the user that 
the suggestion is about values for the "cal:attendee" property.
+
+   <s3> rdf:type sb:Suggestion
+   <s3> sb:field cal:attendee
+   <s3> sb:entity <a1>
+   <s3> sb:entity-type sb:EntityAnnotation
+   <s3> sb:confidence 12,34
+
+   <s4> rdf:type sb:Suggestion
+   <s4> sb:field cal:attendee
+   <s4> sb:entity <a2>
+   <s4> sb:entity-type sb:EntityAnnotation
+   <s4> sb:confidence 12,34
+
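The reason <s3> and <s4> point at the annotations rather than at the suggested entities can be sketched in Python. This is an illustration under stated assumptions, not a description of the Stanbol Enhancer: the dictionary layout and the function name are hypothetical, and the identifiers are written without angle brackets for brevity.

```python
# Hypothetical model: entity suggestion id -> (annotated annotation, suggested entity URI)
ENTITY_SUGGESTIONS = {
    "s1": ("a1", "http://www.example.com/person/Rupert_Westenthaler"),
    "s2": ("a2", "http://www.example.com/person/Olivier_Grisel"),
}


def resolve_entity(annotation_id, accepted):
    """Return the entity URI for an annotation if its entity suggestion
    was accepted by the user, otherwise None (the annotation stays unlinked)."""
    for suggestion_id, (target, entity_uri) in ENTITY_SUGGESTIONS.items():
        if target == annotation_id and suggestion_id in accepted:
            return entity_uri
    return None


# The field suggestions s3 and s4 reference the annotations a1 and a2, so they
# remain meaningful whether or not s1 and s2 are accepted.
only_fields = {"s3", "s4"}
everything = {"s1", "s2", "s3", "s4"}
```

With `only_fields`, `resolve_entity("a1", only_fields)` returns `None`, so the attendee is listed without an `about` attribute; with `everything` it returns the suggested URI, which can then be written into the markup.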
+NOTE:
+
+* I am not sure if it is a good idea to use "sb:entity" to link to an 
annotation created by the Stanbol Enhancer because it might confuse users if 
the same property is used to link external and internal resources. However, 
introducing an additional property such as "sb:value" does not seem any better.
+
+Here is the RDFa markup if the user accepts <s3> and <s4> but not <s1> and <s2>
+
+   <span typeof="cal:Vevent">
+       [...]
+       <p> Participants: </p>
+       <ul property="cal:attendee">
+           <li typeof="foaf:Person" property="foaf:name">Rupert 
Westenthaler</li>
+           <li typeof="foaf:Person" property="foaf:name">Olivier Grisel</li>
+           <li> ... </li>
+       </ul>
+   </span>
+
+and finally the RDFa markup if all suggestions are accepted on the client 
side
+
+   <span typeof="cal:Vevent">
+       [...]
+       <p> Participants: </p>
+       <ul property="cal:attendee">
+           <li about="http://www.example.com/person/Rupert_Westenthaler"
+               typeof="foaf:Person" property="foaf:name">Rupert 
Westenthaler</li>
+           <li about="http://www.example.com/person/Olivier_Grisel"
+               typeof="foaf:Person" property="foaf:name">Olivier Grisel</li>
+       </ul>
+   </span>
 
 
 ### Occurrences
 
-By default detected Features are considered to be extracted from the whole 
content. While this assumption is appropriate for things like Categorizations 
and keywords for a lot of cases it is possible to specify the exact occurrence 
of features within the content and/or the metadata of the content.
+By default, detected Features are considered to be extracted from the whole 
content. While this assumption is appropriate for things like categorizations 
and keywords, in a lot of cases it is possible to specify the exact occurrence 
of features within the content and/or the metadata of the content. In such 
cases the sb:Annotation will define one or more values for the sb:occurrence 
property.
 
-Typically Occurrences are used together with sb:Annotations and sb:Enhancement 
in cases an EnhancementEngine whats to describe the position of the extracted 
Feature within the analyzed content. So propertied defined by this two context 
should be considered when reading this section.
 
 Different Occurrence descriptions are needed to describe the position of a 
feature within different types of content or within the parsed metadata.

svn commit: r1174655 - in /incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer: EnhancementStructureOverview.png stanbolenhancementstructure.mdtext
