Author: agruber
Date: Fri Sep 16 09:59:13 2011
New Revision: 1171482

URL: http://svn.apache.org/viewvc?rev=1171482&view=rev
Log:
updated customvocabulary description with examples

Added:
    incubator/stanbol/site/trunk/content/stanbol/docs/trunk/examples/
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/examples/anl-mappings.txt
Modified:
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/customvocabulary.mdtext

Modified: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/customvocabulary.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/customvocabulary.mdtext?rev=1171482&r1=1171481&r2=1171482&view=diff
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/customvocabulary.mdtext 
(original)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/customvocabulary.mdtext 
Fri Sep 16 09:59:13 2011
@@ -1,25 +1,28 @@
 Title: Using custom/local vocabularies with Apache Stanbol
 
-For text enhancement and linking to external sources, the Entityhub provides 
you with the possibility to work with local indexes of datasets for several 
reasons. Firstly, you do not want to rely on internet connectivity to these 
services, secondly you may want to manage local changes to these public 
repository and thirdly, you may want to work with local resources only, such as 
your LDAP directory or a specific and private enterprise vocabulary of your 
domain.
+The ability to work with custom vocabularies is necessary for many 
organisations. Use cases range from detecting various types of named 
entities specific to a company to detecting and working with concepts from a 
specific domain.
 
-The main other possibility is to upload ontologies to the ontology manager and 
to use the reasoning components over it.
+For text enhancement and linking to external sources, the Entityhub component 
of Apache Stanbol allows you to work with local indexes of datasets, for 
several reasons: 
 
-This document focuses on two cases:
+- you do not want to rely on internet connectivity to these services, and thus 
work offline with a huge set of entities,
+- you want to manage local updates of these public repositories, and 
+- you want to work with local resources only, such as your LDAP directory or a 
private enterprise vocabulary of your domain.
 
-- Creating and using a local SOLr index of a given vocabulary e.g. a SKOS 
thesaurus or taxonomy of your domain
-- Directly working with individual instance entities from given ontologies 
e.g. a FOAF repository.
+Creating your own custom index is the preferred way of working with custom 
vocabularies. For small vocabularies, one can also upload simple ontologies 
together with instance data directly to the Entityhub and manage them there - 
but a major downside of this approach is that one can only manage one ontology 
per installation.
 
-## Creating and working with local indexes
+This document focuses on the main case: creating and using a local Solr 
index of a custom vocabulary, e.g. a SKOS thesaurus or taxonomy of your 
domain.
 
-The ability to work with custom vocabularies in Stanbol is necessary for many 
organizational use cases such as beeing able to detect various types of named 
entities specific to a company or to detect and work with concepts from a 
specific domain. Stanbol provides the machinery to start with vocabularies in 
standard languages such as [SKOS - Simple Knowledge Organization 
Systems](http://www.w3.org/2004/02/skos/) or more general 
[RDF](http://www.w3.org/TR/rdf-primer/) encoded data sets. The respective 
Stanbol components, which are needed for this functionality are the Entityhub 
for creating and managing the index and several [Enhancement 
Engines](engines.html) to make use of the index during the enhancement process.
+## Creating and working with custom local indexes
 
-### Create your own index
+Stanbol provides the machinery to start with vocabularies in standard 
languages such as [SKOS - Simple Knowledge Organization 
System](http://www.w3.org/2004/02/skos/) or more general 
[RDF](http://www.w3.org/TR/rdf-primer/) encoded data sets. The Stanbol 
components needed for this functionality are the Entityhub, for creating and 
managing the indexes, and several [Enhancement Engines](engines.html) that 
make use of the indexes during the enhancement process.
+
+### A. Create your own index
 
 **Step 1 : Create the indexing tool**
 
 The indexing tool provides a default configuration for creating a Solr index 
of RDF files (e.g. a SKOS export of a thesaurus or a set of FOAF files).
 
-(1) If not yet built during the Stanbol build process of the entityhub call
+If not yet built during the Stanbol build process of the entityhub call
 
     mvn install
 
@@ -40,7 +43,14 @@ Initialize the tool with
 
     java -jar 
org.apache.stanbol.entityhub.indexing.genericrdf-*-jar-with-dependencies.jar 
init
 
-You will get a directory with the default configuration files, one for the 
sources and a distribution directory for the resulting files. Make sure, that 
you adapt the default configuration with at least the name of your index and 
namespaces and properties you need to include to the index and copy your source 
files into the respective directory <code>indexing/resources/rdfdata</code>. 
Several standard formats for RDF, multiple files and archives of them are 
supported. *For details of possible configurations, please consult the 
<code>{root}/entityhub/indexing/genericrdf/readme.md</code>.*
+You will get a directory with the default configuration files, one for the 
sources and a distribution directory for the resulting files. Make sure that 
you adapt the default configuration with at least 
+
+- the id/name and license information of your data and 
+- the namespace and property mappings you want to include in the index (see 
the example [mappings.txt](examples/anl-mappings.txt) with default and 
dataset-specific mappings)
+
+Then, copy your source files into the respective directory 
<code>indexing/resources/rdfdata</code>. Several standard RDF formats, 
multiple files, and archives thereof are supported. 
+
+*For more details on possible configurations, please consult the README at 
<code>{root}/entityhub/indexing/genericrdf/</code>.*
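To illustrate the syntax used in such a mappings configuration, a minimal fragment could look like this (the selected properties are just an example; which ones you need depends on your vocabulary):

```
# index all SKOS properties
skos:*
# copy the preferred label over to rdfs:label
skos:prefLabel > rdfs:label
# index rdf:type values as entity references
rdf:type | d=entityhub:ref
```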
 
 Then, you can start the index by running
 
@@ -54,7 +64,7 @@ Depending on your hardware and on comple
 At your running Stanbol instance, copy the ZIP archive into 
<code>{root}/sling/datafiles</code>. Then, at the "Bundles" tab of the 
administration console add and start the 
<code>org.apache.stanbol.data.site.{name}-{version}.jar</code>.
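As a sketch of the copy step above (the archive name and the <code>{root}</code> path are placeholders for your own installation):

```shell
# copy the index archive produced by the indexing tool into the datafiles
# folder of the running Stanbol instance (file name is a placeholder)
cp indexing/dist/myvocabulary.solrindex.zip {root}/sling/datafiles/
```

The <code>org.apache.stanbol.data.site.{name}-{version}.jar</code> bundle is then added and started via the administration console as described above.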
 
 
-### Configuring the enhancement engines
+### B. Configure and use the index with enhancement engines
 
 Before you can make use of the custom vocabulary you need to decide which 
kind of enhancements you want to support. If your enhancements are 
NamedEntities in the strict sense (Persons, Locations, Organizations), 
then you may use the standard NER engine together with the 
EntityLinkingEngine to configure the destination of your links.
 
@@ -69,15 +79,15 @@ In the following the configuration optio
 
 (2) Open the configuration console at 
http://localhost:8080/system/console/configMgr and navigate to the 
TaxonomyLinkingEngine. Its main options are configurable via the UI.
 
-- Referenced Site: {put the id/name of your index} (required)
-- Label Field: {the property to search for}
+- Referenced Site: {put the id/name of your index}
+- Label Field: {the property to search for} 
 - Use Simple Tokenizer: {deactivate to use language specific tokenizers}
 - Min Token Length: {set minimal token length}
 - Use Chunker: {disable/enable language specific chunkers}
 - Suggestions: {maximum number of suggestions}
 - Number of Required Tokens: {minimal required tokens}
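Once the engine is configured, you can check whether concepts from your vocabulary are detected by posting plain text to the stateless enhancer endpoint (this assumes a default Stanbol launcher running at localhost:8080; the sample text is only an illustration):

```shell
# send plain text to the Stanbol enhancer and request the enhancement
# results as RDF/XML via content negotiation
curl -X POST -H "Content-Type: text/plain" \
     -H "Accept: application/rdf+xml" \
     --data "The Danube is the second longest river in Europe." \
     http://localhost:8080/engines
```

If matching labels are found, the response contains TextAnnotations for the detected text and EntityAnnotations linking to the entities of your index.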
 
-*For further details please on the engine and its configuration please consult 
the according Readme file at TODO: create the readme 
<code>{root}/stanbol/enhancer/engines/taxonomylinking/<code>.*
+*For further details on the engine and its configuration please refer 
to the corresponding README at 
<code>{root}/stanbol/enhancer/engines/taxonomylinking/</code>.* (TODO: create 
the Readme)
        
 
 **Use several instances of the TaxonomyLinkingEngine**
@@ -87,28 +97,18 @@ To work at the same time with different 
 
 **Use the TaxonomyLinkingEngine together with the NER engine and the 
EntityLinkingEngine**
 
-If your text corpus contains and you are interested in both, generic 
NamedEntities and custom thesaurus you may use   
-
-
-
-### Demos and Examples
-
-- The full demo installation of Stanbol is configured to also work with an 
environmental thesaurus - if you test it with unstructured text from the 
domain, you should get enhancements with additional results for specific 
"concepts".
-- One example can be found with metadata from the Austrian National Library is 
described (TODO: link) here.
-
-(TODO) - Examples
-
+If your text corpus contains both generic NamedEntities and concepts from a 
custom thesaurus, and you are interested in both, you may use (TODO)  
 
-## Create a custom index for dbpedia
 
-(TODO) dbpedia indexing (<-- olivier)
+## Specific Examples
 
+**Create your custom index for dbpedia:** (TODO: dbpedia indexing (<-- 
olivier))
 
-## Working with ontologies in EntityHub
 
-(TODO)
+## Resources
 
-### Demos and Examples
+- The full [demo](http://dev.iks-project.eu:8081/) installation of Stanbol is 
configured to also work with an environmental thesaurus - if you test it with 
unstructured text from the domain, you should get enhancements with additional 
results for specific "concepts".
+- Download custom test indexes and installer bundles for Stanbol from 
[here](http://dev.iks-project.eu/downloads/stanbol-indices/) (e.g. for GEMET 
environmental thesaurus, or a big dbpedia index).
+- Another concrete example with metadata from the Austrian National Library is 
described here (TODO: link).
 
-(TODO)
 

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/examples/anl-mappings.txt
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/examples/anl-mappings.txt?rev=1171482&view=auto
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/examples/anl-mappings.txt
 (added)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/examples/anl-mappings.txt
 Fri Sep 16 09:59:13 2011
@@ -0,0 +1,164 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+#NOTE: THIS IS A DEFAULT MAPPING SPECIFICATION THAT INCLUDES MAPPINGS FOR
+#      COMMON ONTOLOGIES. USERS MIGHT WANT TO ADAPT THIS CONFIGURATION BY
+#      COMMENTING/UNCOMMENTING AND/OR ADDING NEW MAPPINGS
+
+# --- Define the Languages for all fields ---
+# to restrict languages to be imported (for all fields)
+#| @=null;en;de;fr;it
+
+#NOTE: null is used to import labels with no specified language
+
+# to import all languages leave this empty
+
+# --- RDF RDFS and OWL Mappings ---
+# This configuration only indexes properties that are typically used to store
+# instance data defined by such namespaces. This excludes ontology definitions.
+
+# NOTE that nearly all other ontologies are using properties of these three
+#      schemas, therefore it is strongly recommended to include such 
information!
+
+rdf:type | d=entityhub:ref
+
+rdfs:label 
+rdfs:comment
+rdfs:seeAlso | d=entityhub:ref
+
+
+owl:sameAs | d=entityhub:ref
+
+# If you would also like to index ontologies, add the following statements
+#owl:*
+#rdfs:*
+
+# --- Dublin Core (DC) ---
+# The default configuration imports all dc-terms data and copies values for the
+# old dc-elements standard over to the corresponding properties of the dc-terms
+# standard.
+
+# NOTE that a lot of other ontologies are also using DC for some of their data
+#      therefore it is strongly recommended to include such information!
+
+#mapping for all dc-terms properties
+dc:*
+
+# copy dc:title to rdfs:label
+dc:title > rdfs:label
+
+# deactivated by default, because dc-elements properties are mapped to dc-terms
+#dc-elements:*
+
+# mappings for the dc-elements properties to the dc-terms
+dc-elements:contributor > dc:contributor
+dc-elements:coverage > dc:coverage
+dc-elements:creator > dc:creator
+dc-elements:date > dc:date
+dc-elements:description > dc:description
+dc-elements:format > dc:format
+dc-elements:identifier > dc:identifier
+dc-elements:language > dc:language
+dc-elements:publisher > dc:publisher
+dc-elements:relation > dc:relation
+dc-elements:rights > dc:rights
+dc-elements:source > dc:source
+dc-elements:subject > dc:subject
+dc-elements:title > dc:title
+dc-elements:type > dc:type
+# also use dc-elements:title as label
+dc-elements:title > rdfs:label
+
+# --- Social Networks (via foaf) ---
+# The Friend of a Friend schema, often used to describe social relations 
between people
+foaf:*
+
+# copy the name of a person over to rdfs:label
+foaf:name > rdfs:label
+
+# additional data types checks
+foaf:knows | d=entityhub:ref
+foaf:made | d=entityhub:ref
+foaf:maker | d=entityhub:ref
+foaf:member | d=entityhub:ref
+foaf:homepage | d=xsd:anyURI
+foaf:depiction | d=xsd:anyURI
+foaf:img | d=xsd:anyURI
+foaf:logo | d=xsd:anyURI
+#page about the entity
+foaf:page | d=xsd:anyURI
+
+
+# --- Simple Knowledge Organization System (SKOS) ---
+
+# A common data model for sharing and linking knowledge organization systems 
+# via the Semantic Web. Typically used to encode controlled vocabularies such as
+# a thesaurus.
+skos:*
+
+# copy the preferred label  over to rdfs:label
+skos:prefLabel > rdfs:label
+
+# copy values of **Match relations to the corresponding related, broader and 
narrower properties
+skos:relatedMatch > skos:related
+skos:broadMatch > skos:broader
+skos:narrowMatch > skos:narrower
+
+#similar mappings for transitive variants are not contained, because transitive
+#reasoning is not directly supported by the Entityhub.
+
+# Some SKOS thesauri use "skos:transitiveBroader" and "skos:transitiveNarrower"
+# however such properties are only intended to be used by reasoners to
+# calculate transitive closures over broader/narrower hierarchies.
+# see http://www.w3.org/TR/skos-reference/#L2413 for details
+# to correct such cases we will copy transitive relations to their counterparts
+skos:narrowerTransitive > skos:narrower
+skos:broaderTransitive > skos:broader
+
+
+# --- Semantically-Interlinked Online Communities (SIOC) ---
+
+# an ontology for describing the information in online communities. 
+# This information can be used to export information from online communities 
+# and to link them together. The scope of the application areas that SIOC can 
+# be used for includes (and is not limited to) weblogs, message boards, 
+# mailing lists and chat channels.
+sioc:*
+
+# --- biographical information (bio)
+# A vocabulary for describing biographical information about people, both 
living
+# and dead. (see http://vocab.org/bio/0.1/)
+bio:*
+
+# --- Rich Site Summary (rss) ---
+rss:*
+
+# --- GoodRelations (gr) ---
+# GoodRelations is a standardised vocabulary for product, price, and company 
data
+gr:*
+
+# --- Creative Commons Rights Expression Language (cc)
+# The Creative Commons Rights Expression Language (CC REL) lets you describe 
+# copyright licenses in RDF.
+cc:*
+
+# --- Additional namespaces added for the Europeana dataset 
(http://ckan.net/dataset/europeana-lod) ---
+http://www.europeana.eu/schemas/edm/*
+http://www.openarchives.org/ore/terms/*
+
+
+
+
+

