Author: rwesten
Date: Thu Jan 26 07:19:47 2012
New Revision: 1236056
URL: http://svn.apache.org/viewvc?rev=1236056&view=rev
Log:
Added Documentation for EnhancementChains (root), ChainManager, ListChain and
ExecutionPlan
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancer-listchain-config.png
(with props)
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/listchain.mdtext
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext?rev=1236056&view=auto
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext
(added)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext
Thu Jan 26 07:19:47 2012
@@ -0,0 +1,55 @@
+Title: ChainManager
+
+The ChainManager provides name-based access to all active [Enhancement
Chains](enhancementchain.html) and their ServiceReferences. This interface is
typically used by components that need to look up Chains by their name.
However, the ChainsTracker implementation can also be used to track specific
Chains.
+
+### ChainManager interface
+
+This is the Java API providing access to registered Chains as described
above. The interface includes the following members:
+
+ /** Constant for the name of the DefaultChain */
+ + DEFAULT_CHAIN_NAME : String
+ /** Getter for all names with active Chains */
+ + getActiveChainNames() : Set<String>
+ /** Getter for the ServiceReference to the Chain
+ with a given name */
+ + getReference(String name) : ServiceReference
+ /** Getter for all ServiceReferences to Chains
+ with a given name sorted by service ranking */
+ + getReferences(String name) : List<ServiceReference>
+ /** Getter for the Chain with a given name */
+ + getChain(String name) : Chain
+ /** Getter for all Chains with a given name sorted
+ by service ranking */
+ + getChains(String name) : List<Chain>
+ /** Getter for a Chain based on a service reference */
+ + getChain(ServiceReference ref) : Chain
+ /** Checks if there is a chain for the given name */
+ + isChain(String name) : boolean
+ /** Getter for the default chain */
+ + getDefault() : Chain
+
+There are two implementations of this interface available:
+
+#### ChainManager Service
+
+This is an implementation of the ChainManager interface that is registered as
an OSGi service. It can be obtained, for example, with the @Reference annotation:
+
+ @Reference
+ ChainManager chainManager;
+
+This service is provided by the "org.apache.stanbol.enhancer.chainmanager"
module and is included in all Stanbol launchers.
+
+#### ChainsTracker
+
+This is a utility, similar to the standard OSGi ServiceTracker, that allows
tracking some or all Chains. It also supports a ServiceTrackerCustomizer
so that users of the utility can react directly to changes of tracked Chains.
+
+ //track only "myChain" and "otherChain"
+ ChainsTracker tracker = new ChainsTracker(
+ context, "myChain","otherChain");
+ tracker.open(); //start tracking
+
+ //the tracker needs to be closed when it is no longer needed
+ tracker.close();
+ tracker = null;
+
+For most users the ChainManager service is sufficient and preferable. Direct
use of the ChainsTracker is only recommended if one needs to track only some
specific chains, and especially if one needs to be notified of changes to such
chains.
\ No newline at end of file
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext?rev=1236056&view=auto
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
(added)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
Thu Jan 26 07:19:47 2012
@@ -0,0 +1,109 @@
+Title: Enhancement Chains
+
+An Enhancement Chain defines how content passed to the Stanbol Enhancer is
processed. More concretely, it defines which engines are used to process
ContentItems and in what order. Chains are not responsible for the actual
processing of ContentItems. They provide the [ExecutionPlan](executionplan.html)
to the EnhancementJobManager, which does the actual processing of the ContentItem.
+
+In the RESTful API enhancement chains can be accessed by their name under
+
+ http://{host}:{port}/{stanbol-path}/enhancer/chain/{chain-name}
+
+Enhancement requests issued to
+
+ http://{host}:{port}/{stanbol-path}/enhancer
+ http://{host}:{port}/{stanbol-path}/engines
+
+are processed by using the default enhancement chain.
+
+When using the Java API, Chains can be looked up as OSGi services. The
[ChainManager](chainmanager.html) service is designed to ease this by providing
an API that allows accessing Chains by their name. Because Chains do not
perform the actual execution but only provide the
[ExecutionPlan](executionplan.html), one also needs to look up an
EnhancementJobManager instance to enhance a ContentItem:
+
+ @Reference
+ EnhancementJobManager jobManager;
+
+ @Reference
+ ChainManager chainManager;
+
+ //enhance a ContentItem ci (obtained elsewhere)
+ ContentItem ci;
+ //by using the Chain "demo"
+ String chainName = "demo";
+ Chain chain = chainManager.getChain(chainName);
+ if(chain != null){
+ jobManager.enhanceContent(ci,chain);
+ } else {
+ //Chain with name "demo" is not active
+ }
+ //the enhancement results are now available in the metadata
+ MGraph enhancementResults = ci.getMetadata();
+
+To enhance a ContentItem with the default chain, the
"enhanceContent(ContentItem ci)" method can be used.
+
+## Chain Interface
+
+The Chain interface is very simple. It defines only three methods and one constant.
+
+ /** Getter for the name of the Chain */
+ + getName() : String
+ /** Getter for the execution plan */
+ + getExecutionPlan() : Graph
+ /** Getter for the names of the Engines referenced by this Chain */
+ + getEngines() : Set<String>
+ /** Constant for the property used for the name of the Chain */
+ + PROPERTY_NAME : String
+
+Each Chain has a name assigned. This is typically provided by the chain
configuration and MUST be set as the value of the property
"stanbol.enhancer.chain.name" of the service registration. The getter for the
name MUST return the same value. Chain implementations will typically obtain
the name by calling
+
+ this.name = (String)context.getProperties().get(Chain.PROPERTY_NAME);
+
+within the activate method of the Chain, where "context" is the
ComponentContext passed to that method. There is also an AbstractChain
implementation provided by the servicesapi module of the Stanbol Enhancer that
already implements this functionality.
+
+The getEngines method returns the names of all EnhancementEngines referenced by
a Chain. Note that this method returns a Set. It is intended to allow
fast access to the referenced engines and does not provide any information
about the execution order.
+
+Components that need to know the details about a Chain need to process the
[ExecutionPlan](executionplan.html) returned by the getExecutionPlan() method.
The [ExecutionPlan](executionplan.html) is represented as an RDF graph
following the ExecutionPlan Ontology. It formally describes how a ContentItem
must be processed by the EnhancementJobManager. For details see the
documentation for the [ExecutionPlan](executionplan.html).
+
+For Chain implementations it is important that the returned Graph holding the
execution plan MUST be read-only and final. This means that a change in the
configuration of a Chain MUST NOT change a graph already returned by a call to
the getExecutionPlan method.
+
+Because the configuration of a Chain might change at any time,
EnhancementJobManager implementations MUST retrieve the execution plan once and
then use this instance for the whole enhancement process. Together with the
above requirement that the execution plan is stored in a read-only and final
Graph, this ensures that the plan cannot change even during long-lasting
enhancement processes. Therefore a change to the configuration of a chain will
not influence ongoing enhancement processes.
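+
+The snapshot behaviour described above can be illustrated with a minimal,
self-contained sketch. It does not use the Stanbol API: the class name
ImmutablePlanSketch is hypothetical and the execution plan is modelled as a
plain Set of node names instead of an RDF Graph. It only shows the idea that a
reconfiguration replaces the plan as a whole while previously returned
instances stay unchanged and read-only.
+

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical chain; the plan is modelled as a set of node names (assumption). */
final class ImmutablePlanSketch {

    // The current plan; it is replaced as a whole and never modified in place.
    private volatile Set<String> plan = Collections.emptySet();

    /** A reconfiguration builds a new read-only plan; old snapshots stay untouched. */
    void configure(Set<String> nodes) {
        plan = Collections.unmodifiableSet(new LinkedHashSet<>(nodes));
    }

    /** Callers retrieve the plan once and keep using the returned instance. */
    Set<String> getExecutionPlan() {
        return plan;
    }

    public static void main(String[] args) {
        ImmutablePlanSketch chain = new ImmutablePlanSketch();
        chain.configure(new LinkedHashSet<>(Arrays.asList("langId", "ner")));
        Set<String> snapshot = chain.getExecutionPlan();   // retrieved once by a job
        chain.configure(Collections.singleton("zemanta")); // configuration changes
        System.out.println("ongoing job still sees: " + snapshot);
        System.out.println("new requests see: " + chain.getExecutionPlan());
    }
}
```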
+
+## Enhancement Chain Management
+
+This section describes how Enhancement Chains are managed by the Stanbol
Enhancer and how they can be selected and accessed. It also describes how the
"default" Chain is determined.
+
+For every Stanbol Enhancer at least one Chain MUST be present. If this is not
the case, enhancement requests MUST throw a ChainException with a corresponding
error message. Typically, however, multiple Enhancement Chains will be configured.
+
+### Chain Name Conflicts
+
+Chains are identified by the value of the "stanbol.enhancer.chain.name"
property - the name of the chain. If more than one Chain uses the same name,
then the normal OSGi procedure for selecting the default service is used. This
means that
+
+1. the Chain with the highest "service.ranking" and,
+2. among Chains with the same ranking, the Chain with the lowest "service.id"
+
+will be selected on requests for a given Chain name. Via the RESTful service
API there is no possibility to call the other chains registered for that name.
However, the ChainManager interface allows access to all registered services
for a given name.
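+
+The selection rule can be sketched as a comparator. The following is a
self-contained illustration only: ChainRegistration and ChainNameResolver are
hypothetical names that stand in for OSGi service registrations and are not
part of Stanbol. It assumes nothing beyond the two rules listed above (highest
"service.ranking" first, then lowest "service.id").
+

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for the service registration of a Chain. */
final class ChainRegistration {
    final String name;    // value of "stanbol.enhancer.chain.name"
    final int ranking;    // value of "service.ranking"
    final long serviceId; // value of "service.id"

    ChainRegistration(String name, int ranking, long serviceId) {
        this.name = name;
        this.ranking = ranking;
        this.serviceId = serviceId;
    }
}

final class ChainNameResolver {

    // Highest "service.ranking" first; ties are broken by the lowest "service.id".
    static final Comparator<ChainRegistration> SERVICE_ORDER =
            Comparator.comparingInt((ChainRegistration r) -> r.ranking)
                      .reversed()
                      .thenComparingLong(r -> r.serviceId);

    /** Returns the registration selected for the name, or null if none matches. */
    static ChainRegistration select(List<ChainRegistration> registrations, String name) {
        ChainRegistration best = null;
        for (ChainRegistration r : registrations) {
            if (r.name.equals(name)
                    && (best == null || SERVICE_ORDER.compare(r, best) < 0)) {
                best = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<ChainRegistration> regs = Arrays.asList(
                new ChainRegistration("demo", 0, 12L),
                new ChainRegistration("demo", 100, 40L),
                new ChainRegistration("demo", 100, 25L));
        System.out.println("selected service.id: " + select(regs, "demo").serviceId);
    }
}
```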
+
+### Default Chain
+
+The second important concept of Chain management is the definition of the
"default chain". The default Chain is used for enhancement requests that do not
specify a Chain. This is true for requests to the "/engines" and "/enhancer"
RESTful services as well as API calls to the
"EnhancementJobManager.enhanceContent(ContentItem ci)" method.
+
+The default Chain is determined by the following rules:
+
+1. the Chain with the name "default". If more than one Chain is present with
that name, then the above rules for resolving name conflicts apply. If there is
no Chain with that name,
+2. the Chain with the highest "service.ranking". If several Chains have the
same ranking,
+3. the Chain with the lowest "service.id"
+
+If no chain is active, a ChainException with a corresponding message MUST be
thrown.
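+
+As a rough, self-contained sketch of these rules (not the Stanbol
implementation): the class DefaultChainSketch and its nested Reg type are
hypothetical, and an IllegalStateException is used as a stand-in for the
ChainException mentioned above.
+

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class DefaultChainSketch {

    /** Hypothetical stand-in for a registered Chain service. */
    static final class Reg {
        final String name;
        final int ranking;    // "service.ranking"
        final long serviceId; // "service.id"

        Reg(String name, int ranking, long serviceId) {
            this.name = name;
            this.ranking = ranking;
            this.serviceId = serviceId;
        }
    }

    // Highest ranking first, then lowest service id.
    static final Comparator<Reg> SERVICE_ORDER =
            Comparator.comparingInt((Reg r) -> r.ranking)
                      .reversed()
                      .thenComparingLong(r -> r.serviceId);

    /** Applies rules 1 to 3; throws if no chain is active (stand-in for ChainException). */
    static Reg selectDefault(List<Reg> active) {
        if (active.isEmpty()) {
            throw new IllegalStateException("no active Chain available");
        }
        List<Reg> named = new ArrayList<>();
        for (Reg r : active) {
            if ("default".equals(r.name)) {
                named.add(r); // rule 1: chains named "default" take precedence
            }
        }
        // rules 2 and 3 use the same ordering as the name conflict resolution
        return Collections.min(named.isEmpty() ? active : named, SERVICE_ORDER);
    }

    public static void main(String[] args) {
        List<Reg> active = Arrays.asList(
                new Reg("default", Integer.MIN_VALUE, 7L),
                new Reg("demo", 100, 9L));
        System.out.println("default chain: " + selectDefault(active).name);
    }
}
```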
+
+All Stanbol launchers are configured with the [Default
Chain](defaultchain.html) enabled. It registers itself with the name
"default" and the lowest possible service ranking - Integer.MIN_VALUE. This
default provides a Chain that considers all currently active
EnhancementEngines and sorts them based on their ordering information (see the
[Calculation of the Execution Plan based on the EnhancementEngine
Ordering](weightedchain.html#calculation_of_the_executionplan) for details).
+
+### [ChainManager interface](chainmanager.html)
+
+This is the management interface for EnhancementChains that can be used by
components to look up chains by their name. It also provides a getter for
the default chain. There is also an OSGi ServiceTracker-like implementation that
can be used to track only chains with specific names and to be notified
of any change to such chains.
+
+## Chain implementations
+
+The following Chain implementations are included within the default Stanbol
Enhancer distribution:
+
+* __[DefaultChain](defaultchain.html)__: Implementation that includes all
currently active EnhancementEngines. If enabled it registers itself under the
name "default" with the service ranking Integer.MIN_VALUE. This makes it the
default chain as long as users do not deactivate it or register another
chain with the name "default".
+* __[ListChain](listchain.html)__: Implementation that creates the
ExecutionPlan by chaining the EnhancementEngines in the exact order
specified by the configured list. This Chain does not support parallel
execution of engines.
+* __[WeightedChain](weightedchain.html)__: This Chain implementation takes a
list of engine names as input and uses the
"org.apache.stanbol.enhancer.engine.order" metadata provided by those engines
to calculate the ExecutionGraph.
+* __[GraphChain](graphchain.html)__: This Chain implementation is based on an
ExecutionGraph passed as configuration.
+* __SingleEngineChain__: An adapter that allows executing a single
EnhancementEngine within a Chain. Chains of this type are not registered
as OSGi services. Instances are created on request for single
EnhancementEngines and passed directly to the EnhancementJobManager
implementation.
+
+
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancer-listchain-config.png
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancer-listchain-config.png?rev=1236056&view=auto
==============================================================================
Binary file - no diff available.
Propchange:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancer-listchain-config.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext?rev=1236056&view=auto
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext
(added)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext
Thu Jan 26 07:19:47 2012
@@ -0,0 +1,96 @@
+Title: ExecutionPlan
+
+The ExecutionPlan is represented as an RDF graph following the ExecutionPlan
Ontology. It needs to be provided by the [Enhancement
Chain](enhancementchain.html) and is used by the EnhancementJobManager to
enhance ContentItems.
+
+## ExecutionPlan Ontology
+
+The RDFS schema used for the execution plan is defined as follows.
+
+ * Namespace: ep : http://stanbol.apache.org/ontology/enhancer/executionplan#
+ * __ep:ExecutionPlan__ : Represents an execution plan defined by all linked
execution nodes.
+ * __ep:hasExecutionNode__ (domain: ep:ExecutionPlan; range:
ep:ExecutionNode; inverseOf: ep:inExecutionPlan): links the execution plan with
all the execution nodes.
+ * __ep:chain__ (domain: ep:ExecutionPlan; range: xsd:string): The name of
the Chain this execution plan is used for.
+ * __ep:ExecutionNode__ : Class used for all Nodes representing the execution
of an Enhancement Engine.
+ * __ep:inExecutionPlan__ (domain: ep:ExecutionNode; range:
ep:ExecutionPlan; inverseOf: ep:hasExecutionNode): functional property that
links the execution node with an execution plan.
+ * __ep:engine__ (domain: ep:ExecutionNode; range: xsd:string): The
property used to link to the Enhancement Engine by the name of the engine.
+ * __ep:dependsOn__ (domain: ep:ExecutionNode; range: ep:ExecutionNode):
Defines that the execution of this node depends on the completion of the
referenced one.
+ * __ep:optional__ (domain: ep:ExecutionNode; range: xsd:boolean): Can be
used to specify that the execution of this EnhancementEngine is optional. If
this property is set to TRUE, an engine will be marked as executed even if its
execution was not possible (e.g. because an engine with this name was not
active) or the execution failed (e.g. because of an Exception).
+
+Note that the data for the ep:ExecutionPlan and the
ep:hasExecutionNode/ep:inExecutionPlan properties typically does not need to be
passed as part of the configuration of a Chain. It is assumed that all
ep:ExecutionNode instances in the configuration of a chain are members of the
execution plan of that chain. Therefore this information is typically added by
the Chain itself when the configuration is parsed and validated.
+
+#### Example:
+
+This example shows an ExecutionPlan with five nodes for the "langId", "ner",
"dbpediaLinking", "geonamesLinking" and "zemanta" engines. Note that these names
refer to actual EnhancementEngine services registered with the current OSGi
environment.
+
+This example assumes that
+
+* "langId" is the singleton instance of LangIdEnhancementEngine
+* "ner" is the default instance of the NamedEntityExtractionEnhancementEngine
engine
+* "dbpediaLinking" is an instance of the NamedEntityTaggingEngine configured
to use the dbpedia.org ReferencedSite of the Entityhub
+* "geonamesLinking" is an instance of the NamedEntityTaggingEngine configured
to use the geonames.org ReferencedSite
+* "zemanta" is the singleton instance of the ZemantaEnhancementEngine
+
+The RDF graph of such a chain would look like this:
+
+ urn:execPlan
+ rdf:type ep:ExecutionPlan
+ ep:hasExecutionNode urn:node1, urn:node2, urn:node3, urn:node4,
urn:node5
+ ep:chain "demoChain"
+
+ urn:node1
+ rdf:type ep:ExecutionNode
+ ep:inExecutionPlan urn:execPlan
+ ep:engine langId
+
+ urn:node2
+ rdf:type ep:ExecutionNode
+ ep:inExecutionPlan urn:execPlan
+ ep:dependsOn urn:node1
+ ep:engine ner
+
+ urn:node3
+ rdf:type ep:ExecutionNode
+ ep:inExecutionPlan urn:execPlan
+ ep:dependsOn urn:node2
+ ep:engine dbpediaLinking
+
+ urn:node4
+ rdf:type ep:ExecutionNode
+ ep:inExecutionPlan urn:execPlan
+ ep:dependsOn urn:node2
+ ep:engine geonamesLinking
+
+ urn:node5
+ rdf:type ep:ExecutionNode
+ ep:inExecutionPlan urn:execPlan
+ ep:engine zemanta
+ ep:optional "true"^^xsd:boolean
+
+This plan defines that the "langId" and the "zemanta" engines do not depend on
anything and can therefore be executed from the start (even in parallel if the
JobManager executing this chain supports this). The execution of the "ner"
engine depends on the extraction of the language, and the execution of the
entity linking to dbpedia and geonames depends on the "ner" engine. Note that
"dbpediaLinking" and "geonamesLinking" could also be executed in parallel.
+
+
+#### ExecutionPlan Utility:
+
+The Enhancer MUST also define a utility that provides the following method:
+
+ /** Getter for the list of executable ep:ExecutionNodes */
+ + getExecuteable(Graph executionPlan, Set<NonLiteral> completed) :
Collection<NonLiteral>
+
+This method takes an execution plan and the set of already executed nodes as
input and returns the ExecutionNodes that can be executed next. The
existing utility methods within the EnhancementEngineHelper can be used to
retrieve further information from the ep:ExecutionNode instances returned by
this method.
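+
+The idea behind such a method can be sketched without RDF. In the following
self-contained illustration the class ExecutablePlanSketch is hypothetical and
the plan is modelled as a Map from an engine name to the names it depends on
(an assumption, not the Stanbol data model). A node is executable if it has not
been completed yet and all nodes it depends on are completed. The sample data
follows the dependencies described in the example text: "ner" depends on
"langId", and both linking engines depend on "ner".
+

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ExecutablePlanSketch {

    /** Returns all nodes that are not completed and whose dependencies are all completed. */
    static Set<String> getExecutable(Map<String, Set<String>> dependsOn,
                                     Set<String> completed) {
        Set<String> executable = new TreeSet<>();
        for (Map.Entry<String, Set<String>> node : dependsOn.entrySet()) {
            if (!completed.contains(node.getKey())
                    && completed.containsAll(node.getValue())) {
                executable.add(node.getKey());
            }
        }
        return executable;
    }

    /** Builds the sample plan: engine name mapped to the engines it depends on. */
    static Map<String, Set<String>> examplePlan() {
        Map<String, Set<String>> plan = new LinkedHashMap<>();
        plan.put("langId", Collections.<String>emptySet());
        plan.put("ner", Collections.singleton("langId"));
        plan.put("dbpediaLinking", Collections.singleton("ner"));
        plan.put("geonamesLinking", Collections.singleton("ner"));
        plan.put("zemanta", Collections.<String>emptySet());
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> plan = examplePlan();
        Set<String> completed = new HashSet<>();
        Set<String> next;
        // repeat until no further node can be executed
        while (!(next = getExecutable(plan, completed)).isEmpty()) {
            System.out.println("executable now: " + next);
            completed.addAll(next); // a real job manager would execute the engines here
        }
    }
}
```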
+
+Typically, code using this utility will look like this (pseudo code):
+
+ Graph executionPlan = chain.getExecutionPlan();
+ Map<String, EnhancementEngine> engines =
enhancementEngineManager.getActiveEngines(chain);
+ Set<NonLiteral> executed = new HashSet<NonLiteral>();
+ Collection<NonLiteral> next;
+ while(!(next = ExecutionPlanUtils.getExecuteable(executionPlan,
executed)).isEmpty()){
+ for(NonLiteral node : next){
+ EnhancementEngine engine = engines.get(
+ EnhancementEngineHelper.getString(executionPlan,node,
EX_ENGINE));
+ Boolean optional = EnhancementEngineHelper.get(
+ executionPlan,node,EX_OPTIONAL,Boolean.class,literalFactory);
+ /* Execute the Engine */
+ executed.add(node); //mark the node as completed
+ }
+ }
Added:
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/listchain.mdtext
URL:
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/listchain.mdtext?rev=1236056&view=auto
==============================================================================
---
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/listchain.mdtext
(added)
+++
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/listchain.mdtext
Thu Jan 26 07:19:47 2012
@@ -0,0 +1,33 @@
+Title: ListChain
+
+The ListChain creates the ExecutionPlan based on the exact order of the
configured EnhancementEngines. This gives users a simple way to
configure the exact order in which the referenced EnhancementEngines are called
during the enhancement process of a content item. However, the ListChain cannot
support parallel execution of engines, which is a considerable disadvantage
compared to the [GraphChain](graphchain.html).
+
+A typical usage scenario would be that users start off by configuring a
ListChain and later optimize the execution by migrating working
configurations to [GraphChain](graphchain.html)s.
+
+### Configuration
+
+The property "stanbol.enhancer.chain.list.enginelist" is used to provide the
list of engine names. This configuration MUST be passed as an Array of Strings
because the ordering of the configured entries is central to the configuration.
+
+In addition, it is possible to define Engines as optional. This specifies
that the enhancement process should not fail if an Engine is not active
or fails while processing a content item.
+
+The syntax to define an Engine as optional is as follows:
+
+ <name>;optional
+ <name>;optional=true
+
+Both variants make the execution of the engine with the name <name>
optional.
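+
+Parsing such entries could look roughly like the following self-contained
sketch. The class EngineEntrySketch is hypothetical and not the parser used by
the ListChain. It assumes that any value other than "optional" or
"optional=true" leaves the engine required and that an entry without a name is
an error.
+

```java
/** Hypothetical holder for one parsed entry of the engine list. */
final class EngineEntrySketch {
    final String name;
    final boolean optional;

    private EngineEntrySketch(String name, boolean optional) {
        this.name = name;
        this.optional = optional;
    }

    /** Parses "<name>", "<name>;optional" or "<name>;optional=true". */
    static EngineEntrySketch parse(String entry) {
        String[] parts = entry.split(";");
        String name = parts[0].trim();
        if (name.isEmpty()) {
            throw new IllegalArgumentException("missing engine name in '" + entry + "'");
        }
        boolean optional = false;
        for (int i = 1; i < parts.length; i++) {
            String parameter = parts[i].trim();
            // assumption: only these two spellings mark an engine as optional
            if (parameter.equals("optional") || parameter.equalsIgnoreCase("optional=true")) {
                optional = true;
            }
        }
        return new EngineEntrySketch(name, optional);
    }

    public static void main(String[] args) {
        for (String entry : new String[] {"metaxa;optional", "langid", "ner;optional=true"}) {
            EngineEntrySketch parsed = parse(entry);
            System.out.println(parsed.name + " optional=" + parsed.optional);
        }
    }
}
```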
+
+The following figure shows the configuration dialog for ListChains as provided
by the Apache Felix Web Console.
+
+
+
+It is also possible to configure a ListChain by directly installing a
configuration with the name "{classname}-{configName}.config". Note that the
{configName} does not need to be the same as the name of the chain. The
{configName} is just used by the OSGi environment to distinguish different
configurations for {classname}.
+
+To create the same configuration as in the above screenshot the file would
need to look like this:
+
+ stanbol.enhancer.chain.name="list"
+
stanbol.enhancer.chain.list.enginelist=["metaxa;optional","langid","ner","dbpediaLinking"]
+
+### Calculation of the ExecutionPlan
+
+The ExecutionPlan is created based on the exact order of the
EnhancementEngines provided by the "stanbol.enhancer.chain.list.enginelist"
property. The configuration MUST contain at least one engine. In addition,
an engine MUST NOT be listed more than once.
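+
+A minimal, self-contained sketch of such a calculation follows. It is not the
ListChain implementation: the class ListChainPlanSketch is hypothetical, the
plan is modelled as a Map from an engine name to the names it depends on, and
it assumes that strict ordering is expressed by letting every engine depend on
its direct predecessor in the list.
+

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ListChainPlanSketch {

    /** Builds a strictly sequential plan: each engine depends on its predecessor. */
    static Map<String, Set<String>> build(List<String> engineNames) {
        if (engineNames.isEmpty()) {
            throw new IllegalArgumentException("at least one engine is required");
        }
        Map<String, Set<String>> dependsOn = new LinkedHashMap<>();
        String previous = null;
        for (String name : engineNames) {
            if (dependsOn.containsKey(name)) {
                throw new IllegalArgumentException("engine listed more than once: " + name);
            }
            dependsOn.put(name, previous == null
                    ? Collections.<String>emptySet()
                    : Collections.singleton(previous));
            previous = name;
        }
        return dependsOn;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> plan =
                build(Arrays.asList("metaxa", "langid", "ner", "dbpediaLinking"));
        for (Map.Entry<String, Set<String>> node : plan.entrySet()) {
            System.out.println(node.getKey() + " dependsOn " + node.getValue());
        }
    }
}
```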