chains: chainmanager.mdtext enhancementchain.mdtext enhancer-listchain-config.png executionplan.mdtext listchain.mdtext

rwesten Wed, 25 Jan 2012 23:20:24 -0800

Author: rwesten
Date: Thu Jan 26 07:19:47 2012
New Revision: 1236056

URL: http://svn.apache.org/viewvc?rev=1236056&view=rev
Log:
Added Documentation for EnhancementChains (root), ChainManager, ListChain and 
ExecutionPlan


Added:
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancer-listchain-config.png
   (with props)
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/listchain.mdtext

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext?rev=1236056&view=auto
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext
 (added)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/chainmanager.mdtext
 Thu Jan 26 07:19:47 2012
@@ -0,0 +1,55 @@
+Title: ChainManager
+
+The ChainManager provides name-based access to all active [Enhancement 
Chains](enhancementchain.html) and their ServiceReferences. This interface is 
typically used by components that need to look up Chains by their name. 
However, the ChainsTracker implementation can also be used to track specific 
Chains.
+
+### ChainManager interface
+
+This is the Java API providing access to registered Chains as 
described above. The interface includes the following members:
+
+    /** Constant for the name of the DefaultChain */
+    + DEFAULT_CHAIN_NAME : String
+    /** Getter for all names with active Chains */
+    + getActiveChainNames() : Set<String>
+    /** Getter for the ServiceReference to the Chain 
+        with a given name */
+    + getReference(String name) : ServiceReference
+    /** Getter for all ServiceReferences to Chains 
+        with a given name sorted by service ranking */
+    + getReferences(String name) : List<ServiceReference>
+    /** Getter for the Chain with a given name */
+    + getChain(String name) : Chain
+    /** Getter for all Chains with a given name sorted 
+        by service ranking */
+    + getChains(String name) : List<Chain>
+    /** Getter for a Chain based on a service reference */
+    + getChain(ServiceReference ref) : Chain
+    /** Checks if there is a chain for the given name */
+    + isChain(String name) : boolean
+    /** Getter for the default chain */
+    + getDefault() : Chain
+
+There are two implementations of this interface available:
+
+#### ChainManager Service
+
+This is an implementation of the ChainManager interface that is registered as 
an OSGi service. It can be obtained, for example, by using the @Reference annotation:
+
+    @Reference
+    ChainManager chainManager;
+
+This service is provided by the "org.apache.stanbol.enhancer.chainmanager" 
module and is included in all Stanbol launchers.
+
+#### ChainsTracker
+
+This is a utility similar to the standard OSGi ServiceTracker that allows 
tracking some or all Chains. It also supports the use of a ServiceTrackerCustomizer 
so that users of the utility can react directly to changes of tracked Chains.
+
+    //track only "myChain" and "otherChain"
+    ChainsTracker tracker = new ChainsTracker(
+        context, "myChain","otherChain");
+    tracker.open(); //start tracking
+ 
+    //the tracker needs to be closed when it is no longer needed
+    tracker.close();
+    tracker = null;
+
+For most users the ChainManager service is sufficient and preferable. Direct 
use of the ChainsTracker is only recommended if one needs to track only some 
specific chains, and especially if one needs to be notified of changes to such 
chains.
\ No newline at end of file

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext?rev=1236056&view=auto
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
 (added)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancementchain.mdtext
 Thu Jan 26 07:19:47 2012
@@ -0,0 +1,109 @@
+Title: Enhancement Chains
+
+An Enhancement Chain defines how content passed to the Stanbol Enhancer is 
processed. More concretely, it defines which engines are used to process 
ContentItems and in what order. Chains are not responsible for the actual 
processing of ContentItems. They provide the [ExecutionPlan](executionplan.html) 
to the EnhancementJobManager, which does the actual processing of the ContentItem.
+
+In the RESTful API, enhancement chains can be accessed by their name under
+
+    http://{host}:{port}/{stanbol-path}/enhancer/chain/{chain-name}
+
+Enhancement requests issued to 
+
+    http://{host}:{port}/{stanbol-path}/enhancer
+    http://{host}:{port}/{stanbol-path}/engines
+
+are processed by using the default enhancement chain.
+
+When using the Java API, Chains can be looked up as OSGi services. The 
[ChainManager](chainmanager.html) service is designed to ease this by providing 
an API that allows access to Chains by their name. Because Chains do not 
perform the actual execution but only provide the 
[ExecutionPlan](executionplan.html), one also needs to look up an 
EnhancementJobManager instance to enhance a ContentItem:
+
+    @Reference
+    EnhancementJobManager jobManager;
+
+    @Reference
+    ChainManager chainManager;
+
+    //enhance a ContentItem ci (assumed to be created elsewhere)
+    ContentItem ci;
+    //by using the Chain "demo"
+    String chainName = "demo";
+    Chain chain = chainManager.getChain(chainName);
+    if(chain != null){
+        jobManager.enhanceContent(ci,chain);
+    } else {
+        //Chain with name "demo" is not active
+    }
+    //the enhancement results are now available in the metadata
+    MGraph enhancementResults = ci.getMetadata();
+
+To enhance a ContentItem with the default chain, the 
"enhanceContent(ContentItem ci)" method can be used.
+
+## Chain Interface
+
+The Chain interface is very simple. It defines only three methods and one constant.
+
+    /** Getter for the name of the Chain */
+    + getName() : String
+    /** Getter for the execution plan */
+    + getExecutionPlan() : Graph
+    /** Getter for the name of the Engines referenced by this Chain */
+    + getEngines() : Set<String>
+    /** Constant for the property used for the name of the Chain */
+    + PROPERTY_NAME : String
+
+Each Chain has a name assigned. This is typically provided by the chain 
configuration and MUST be set as the value of the property 
"stanbol.enhancer.chain.name" of the service registration. The getter for the 
name MUST return the same value. Chain implementations will typically get the 
name by calling
+
+    this.name = (String)componentContext.getProperties().get(Chain.PROPERTY_NAME);
+
+within the activate method of the Chain. There is also an AbstractChain 
implementation provided by the servicesapi module of the Stanbol Enhancer that 
already implements this functionality.
+
+The getEngines method returns the names of all EnhancementEngines referenced by 
a Chain. Note that this method returns a Set. It is intended to allow 
fast access to the referenced engines and does not provide any information 
about the execution order.
+
+Components that need to know the details about a Chain need to process the 
[ExecutionPlan](executionplan.html) returned by the getExecutionPlan() method. 
The [ExecutionPlan](executionplan.html) is represented as an RDF graph 
following the ExecutionPlan Ontology. It formally describes how a ContentItem 
must be processed by the EnhancementJobManager. For details see the 
documentation for the [ExecutionPlan](executionplan.html).
+
+For Chain implementations it is important that the returned Graph holding the 
execution plan MUST be read-only AND final. This means that a change in the 
configuration of a Chain MUST NOT change the graph returned by earlier calls to the 
getExecutionPlan method.
+
+Because the configuration of a Chain might change at any time, 
EnhancementJobManager implementations MUST retrieve the execution plan once and 
then use this instance for the whole enhancement process. Together with the above 
requirement that the execution plan is stored in a read-only and final Graph, 
this ensures that the plan cannot change even during long-lasting enhancement 
processes. Therefore any change to the configuration of a chain will not 
influence ongoing enhancement processes.
+
+## Enhancement Chain Management
+
+This section describes how Enhancement Chains are managed by the Stanbol 
Enhancer and how they can be selected/accessed. It also describes how the 
"default" Chain is determined.
+
+For every Stanbol Enhancer at least one Chain MUST be present. If this is not the 
case, enhancement requests MUST throw a ChainException with an appropriate error 
message. Typically, however, multiple EnhancementChains will be configured. 
+
+### Chain Name Conflicts
+
+Chains are identified by the value of the "stanbol.enhancer.chain.name" 
property - the name of the chain. If more than one Chain uses the same name, 
then the normal OSGi procedure to select the default service is used. This 
means that
+
+1. the Chain with the highest "service.ranking" and,
+2. if several Chains share that ranking, the Chain with the lowest "service.id"
+
+will be selected on requests for a given Chain name. Via the RESTful service 
API there is no possibility to call the other chains for a given name. However, 
the ChainManager interface allows access to all registered services for a given 
name.
+
+### Default Chain
+
+The second important concept of the Chain management is the definition of the 
"default chain". The default Chain is used for enhancement requests that do not 
specify a Chain. This is true for requests to the "/engines" and "/enhancer" 
RESTful services as well as API calls to the 
"EnhancementJobManager.enhanceContent(ContentItem ci)" method.
+
+The default Chain is determined by the following rules:
+
+1. the Chain with the name "default". If more than one Chain is present with 
that name, then the above rules for resolving name conflicts apply. If there is none,
+2. the Chain with the highest "service.ranking". If several have the same 
ranking,
+3. the Chain with the lowest "service.id"
+
+If no chain is active, a ChainException with an appropriate message MUST be 
thrown.
+
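+The name conflict and default chain rules above can be sketched in plain Java. 
This is only an illustration under stated assumptions: ChainRef and the 
resolve/resolveDefault methods are hypothetical stand-ins and not part of the 
Stanbol or OSGi API. A real implementation would work on ServiceReferences.
+

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class DefaultChainSelection {

    /** Hypothetical stand-in for a registered Chain service; not a Stanbol type. */
    static final class ChainRef {
        final String name;    // value of "stanbol.enhancer.chain.name"
        final int ranking;    // value of "service.ranking"
        final long serviceId; // value of "service.id"

        ChainRef(String name, int ranking, long serviceId) {
            this.name = name;
            this.ranking = ranking;
            this.serviceId = serviceId;
        }
    }

    /** Highest "service.ranking" first; ties are broken by the lowest "service.id". */
    static final Comparator<ChainRef> OSGI_ORDER =
        Comparator.comparingInt((ChainRef c) -> c.ranking).reversed()
                  .thenComparingLong(c -> c.serviceId);

    /** Resolves the chain that answers requests for the given name. */
    static Optional<ChainRef> resolve(List<ChainRef> active, String name) {
        return active.stream().filter(c -> c.name.equals(name)).min(OSGI_ORDER);
    }

    /** Default chain: the name "default" wins, otherwise ranking, then id. */
    static Optional<ChainRef> resolveDefault(List<ChainRef> active) {
        Optional<ChainRef> named = resolve(active, "default");
        return named.isPresent() ? named : active.stream().min(OSGI_ORDER);
    }

    public static void main(String[] args) {
        List<ChainRef> active = Arrays.asList(
            new ChainRef("default", Integer.MIN_VALUE, 12L),
            new ChainRef("demo", 0, 15L),
            new ChainRef("demo", 0, 14L));
        System.out.println(resolve(active, "demo").get().serviceId); // prints 14
        System.out.println(resolveDefault(active).get().name);       // prints default
    }
}
```

+An empty Optional corresponds to the case in which no chain is active, where 
the rules above require a ChainException.
+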
+All Stanbol launchers are configured with the [Default 
Chain](defaultchain.html) enabled. This registers itself with the name 
"default" and the lowest possible service ranking - Integer.MIN_VALUE. This 
default provides a Chain that considers all currently active 
EnhancementEngines and sorts them based on their ordering information (see the 
[Calculation of the Execution Plan based on the EnhancementEngine 
Ordering](weightedchain.html#calculation_of_the_executionplan) for details).
+
+### [ChainManager interface](chainmanager.html)
+
+This is the management interface for EnhancementChains that can be used by 
components to look up chains by their name. It also provides a getter for 
the default chain. There is also an OSGi ServiceTracker-like implementation that 
can be used to track only chains with specific names and to be notified 
of any change to such chains.
+
+## Chain implementations
+
+The following Chain implementations are included within the default Stanbol 
Enhancer distribution:
+
+* __[DefaultChain](defaultchain.html)__: Implementation that includes all 
currently active EnhancementEngines. If enabled, it registers itself under the 
name "default" with the service ranking Integer.MIN_VALUE. This makes it 
the default chain as long as users do not deactivate it or register 
another chain with the name "default".
+* __[ListChain](listchain.html)__: Implementation that creates the 
ExecutionPlan by chaining the EnhancementEngines in the exact order 
specified by the configured list. This Chain does not support parallel execution of 
engines.
+* __[WeightedChain](weightedchain.html)__: This Chain implementation takes a 
list of engine names as input and uses the 
"org.apache.stanbol.enhancer.engine.order" metadata provided by such engines 
to calculate the ExecutionGraph.
+* __[GraphChain](graphchain.html)__: This Chain implementation is based on an 
ExecutionGraph passed as configuration.
+* __SingleEngineChain__: An adapter that allows a single 
EnhancementEngine to be executed within a Chain. Chains of this type will not be registered 
as OSGi services. Instances will be created on request for single 
EnhancementEngines and passed directly to the EnhancementJobManager 
implementation. 
+
+

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancer-listchain-config.png
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancer-listchain-config.png?rev=1236056&view=auto
==============================================================================
Binary file - no diff available.

Propchange: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancer-listchain-config.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext?rev=1236056&view=auto
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext
 (added)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/executionplan.mdtext
 Thu Jan 26 07:19:47 2012
@@ -0,0 +1,96 @@
+Title: ExecutionPlan
+
+The ExecutionPlan is represented as an RDF graph following the ExecutionPlan 
Ontology. It needs to be provided by the [Enhancement 
Chain](enhancementchain.html) and is used by the EnhancementJobManager to 
enhance ContentItems.
+
+## ExecutionPlan Ontology
+
+The RDFS schema used for the execution plan is defined as follows.
+
+ * Namespace: ep : http://stanbol.apache.org/ontology/enhancer/executionplan#
+ * __ep:ExecutionPlan__ : Represents an execution plan defined by all linked 
execution nodes.
+     * __ep:hasExecutionNode__ (domain: ep:ExecutionPlan; range: 
ep:ExecutionNode; inverseOf: ep:inExecutionPlan): links the execution plan with 
all the execution nodes.
+     * __ep:chain__ (domain: ep:ExecutionPlan; range: xsd:string): The name of 
the Chain this execution plan is used for.
+ * __ep:ExecutionNode__ : Class used for all Nodes representing the execution 
of an Enhancement Engine.
+     * __ep:inExecutionPlan__ (domain: ep:ExecutionNode; range: 
ep:ExecutionPlan; inverseOf: ep:hasExecutionNode): functional property that 
links the execution node with an execution plan.
+     * __ep:engine__ (domain: ep:ExecutionNode; range: xsd:string): The 
property used to link to the Enhancement Engine by the name of the engine.
+     * __ep:dependsOn__ (domain: ep:ExecutionNode; range: ep:ExecutionNode): 
Defines that the execution of this node depends on the completion of the 
referenced one.
+     * __ep:optional__ (domain: ep:ExecutionNode; range: xsd:boolean): Can be 
used to specify that the execution of this EnhancementEngine is optional. If 
this property is set to TRUE, an engine will be marked as executed even if its 
execution was not possible (e.g. because an engine with this name was not 
active) or the execution failed (e.g. because of an Exception). 
+
+Note that the data for the ep:ExecutionPlan and the 
ep:hasExecutionNode/ep:inExecutionPlan properties typically do not need to be 
passed as configuration of a Chain. This information is typically added automatically, 
based on the assumption that all ep:ExecutionNodes passed in the configuration 
for a chain are members of the execution plan for that chain. It is therefore 
added by the Chain itself when the configuration is 
parsed and validated.
+
+#### Example:
+
+This example shows an ExecutionPlan with five nodes for the "langId", "ner", 
"dbpediaLinking", "geonamesLinking" and "zemanta" engines. Note that these names 
refer to actual EnhancementEngine services registered with the current OSGi 
environment.
+
+This example assumes that
+
+* "langId" is the singleton instance of LangIdEnhancementEngine
+* "ner" is the default instance of the NamedEntityExtractionEnhancementEngine 
engine
+* "dbpediaLinking" is an instance of the NamedEntityTaggingEngine configured 
to use the dbpedia.org ReferencedSite of the Entityhub
+* "geonamesLinking" is an instance of the NamedEntityTaggingEngine configured 
to use the geonames.org ReferencedSite
+* "zemanta" is the singleton instance of the ZemantaEnhancementEngine
+
+The RDF graph of such a chain would look as follows:
+
+    urn:execPlan
+        rdf:type ep:ExecutionPlan
+        ep:hasExecutionNode urn:node1, urn:node2, urn:node3, urn:node4, 
urn:node5
+        ep:chain "demoChain"
+
+    urn:node1
+        rdf:type ep:ExecutionNode
+        ep:inExecutionPlan urn:execPlan
+        ep:engine langId
+
+    urn:node2
+        rdf:type ep:ExecutionNode
+        ep:inExecutionPlan urn:execPlan
+        ep:dependsOn urn:node1
+        ep:engine ner
+
+    urn:node3
+        rdf:type ep:ExecutionNode
+        ep:inExecutionPlan urn:execPlan
+        ep:dependsOn urn:node2
+        ep:engine dbpediaLinking
+
+    urn:node4
+        rdf:type ep:ExecutionNode
+        ep:inExecutionPlan urn:execPlan
+        ep:dependsOn urn:node2
+        ep:engine geonamesLinking
+
+    urn:node5
+        rdf:type ep:ExecutionNode
+        ep:inExecutionPlan urn:execPlan
+        ep:engine zemanta
+        ep:optional "true"^^xsd:boolean
+
+This plan defines that the "langId" and the "zemanta" engines do not depend on 
anything and can therefore be executed from the start (even in parallel if the 
JobManager executing this chain supports this). The execution of the "ner" 
engine depends on the extraction of the language, and the execution of the 
entity linking to dbpedia and geonames depends on the "ner" engine. Note that 
the "dbpediaLinking" and "geonamesLinking" engines could also be 
executed in parallel.
+
+
+#### ExecutionPlan Utility:
+
+The Enhancer MUST also define a utility that provides the following method:
+    
+    /** Getter for the list of executable ep:ExecutionNodes */
+    + getExecuteable(Graph executionPlan, Set<NonLiteral> completed) : 
Collection<NonLiteral>
+
+This method takes an execution plan and the set of already executed nodes as 
input and returns the ExecutionNodes that can be executed next. The 
existing utility methods within the EnhancementEngineHelper can be used to 
retrieve further information from the ep:ExecutionNodes returned by this 
method.
+
+Code using this utility will typically look like this (pseudo code):
+
+    Graph executionPlan = chain.getExecutionPlan();
+    Map<String, EnhancementEngine> engines = 
enhancementEngineManager.getActiveEngines(chain);
+    //nodes that have already been executed
+    Set<NonLiteral> completed = new HashSet<NonLiteral>();
+    Collection<NonLiteral> next;
+    while(!(next = ExecutionPlanUtils.getExecuteable(executionPlan, 
completed)).isEmpty()){
+        for(NonLiteral node : next){
+            EnhancementEngine engine = engines.get(
+                EnhancementEngineHelper.getString(executionPlan,node, 
EX_ENGINE));
+            Boolean optional = EnhancementEngineHelper.get(
+                executionPlan,node,EX_OPTIONAL,Boolean.class,literalFactory);
+            /* Execute the Engine */
+            completed.add(node);
+        }
+    }
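+
+The selection of executable nodes can also be sketched without any Stanbol or 
RDF types. The following is only an illustration: the ExecutableNodes class and 
the Map based representation of ep:dependsOn are assumptions made for this 
example, and the handling of ep:optional is left out.
+

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ExecutableNodes {

    /**
     * Returns all nodes that are not yet completed and whose dependencies
     * (the values of ep:dependsOn) have all been completed. The map holds
     * one entry per node of the plan; this layout is an assumption.
     */
    static Set<String> getExecutable(Map<String, Set<String>> dependsOn,
                                     Set<String> completed) {
        Set<String> executable = new TreeSet<>();
        for (Map.Entry<String, Set<String>> node : dependsOn.entrySet()) {
            if (!completed.contains(node.getKey())
                    && completed.containsAll(node.getValue())) {
                executable.add(node.getKey());
            }
        }
        return executable;
    }

    public static void main(String[] args) {
        // the example plan from above, with nodes identified by engine name
        Map<String, Set<String>> plan = new HashMap<>();
        plan.put("langId", Collections.<String>emptySet());
        plan.put("zemanta", Collections.<String>emptySet());
        plan.put("ner", Collections.singleton("langId"));
        plan.put("dbpediaLinking", Collections.singleton("ner"));
        plan.put("geonamesLinking", Collections.singleton("ner"));

        Set<String> completed = new HashSet<>();
        Set<String> next;
        while (!(next = getExecutable(plan, completed)).isEmpty()) {
            System.out.println(next); // nodes of one round could run in parallel
            completed.addAll(next);
        }
        // prints [langId, zemanta], then [ner],
        // then [dbpediaLinking, geonamesLinking]
    }
}
```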

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/listchain.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/listchain.mdtext?rev=1236056&view=auto
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/listchain.mdtext
 (added)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/listchain.mdtext
 Thu Jan 26 07:19:47 2012
@@ -0,0 +1,33 @@
+Title: ListChain
+
+The ListChain creates the ExecutionPlan based on the exact order of the 
configured EnhancementEngines. This gives users a simple way to 
configure the exact order in which the referenced EnhancementEngines are called 
during the enhancement process of a content item. However, the ListChain cannot 
support parallel execution of engines, a considerable disadvantage in contrast 
to the [GraphChain](graphchain.html).
+
+A typical usage scenario would be that users start off by configuring a 
ListChain and later optimize the execution by migrating functioning 
configurations to [GraphChain](graphchain.html)s.
+
+### Configuration
+
+The property "stanbol.enhancer.chain.list.enginelist" is used to provide the 
list of engine names. This configuration MUST be provided as an array of strings 
because the ordering of the configured entries is central to the configuration.
+
+In addition, it is possible to define Engines as optional. This makes it possible to 
specify that the enhancement process should not fail if an Engine is not active 
or fails while processing a content item.
+
+The syntax to define an Engine as optional is as follows:
+
+    <name>;optional
+    <name>;optional=true
+
+Both variants make the execution of the engine with the name <name> 
optional.
+
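+Parsing a single entry of the engine list could look like the following sketch. 
The EngineListEntry class is hypothetical and not the ListChain implementation. 
How parameters other than the two variants shown above are treated (they are 
ignored here) is an assumption.
+

```java
public class EngineListEntry {

    final String name;      // name of the referenced engine
    final boolean optional; // true if the entry carries the "optional" parameter

    EngineListEntry(String name, boolean optional) {
        this.name = name;
        this.optional = optional;
    }

    /** Parses entries of the form "<name>", "<name>;optional" or "<name>;optional=true". */
    static EngineListEntry parse(String entry) {
        String[] parts = entry.split(";");
        String name = parts[0].trim();
        if (name.isEmpty()) {
            throw new IllegalArgumentException("missing engine name in '" + entry + "'");
        }
        boolean optional = false;
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            // assumption: any other parameter or value leaves the engine required
            if (param.equals("optional") || param.equals("optional=true")) {
                optional = true;
            }
        }
        return new EngineListEntry(name, optional);
    }

    public static void main(String[] args) {
        EngineListEntry entry = parse("metaxa;optional");
        System.out.println(entry.name + " optional=" + entry.optional);
        // prints: metaxa optional=true
    }
}
```
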
+The following figure shows the configuration dialog for ListChains as provided 
by the Apache Felix Web Console.
+
+![Configuration Dialog for the ListChain](enhancer-listchain-config.png 
"Screenshot of the Configuration Dialog for a ListChain with required and 
optional Engines")
+
+It is also possible to configure a ListChain by directly installing a 
configuration with the name "{classname}-{configName}.config". Note that the 
{configName} need not be the same as the name of the chain. The 
{configName} is just used by the OSGi environment to distinguish different 
configurations for {classname}.
+
+To create the same configuration as in the above screenshot the file would 
need to look like this:
+
+    stanbol.enhancer.chain.name="list"
+    
stanbol.enhancer.chain.list.enginelist=["metaxa;optional","langid","ner","dbpediaLinking"]
+
+### Calculation of the ExecutionPlan
+
+The ExecutionPlan is created based on the exact order of the 
EnhancementEngines provided by the "stanbol.enhancer.chain.list.enginelist" 
property. The configuration MUST contain at least one engine. In addition, 
an engine MUST NOT be listed more than once.
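+
+A minimal sketch of this calculation, assuming that "exact order" is expressed 
by letting every engine depend on its predecessor in the list. The 
ListChainPlan class and the Map based result are hypothetical. The actual 
ListChain produces an RDF graph following the ExecutionPlan Ontology.
+

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ListChainPlan {

    /**
     * Maps each engine name to the engine it depends on (null for the first
     * engine). Rejects empty lists and engines that are listed twice.
     */
    static Map<String, String> linearPlan(List<String> engines) {
        if (engines.isEmpty()) {
            throw new IllegalArgumentException("at least one engine is required");
        }
        Map<String, String> dependsOn = new LinkedHashMap<>();
        String previous = null;
        for (String engine : engines) {
            if (dependsOn.containsKey(engine)) {
                throw new IllegalArgumentException("engine listed twice: " + engine);
            }
            dependsOn.put(engine, previous);
            previous = engine;
        }
        return dependsOn;
    }

    public static void main(String[] args) {
        System.out.println(linearPlan(
            Arrays.asList("metaxa", "langid", "ner", "dbpediaLinking")));
        // prints {metaxa=null, langid=metaxa, ner=langid, dbpediaLinking=ner}
    }
}
```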

