Author: rwesten
Date: Wed Jan 25 07:24:22 2012
New Revision: 1235653

URL: http://svn.apache.org/viewvc?rev=1235653&view=rev
Log:
Added GraphChain documentation,

Added:
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancer-graphchain-config.png
   (with props)
    
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/graphchain.mdtext

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancer-graphchain-config.png
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancer-graphchain-config.png?rev=1235653&view=auto
==============================================================================
Binary file - no diff available.

Propchange: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/enhancer-graphchain-config.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/graphchain.mdtext
URL: 
http://svn.apache.org/viewvc/incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/graphchain.mdtext?rev=1235653&view=auto
==============================================================================
--- 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/graphchain.mdtext
 (added)
+++ 
incubator/stanbol/site/trunk/content/stanbol/docs/trunk/enhancer/chains/graphchain.mdtext
 Wed Jan 25 07:24:22 2012
@@ -0,0 +1,66 @@
+Title: GraphChain
+
+
+
+### Configuration
+
+The GraphChain supports two ways to configure the ExecutionPlan:
+
+#### GraphResource
+
+A GraphResource is an RDF file available via the DataFileProvider. The easiest 
way is to copy the RDF file defining the ExecutionPlan to the "/sling/datafile" 
directory within the Stanbol home directory. The configuration of the 
GraphChain then only needs to refer to that file, for example:
+
+    stanbol.enhancer.chain.graph.graphresource=myExecutionPlan.rdf
+
+The RDF encoding is guessed from the file extension. If the extension is 
not recognized, the format can also be passed as an additional parameter:
+
+    
stanbol.enhancer.chain.graph.graphresource=myExecutionPlan.something;format=application/rdf+xml
+
+The GraphChain will track that file and activate itself as soon as the file 
becomes available. Removing the file, waiting a few seconds and then providing 
the new version should also work. Just replacing the file will not work, 
because the DataFileProvider does not support updates. In such cases 
it might be necessary to deactivate and reactivate the GraphChain.
+
+#### ChainList
+
+This variant allows the ExecutionPlan to be configured directly as the value 
of the "stanbol.enhancer.chain.graph.chainlist" property. Both arrays and 
Collections are supported. 
+
+The syntax is defined as follows:
+
+    {engine-name};[optional];[dependsOn={engine-name1},{engine-name2}]
+
+The following example shows how this syntax can be used to define an 
ExecutionPlan:
+
+    metaxa;optional
+    langId;dependsOn=metaxa
+    ner;dependsOn=langId
+    zemanta;optional
+    dbpedia-linking;dependsOn=ner
+    geonames;optional;dependsOn=ner
+    refactor;dependsOn=geonames,dbpedia-linking,zemanta
+
+Note that the internal order of the list does not influence the resulting 
ExecutionPlan. Only the "dependsOn" properties are used to determine the 
execution order of the Engines and whether Engines can be executed in parallel.
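
The entry syntax above can be illustrated with a small parser. This is only a sketch in Python and not part of Stanbol: the function names `parse_chain_entry` and `parse_chain_list` are hypothetical, and the GraphChain's actual parsing may differ in detail (for example in how it handles whitespace or unknown parameters). It shows that the same entries given in a different order describe the same set of ExecutionNodes.

```python
def parse_chain_entry(entry):
    """Split one ChainList entry into (engine name, optional flag, dependencies)."""
    parts = [p.strip() for p in entry.split(";") if p.strip()]
    if not parts:
        raise ValueError("empty ChainList entry")
    name, optional, depends_on = parts[0], False, []
    for part in parts[1:]:
        if part == "optional":
            optional = True
        elif part.startswith("dependsOn="):
            value = part[len("dependsOn="):]
            depends_on = [d.strip() for d in value.split(",") if d.strip()]
        else:
            # Assumption of this sketch: unknown parameters are rejected.
            raise ValueError("unknown parameter: " + part)
    return name, optional, depends_on


def parse_chain_list(entries):
    """Build a mapping engine name -> {'optional': bool, 'depends_on': set}."""
    plan = {}
    for entry in entries:
        name, optional, depends_on = parse_chain_entry(entry)
        plan[name] = {"optional": optional, "depends_on": set(depends_on)}
    return plan


chain_list = [
    "metaxa;optional",
    "langId;dependsOn=metaxa",
    "ner;dependsOn=langId",
    "zemanta;optional",
    "dbpedia-linking;dependsOn=ner",
    "geonames;optional;dependsOn=ner",
    "refactor;dependsOn=geonames,dbpedia-linking,zemanta",
]

plan = parse_chain_list(chain_list)
# Reversing the list yields the same mapping: only dependsOn matters.
same_plan = parse_chain_list(list(reversed(chain_list)))
```

Because the result is keyed by engine name and the dependencies are stored as sets, `plan == same_plan` holds regardless of the order in which the entries are listed.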
+
+Within an OSGi configuration file 
(org.apache.stanbol.enhancer.chain.graph.impl.GraphChain-myGraphChain.config) 
this would look like:
+
+    
stanbol.enhancer.chain.graph.chainlist=["metaxa;optional","langId;dependsOn\=metaxa","ner;dependsOn\=langId","zemanta;optional","dbpedia-linking;dependsOn\=ner","geonames;optional;dependsOn\=ner","refactor;dependsOn\=geonames,dbpedia-linking,zemanta"]
+
+The following screenshot of the Apache Felix Web Console shows the dialog 
for the same configuration:
+
+![GraphChain configuration dialog with configured 
ChainList](enhancer-graphchain-config.png "A ChainList defines one 
ExecutionNode per line. The ExecutionPlan is calculated based on the dependsOn 
properties. The ordering of the list elements has no influence on the 
ExecutionPlan.")
+
+### Execution
+
+In contrast to other Chain implementations, the ExecutionPlan does not need to 
be calculated but is provided directly by the user. This gives the greatest 
possible freedom in defining how the execution should take place.
+
+#### Optional Engines
+
+The execution of optional engines is not mandatory. If they are not active or 
their execution fails, the enhancement process continues. For users it is 
important to note that even Engines that depend on an optional Engine that was 
not executed will be called.
+
+Given the above example, this means that even if the 'metaxa' engine cannot be 
executed, the 'langId' engine will still be called by the EnhancementJobManager.
+
+#### Parallel Execution
+
+Engines are executed as soon as all Engines they depend on have completed. This 
also applies when optional engines were skipped (because they are not active) 
or failed. This means that in most cases several EnhancementEngines can be 
executed in parallel.
+
+Given the above example, both the 'zemanta' and the 'metaxa' engines are 
executed as soon as the enhancement process starts.
+When 'metaxa' has finished, the 'langId' engine is called. After 'langId' 
finishes its work, the EnhancementJobManager calls the 'ner' engine. After that, 
both the 'dbpedia-linking' and the 'geonames' engines are called. At this time 
three engines might run simultaneously, assuming that 'zemanta' has not finished 
yet. Before the 'refactor' engine can be executed, it needs to wait for all of 
these engines to complete.
+
+Note that for parallel execution to be activated both the used 
EnhancementJobManager and the different engines must support asynchronous 
enhancement.
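
The scheduling rule described above (an engine starts once everything it depends on has completed) can be sketched as a simple simulation. This Python sketch is an illustration only and not how the EnhancementJobManager is implemented: the function name `execution_waves` is hypothetical, and it assumes for simplicity that every engine takes exactly one step, so a long-running 'zemanta' overlapping with later engines is not modelled.

```python
def execution_waves(depends_on):
    """Group engines into steps; each engine runs once all its dependencies are done.

    depends_on maps an engine name to the set of engine names it depends on.
    """
    remaining = dict(depends_on)
    done = set()
    waves = []
    while remaining:
        # An engine is ready when all of its dependencies have completed.
        ready = sorted(e for e, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("cyclic or unresolvable dependsOn declarations")
        waves.append(ready)
        done.update(ready)
        for engine in ready:
            del remaining[engine]
    return waves


# The dependsOn relations of the example ChainList from the Configuration section.
example = {
    "metaxa": set(),
    "langId": {"metaxa"},
    "ner": {"langId"},
    "zemanta": set(),
    "dbpedia-linking": {"ner"},
    "geonames": {"ner"},
    "refactor": {"geonames", "dbpedia-linking", "zemanta"},
}

waves = execution_waves(example)
for step, engines in enumerate(waves, start=1):
    print(step, engines)
```

Under the one-step assumption, 'metaxa' and 'zemanta' share the first step, 'langId' and 'ner' follow one after the other, 'dbpedia-linking' and 'geonames' share the fourth step, and 'refactor' runs last.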

