[ 
https://issues.apache.org/jira/browse/STANBOL-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287496#comment-13287496
 ] 

Suat Gonul commented on STANBOL-498:
------------------------------------

With my last commit, I have checked in the file-based implementation of the 
Store interface. That commit also includes an Apache Derby based revision 
management mechanism and a serialization/deserialization mechanism for content 
parts. Comments on the implementation are very welcome. 

From now on, I'm planning to write some tests and afterwards continue with the 
indexing part.
                
> Contenthub: Enhanced ContentItem Store
> --------------------------------------
>
>                 Key: STANBOL-498
>                 URL: https://issues.apache.org/jira/browse/STANBOL-498
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Content Hub
>            Reporter: Rupert Westenthaler
>
> Simple Storage interface for enhanced ContentItems.
> This Store is used to
> 1. save the ContentItems after they are enhanced by the Enhancer
>     * The Blobs (original content and transcoded versions)
>     * The Metadata (Enhancement Results)
> 2. retrieve ContentItems during semantic indexing
>     * Iterator over the IDs
>     * Get ContentItem by ID
> This store is NOT intended to be used for search! It is only used for ID 
> based lookup.
> Implementations:
> -----------------------
>  * CMS Adapter: An implementation based on the CMS Adapter provides the 
> possibility to store the Enhancement Results directly within the CMS. 
> Typically this will be the CMS that also sends the request to the Contenthub, 
> but this is not a requirement.
>  * Clerezza based implementation: Clerezza - as RDF based CMS - provides the 
> required functionality to store both the content AND the metadata of the 
> ContentItem.
>  * File based: Simple file based storage without any external dependencies. 
> This could be used as the default and for testing.
> Interface:
> -------------
> The interface will be based on (or will replace) the 
> [Store](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/contenthub/servicesapi/src/main/java/org/apache/stanbol/contenthub/servicesapi/store/Store.java)
>  interface already present in the Contenthub. However, the suggestion is to 
> remove "getEnhancementGraph()", as it is not required by use cases 
> (1) and (2) mentioned above. In addition, the store interface should be 
> extended with a remove method to allow manual deletion of ContentItems.
>     /** stores the passed ContentItem */
>     + put(ContentItem ci) : UriRef
>     /** Getter for the ContentItem with the passed ID */
>     + get(UriRef id) : ContentItem
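A minimal Java sketch of the interface described above, just to make the shape concrete. It assumes String IDs in place of UriRef and a tiny stand-in class in place of ContentItem. The names SketchContentItem, SketchStore and InMemorySketchStore are hypothetical, and the signature of remove is a guess, since the description only says that a remove method should exist.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a ContentItem; only an ID and a payload are modelled. */
class SketchContentItem {
    final String id;       // stand-in for UriRef
    final String content;  // stand-in for the blobs and the metadata

    SketchContentItem(String id, String content) {
        this.id = id;
        this.content = content;
    }
}

/** The operations named in the description: put, get and the proposed remove. */
interface SketchStore {
    String put(SketchContentItem ci);   // returns the ID of the stored item
    SketchContentItem get(String id);   // returns null if no item has this ID
    void remove(String id);             // assumed signature for manual deletion
}

/** In-memory implementation, for illustration only. */
class InMemorySketchStore implements SketchStore {
    private final Map<String, SketchContentItem> items = new LinkedHashMap<>();

    @Override
    public String put(SketchContentItem ci) {
        items.put(ci.id, ci);
        return ci.id;
    }

    @Override
    public SketchContentItem get(String id) {
        return items.get(id);
    }

    @Override
    public void remove(String id) {
        items.remove(id);
    }
}
```

A file based or CMS backed implementation would replace the map with real persistence while keeping the same three operations.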
> ### Revisions
> Revisions are used to re-synchronize semantic indexes with the enhanced 
> ContentItems managed by this store. Every time the ContentHub indexes an 
> enhanced ContentItem - as managed by this store - to a SemanticIndex, it 
> provides the highest revision. SemanticIndexes MUST persist such revisions 
> and MUST ensure they are still available after a restart, because this number 
> will later be used by the ContentHub to apply changes to enhanced 
> ContentItems.
> In detail, a revision is defined as a change (add, update, removal) to one or 
> more ContentItems managed by the Store. Every such change MUST result in 
> an increase of the revision. Revisions MUST only use positive numbers. 
> Implementers might use <code>System.currentTimeMillis()</code> as the revision, 
> but this is not a requirement.
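One thing to watch with System.currentTimeMillis() is that two changes in the same millisecond would get the same number, which conflicts with "every change MUST result in an increase". A possible way around this, sketched under my own assumptions (RevisionClock is a hypothetical helper, not part of the proposal), is to fall back to "last + 1" whenever the clock has not advanced:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical helper: hands out strictly increasing, positive revision numbers. */
class RevisionClock {
    private final AtomicLong last = new AtomicLong(0);
    private final LongSupplier clock;

    RevisionClock(LongSupplier clock) {
        this.clock = clock;
    }

    /** Returns the clock value, or last + 1 if the clock has not moved past it. */
    long next() {
        return last.updateAndGet(prev -> Math.max(prev + 1, clock.getAsLong()));
    }

    public static void main(String[] args) {
        RevisionClock revisions = new RevisionClock(System::currentTimeMillis);
        long first = revisions.next();
        long second = revisions.next();
        // The second revision is higher even if both calls fall in the same millisecond.
        System.out.println(first < second);
    }
}
```

Passing the clock in as a LongSupplier is only there to make the behaviour testable with a frozen clock.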
> The store interface provides a method that returns an Iterator over all 
> ContentItems that were changed (added, updated, removed) since a 
> given revision. 
>     /** Iterator over all contentItems added/removed after revision */
>     + changes(long revision, int offset, int batchSize) : ChangeSet
>     class ChangeSet {
>         /** the lowest included revision */
>         + from() : long
>         /** the id of changed ContentItems */
>         + changed() : Set<UriRef>
>         /** the highest included revision */
>         + to() : long
>     }
> Calls to changes(..) MUST return only changes with a higher revision than the 
> provided number. Changes with the passed revision number itself MUST be excluded. 
> Note that ChangeSet does not provide information about the type of the 
> change. This will only be available after a call to Store#get(..).
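To illustrate the filtering and batching rules of changes(..), here is a rough sketch. It assumes String IDs instead of UriRef and an in-memory table holding one revision per ID. The names SketchChangeSet and RevisionTable are hypothetical, and two details are my own assumptions rather than part of the description: results are ordered by revision and then by ID so that paging with offset is repeatable, and an empty batch reports the passed revision as both from and to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical ChangeSet: String IDs stand in for UriRef. */
class SketchChangeSet {
    final long from;            // lowest included revision
    final long to;              // highest included revision
    final Set<String> changed;  // IDs of the changed ContentItems

    SketchChangeSet(long from, long to, Set<String> changed) {
        this.from = from;
        this.to = to;
        this.changed = changed;
    }
}

/** Hypothetical revision table: one entry per ID holding its latest revision. */
class RevisionTable {
    private final Map<String, Long> latest = new HashMap<>();

    void record(String id, long revision) {
        latest.put(id, revision);
    }

    SketchChangeSet changes(long revision, int offset, int batchSize) {
        List<Map.Entry<String, Long>> hits = new ArrayList<>();
        for (Map.Entry<String, Long> e : latest.entrySet()) {
            if (e.getValue() > revision) { // strictly higher: the passed revision is excluded
                hits.add(e);
            }
        }
        // Assumed ordering (revision, then ID) so that offset based paging is repeatable.
        hits.sort((a, b) -> {
            int byRevision = Long.compare(a.getValue(), b.getValue());
            return byRevision != 0 ? byRevision : a.getKey().compareTo(b.getKey());
        });

        Set<String> ids = new LinkedHashSet<>();
        long from = revision; // assumption: an empty batch reports the passed revision
        long to = revision;
        int end = Math.min(hits.size(), offset + batchSize);
        for (int i = offset; i < end; i++) {
            Map.Entry<String, Long> e = hits.get(i);
            if (ids.isEmpty()) {
                from = e.getValue();
            }
            to = e.getValue();
            ids.add(e.getKey());
        }
        return new SketchChangeSet(from, to, ids);
    }
}
```

A caller would keep requesting batches with a growing offset until an empty ChangeSet comes back.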
> The revisions MUST NOT keep a history of changes. Only the revision of the 
> latest change to a ContentItem MUST be kept. This ensures that rebuilding a 
> semantic index (from revision -1) does not perform indexing steps for 
> historical states of the Store, only for its current state. Note also that 
> the revisions do not provide information about the type of the change. 
> Whether a ContentItem is still present (added, updated) or was removed is 
> indicated by the get(..) method of the store returning a ContentItem 
> instance or <code>null</code>.
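From the point of view of a semantic index, catching up could look roughly like the sketch below. Everything here is a simplified stand-in: two maps play the role of the store (content by ID and latest revision by ID), a third map plays the role of the index, and ResyncSketch is a hypothetical name. The point is only that the type of change is derived from whether the lookup returns an item or null.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical consumer-side sketch: a semantic index catching up with the store. */
class ResyncSketch {
    // Stand-ins for the store: current content by ID, and the latest revision per ID.
    final Map<String, String> content = new LinkedHashMap<>();
    final Map<String, Long> latestRevision = new LinkedHashMap<>();

    // Stand-ins for the semantic index: indexed content and the persisted revision.
    final Map<String, String> index = new TreeMap<>();
    long indexedRevision = -1;

    void resync() {
        long highest = indexedRevision;
        for (Map.Entry<String, Long> e : latestRevision.entrySet()) {
            long rev = e.getValue();
            if (rev <= indexedRevision) {
                continue; // already seen by the index
            }
            String id = e.getKey();
            String item = content.get(id); // plays the role of Store#get(..)
            if (item == null) {
                index.remove(id);          // null means the item was removed
            } else {
                index.put(id, item);       // present means added or updated
            }
            highest = Math.max(highest, rev);
        }
        indexedRevision = highest; // a real index would persist this value
    }

    public static void main(String[] args) {
        ResyncSketch s = new ResyncSketch();
        s.content.put("urn:contentItem.1", "first");
        s.latestRevision.put("urn:contentItem.1", 1L);
        s.content.put("urn:contentItem.3", "third");
        s.latestRevision.put("urn:contentItem.3", 1L);
        s.resync();
        // Item 3 is removed: its content disappears, its revision entry is bumped.
        s.content.remove("urn:contentItem.3");
        s.latestRevision.put("urn:contentItem.3", 2L);
        s.resync();
        System.out.println(s.index.keySet() + " @ " + s.indexedRevision);
    }
}
```

Starting with indexedRevision at -1 corresponds to a full rebuild, which only ever sees the current state of each item.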
> #### Example:
> If ContentItems 1, 2 and 3 are added first, then ContentItem 2 is 
> updated and 3 is deleted, and in a third step ContentItems 3 and 4 are added, 
> this results in the following revision data:
> After step 1: 
>     :::text
>     1 : urn:contentItem.1 //added
>     1 : urn:contentItem.2 //added
>     1 : urn:contentItem.3 //added
> After step 2: 
>     :::text
>     1 : urn:contentItem.1 //added
>     2 : urn:contentItem.2 //updated
>     2 : urn:contentItem.3 //removed
> After step 3: 
>     :::text
>     1 : urn:contentItem.1 //added
>     2 : urn:contentItem.2 //updated
>     3 : urn:contentItem.3 //added
>     3 : urn:contentItem.4 //added
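The three steps of the example can be replayed with a few lines of Java. This is only a sketch of the bookkeeping: RevisionReplay is a hypothetical class, the revision is a simple counter, and the table keeps nothing but the latest revision for each ID, which is why item 3 ends up with a single entry.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical replay of the three steps; only the latest revision per ID is kept. */
class RevisionReplay {
    private final Map<String, Long> latest = new TreeMap<>();
    private long revision = 0;

    /** One step is one change: every ID it touches gets the same new revision. */
    void step(String... touchedIds) {
        revision++;
        for (String id : touchedIds) {
            latest.put("urn:contentItem." + id, revision); // overwrites any older revision
        }
    }

    Map<String, Long> table() {
        return latest;
    }

    public static void main(String[] args) {
        RevisionReplay replay = new RevisionReplay();
        replay.step("1", "2", "3"); // step 1: add 1, 2 and 3
        replay.step("2", "3");      // step 2: update 2, remove 3
        replay.step("3", "4");      // step 3: add 3 (again) and 4
        for (Map.Entry<String, Long> e : replay.table().entrySet()) {
            System.out.println(e.getValue() + " : " + e.getKey());
        }
    }
}
```

Note that the table does not record whether a step added, updated or removed an item; that information comes from the lookup in the store.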
