[ https://issues.apache.org/jira/browse/STANBOL-498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13287496#comment-13287496 ]
Suat Gonul commented on STANBOL-498:
------------------------------------

With my last commit, I have checked in the file-based implementation of the Store interface. That commit also includes an Apache Derby based revision management mechanism and a serialization/deserialization mechanism for content parts. Comments on the implementation are very welcome.

From now on, I'm planning to write some tests and afterwards continue with the indexing part.

> Contenthub: Enhanced ContentItem Store
> --------------------------------------
>
>                 Key: STANBOL-498
>                 URL: https://issues.apache.org/jira/browse/STANBOL-498
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Content Hub
>            Reporter: Rupert Westenthaler
>
> Simple Storage interface for enhanced ContentItems.
> This Store is used to
> 1. save the ContentItems after they are enhanced by the Enhancer
>     * The Blobs (original content and transcoded versions)
>     * The Metadata (Enhancement Results)
> 2. retrieve ContentItems during semantic indexing
>     * Iterator over the IDs
>     * Get ContentItem by ID
> This store is NOT intended to be used for search! It is only used for
> ID-based lookup.
> Implementations:
> -----------------------
> * CMS Adapter: An implementation based on the CMS Adapter makes it
> possible to store the Enhancement Results directly within the CMS.
> Typically this will be the same CMS that sends the request to the
> Contenthub, but this is not a requirement.
> * Clerezza based implementation: Clerezza - as an RDF based CMS - provides
> the required functionality to store both the content AND the metadata of
> the ContentItem.
> * File based: Simple file based storage without any external dependencies.
> This could be used as the default and for testing.
> Interface:
> -------------
> The interface will be based on, or will replace, the
> [Store](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/contenthub/servicesapi/src/main/java/org/apache/stanbol/contenthub/servicesapi/store/Store.java)
> interface already present in the Contenthub. However, the suggestion is to
> remove "getEnhancementGraph()", as it is not required by the use cases
> (1) and (2) mentioned above. In addition, the Store interface should be
> extended with a remove method to allow manual deletion of ContentItems.
>     /** stores the passed ContentItem */
>     + put(ContentItem ci) : UriRef
>     /** Getter for the ContentItem with the passed ID */
>     + get(UriRef id) : ContentItem
> ### Revisions
> Revisions are used to re-synchronize semantic indexes with the enhanced
> ContentItems managed by this store. Every time the ContentHub indexes
> enhanced ContentItems - as managed by this store - to a SemanticIndex, it
> provides the highest revision. SemanticIndexes MUST persist such revisions
> and MUST ensure they are still available after a restart, because this
> number will later be used by the ContentHub to apply changes to enhanced
> ContentItems.
> In detail, a revision is defined as a change (add, update, removal) to one
> or more ContentItems managed by the Store. Every such change MUST result in
> an increase of the revision. Revisions MUST only use positive numbers.
> Implementers might use <code>System.currentTimeMillis()</code> as the
> revision, but this is not a requirement.
> The Store interface provides a method that returns an Iterator over all
> ContentItems that were changed (added, updated, removed) since a given
> revision.
>     /** Iterator over all ContentItems added/removed after revision */
>     + changes(long revision, int offset, int batchSize) : ChangeSet
>     class ChangeSet {
>         /** the lowest included revision */
>         + from() : long
>         /** the ids of changed ContentItems */
>         + changed() : Map<UriRef>
>         /** the highest included revision */
>         + to() : long
>     }
> Calls to changes(..) MUST return only changes with a higher revision than
> the provided number. Changes with the passed revision number MUST be
> excluded.
> Note that ChangeSet does not provide information about the type of the
> change. This is only available after a call to Store#get(..).
> The revisions MUST NOT keep a history of changes. Only the revision of the
> latest change MUST be kept. This ensures that rebuilding a semantic index
> (from revision -1) does not perform indexing steps that correspond to
> historical states of the Store. Whether a ContentItem is still present
> (added, updated) or was removed is indicated by the get(..) method of the
> store returning a ContentItem instance or <code>null</code>.
> #### Example:
> If ContentItems 1, 2 and 3 are added first, then ContentItem 2 is updated
> and 3 is deleted, and in a third step ContentItems 3 and 4 are added, this
> results in the following revision data:
> After step 1:
>     :::text
>     1 : urn:contentItem.1 //added
>     1 : urn:contentItem.2 //added
>     1 : urn:contentItem.3 //added
> After step 2:
>     :::text
>     1 : urn:contentItem.1 //added
>     2 : urn:contentItem.2 //updated
>     2 : urn:contentItem.3 //removed
> After step 3:
>     :::text
>     1 : urn:contentItem.1 //added
>     2 : urn:contentItem.2 //updated
>     3 : urn:contentItem.3 //added
>     3 : urn:contentItem.4 //added
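To make the revision rules in the issue description easier to check, here is a minimal in-memory sketch. It is not the file-based or Derby-backed implementation from the commit. Several things in it are assumptions made for illustration: `String` stands in for both `UriRef` and `ContentItem`, `changed()` returns a `Set`, the `commit(..)` helper groups several adds/removals into a single revision (as in the three-step example), results are ordered by revision and then by id, and an empty result reports the passed revision as both `from()` and `to()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory sketch; String stands in for UriRef and ContentItem. */
class RevisionStoreSketch {

    /** Ids changed within the inclusive revision range [from, to]. */
    static final class ChangeSet {
        private final long from;
        private final long to;
        private final Set<String> changed;

        ChangeSet(long from, long to, Set<String> changed) {
            this.from = from;
            this.to = to;
            this.changed = changed;
        }
        long from() { return from; }
        long to() { return to; }
        Set<String> changed() { return changed; }
    }

    private final Map<String, String> items = new HashMap<>();
    /** Only the revision of the latest change per id; kept for removed ids too. */
    private final Map<String, Long> latestRevision = new HashMap<>();
    private long currentRevision = 0;

    /** Assumed helper: all puts and removals of one call share one new revision. */
    long commit(Map<String, String> puts, Set<String> removals) {
        currentRevision++;
        for (Map.Entry<String, String> e : puts.entrySet()) {
            items.put(e.getKey(), e.getValue());
            latestRevision.put(e.getKey(), currentRevision);
        }
        for (String id : removals) {
            items.remove(id);
            latestRevision.put(id, currentRevision);
        }
        return currentRevision;
    }

    /** Returns the stored item, or null if it is unknown or was removed. */
    String get(String id) {
        return items.get(id);
    }

    /** Ids whose latest revision is strictly higher than the passed revision. */
    ChangeSet changes(long revision, int offset, int batchSize) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Long> e : latestRevision.entrySet()) {
            if (e.getValue() > revision) {
                ids.add(e.getKey());
            }
        }
        // Order by revision, then by id, so that paging is deterministic.
        ids.sort((a, b) -> {
            int byRevision = Long.compare(latestRevision.get(a), latestRevision.get(b));
            return byRevision != 0 ? byRevision : a.compareTo(b);
        });
        int start = Math.min(Math.max(offset, 0), ids.size());
        int end = (int) Math.min((long) start + Math.max(batchSize, 0), ids.size());
        List<String> page = ids.subList(start, end);
        if (page.isEmpty()) {
            return new ChangeSet(revision, revision, new LinkedHashSet<>());
        }
        long from = latestRevision.get(page.get(0));
        long to = latestRevision.get(page.get(page.size() - 1));
        return new ChangeSet(from, to, new LinkedHashSet<>(page));
    }

    public static void main(String[] args) {
        RevisionStoreSketch store = new RevisionStoreSketch();
        Map<String, String> step1 = new HashMap<>();
        step1.put("urn:contentItem.1", "one");
        step1.put("urn:contentItem.2", "two");
        step1.put("urn:contentItem.3", "three");
        store.commit(step1, new HashSet<>());
        ChangeSet cs = store.changes(-1, 0, 10);
        System.out.println(cs.from() + ".." + cs.to() + " " + cs.changed());
    }
}
```

Because only the latest revision per id is stored, a rebuild from revision -1 visits each id once, and a caller decides between "index" and "delete" by checking whether `get(..)` returns `null`.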