[ 
https://issues.apache.org/jira/browse/STANBOL-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Suat Gonul updated STANBOL-498:
-------------------------------

    Description: 
Simple Storage interface for enhanced ContentItems.

This Store is used to

1. save the ContentItems after they are enhanced by the Enhancer
    * The Blobs (original content and transcoded versions)
    * The Metadata (Enhancement Results)
2. retrieve ContentItems while semantic indexing
    * Iterator over the IDs
    * Get ContentItem by ID

This store is NOT intended to be used for search! It is only used for ID based 
lookup.


Implementations:
-----------------------

 * CMS Adapter: An implementation based on the CMS Adapter makes it possible 
to store the Enhancement Results directly within the CMS. Typically this will 
be the same CMS that sends the request to the Contenthub, but this is not a 
requirement.
 * Clerezza based implementation: Clerezza, as an RDF based CMS, provides the 
required functionality to store both the content AND the metadata of the 
ContentItem.
 * File based: Simple file based storage without any external dependencies. 
This could be used as the default and for testing.

Interface:
-------------

The interface will be based on, or replace, the 
[Store](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/contenthub/servicesapi/src/main/java/org/apache/stanbol/contenthub/servicesapi/store/Store.java)
 interface already present in the Contenthub. However, the suggestion is to 
remove "getEnhancementGraph()", as it is not required by use cases (1) and (2) 
mentioned above. In addition, the store interface should be extended with a 
remove method to allow manual deletion of ContentItems.

    /** stores the passed ContentItem */
    + put(ContentItem ci) : UriRef
    /** Getter for the ContentItem with the passed ID */
    + get(UriRef id) : ContentItem
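
The two operations above, together with the proposed remove method, can be sketched as a small in-memory store. This is only an illustration under stated assumptions: `SimpleContentItem` and `InMemoryStore` are hypothetical names, IDs are plain strings standing in for `UriRef`, and a real ContentItem carries more than a byte array.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for ContentItem: an ID plus the raw content. */
final class SimpleContentItem {
    final String id;       // stands in for UriRef (assumption)
    final byte[] content;  // original content only, no metadata

    SimpleContentItem(String id, byte[] content) {
        this.id = id;
        this.content = content;
    }
}

/** Minimal ID-based store: no search, only put/get/remove by ID. */
final class InMemoryStore {
    private final Map<String, SimpleContentItem> items = new ConcurrentHashMap<>();

    /** Stores the passed ContentItem and returns its ID. */
    String put(SimpleContentItem ci) {
        items.put(ci.id, ci);
        return ci.id;
    }

    /** Returns the ContentItem with the passed ID, or null if absent. */
    SimpleContentItem get(String id) {
        return items.get(id);
    }

    /** Removes the ContentItem with the passed ID; true if it existed. */
    boolean remove(String id) {
        return items.remove(id) != null;
    }
}
```

A file based or CMS backed implementation would replace the map with its own persistence while keeping the same ID-only access pattern.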

### Revisions

Revisions are used to re-synchronize semantic indexes with the enhanced 
ContentItems managed by this store. Every time the ContentHub indexes an 
enhanced ContentItem, as managed by this store, to a SemanticIndex, it provides 
the highest revision. SemanticIndexes MUST persist such revisions and MUST 
ensure they are still available after a restart, because this number will 
later be used by the ContentHub to apply changes to enhanced ContentItems.

In detail, a revision is defined as a change (add, update, removal) to one or 
more ContentItems managed by the Store. Every such change MUST result in an 
increase of the revision. Revisions MUST only use positive numbers. 
Implementers might use <code>System.currentTimeMillis()</code> as the revision, 
but this is not a requirement.

The store interface provides a method that returns an Iterator over all 
ContentItems that were changed (added, updated, removed) since a given 
revision. 

    /** Iterator over all contentItems added/removed after revision */
    + changes(long revision, int offset, int batchSize) : ChangeSet

    class ChangeSet {
        /** the lowest included revision */
        + from() : long
        /** the id of changed ContentItems */
        + changed() : Map<UriRef>
        /** the highest included revision */
        + to() : long
    }


Calls to changes(..) MUST return only changes with a higher revision than the 
passed number. Changes with the passed revision number MUST be excluded. 
Note that ChangeSet does not provide information about the type of the change. 
This will only be available after a call to Store#get(..).
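
The exclusive lower bound can be illustrated with a small sketch. Everything here is an assumption made for illustration: `RevisionLog` and `SketchChangeSet` are hypothetical names, IDs are strings standing in for `UriRef`, `changed` is modelled as a set of IDs, `offset` is read as "skip this many changed IDs" with `batchSize` as the maximum returned, and -1 marks an empty result. The description does not define any of these details.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical ChangeSet: IDs are strings standing in for UriRef. */
final class SketchChangeSet {
    final long from;            // lowest included revision, -1 if empty
    final long to;              // highest included revision, -1 if empty
    final Set<String> changed;  // IDs of the changed ContentItems

    SketchChangeSet(long from, long to, Set<String> changed) {
        this.from = from;
        this.to = to;
        this.changed = changed;
    }
}

/** Tracks only the latest revision per ContentItem ID. */
final class RevisionLog {
    private final Map<String, Long> latest = new TreeMap<>();
    private long revision = 0;

    /** Records one change touching the given IDs; returns the new revision. */
    long record(String... ids) {
        revision++;
        for (String id : ids) {
            latest.put(id, revision); // overwrites any older revision
        }
        return revision;
    }

    /** Returns changes with a revision strictly greater than the passed one. */
    SketchChangeSet changes(long since, int offset, int batchSize) {
        List<Map.Entry<String, Long>> newer = new ArrayList<>();
        for (Map.Entry<String, Long> e : latest.entrySet()) {
            if (e.getValue() > since) { // the passed revision itself is excluded
                newer.add(e);
            }
        }
        // Stable sort by revision; ties keep the ID order of the TreeMap.
        newer.sort(Map.Entry.comparingByValue());

        Set<String> ids = new LinkedHashSet<>();
        long from = -1;
        long to = -1;
        int end = Math.min(newer.size(), offset + batchSize);
        for (int i = offset; i < end; i++) {
            Map.Entry<String, Long> e = newer.get(i);
            if (from < 0) {
                from = e.getValue();
            }
            to = e.getValue();
            ids.add(e.getKey());
        }
        return new SketchChangeSet(from, to, ids);
    }
}
```

A semantic index that persisted revision 1 would call `changes(1, 0, batchSize)` and then use `get(..)` on each returned ID to find out whether the item was updated or removed.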

The store MUST NOT keep a history of changes. Only the revision of the latest 
change to each ContentItem MUST be kept. This ensures that rebuilding a 
semantic index (from revision -1) only performs indexing steps that correspond 
to the current state of the Store, not to its historical states. Note also 
that the revisions do not provide information about the type of the change. 
Whether a ContentItem is still present (added, updated) or was removed is 
indicated by the get(..) method of the store returning a ContentItem instance 
or <code>null</code>.

#### Example:

For example, if ContentItems 1, 2 and 3 are added first, then ContentItem 2 is 
updated and 3 is deleted, and in a third step ContentItems 3 and 4 are added, 
this results in the following revision data:

After step 1: 

    :::text
    1 : urn:contentItem.1 //added
    1 : urn:contentItem.2 //added
    1 : urn:contentItem.3 //added

After step 2: 

    :::text
    1 : urn:contentItem.1 //added
    2 : urn:contentItem.2 //updated
    2 : urn:contentItem.3 //removed

After step 3: 

    :::text
    1 : urn:contentItem.1 //added
    2 : urn:contentItem.2 //updated
    3 : urn:contentItem.3 //added
    3 : urn:contentItem.4 //added
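
The three steps can be replayed with a short sketch that keeps one revision per ID and no history. The class and method names (`ExampleStore`, `nextRevision`, `dump`) are hypothetical, content is a plain string, and IDs are strings standing in for `UriRef`. Because the revision data does not record the type of change, the dump only distinguishes "present" from "removed", based on what `get(..)` would return.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical store that keeps only the latest revision per ID. */
final class ExampleStore {
    private final Map<String, String> content = new TreeMap<>();
    private final Map<String, Long> revisions = new TreeMap<>();
    private long revision = 0;

    /** Starts a new change; following put/remove calls share its revision. */
    long nextRevision() {
        return ++revision;
    }

    void put(String id, String body) {
        content.put(id, body);
        revisions.put(id, revision);
    }

    void remove(String id) {
        content.remove(id);
        revisions.put(id, revision); // keep the ID so indexes see the removal
    }

    /** Returns the content, or null if the item was removed or never existed. */
    String get(String id) {
        return content.get(id);
    }

    /** One "revision : id //state" line per known ID. */
    String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : revisions.entrySet()) {
            String state = content.containsKey(e.getKey()) ? "present" : "removed";
            sb.append(e.getValue()).append(" : ").append(e.getKey())
              .append(" //").append(state).append('\n');
        }
        return sb.toString();
    }
}

final class ExampleStoreDemo {
    public static void main(String[] args) {
        ExampleStore store = new ExampleStore();
        store.nextRevision();                      // step 1: add 1, 2 and 3
        store.put("urn:contentItem.1", "a");
        store.put("urn:contentItem.2", "b");
        store.put("urn:contentItem.3", "c");
        store.nextRevision();                      // step 2: update 2, delete 3
        store.put("urn:contentItem.2", "b2");
        store.remove("urn:contentItem.3");
        store.nextRevision();                      // step 3: add 3 and 4
        store.put("urn:contentItem.3", "c2");
        store.put("urn:contentItem.4", "d");
        System.out.print(store.dump());
    }
}
```

After step 3 the dump lists revisions 1, 2, 3 and 3 for items 1 to 4, matching the last listing; the intermediate removal of item 3 leaves no trace.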



  was:
Simple Storage interface for enhanced ContentItems.

This Store is used to

1. save the ContentItems after they are enhanced by the Enhancer
    * The Blobs (original content and transcoded versions)
    * The Metadata (Enhancement Results)
2. retrieve ContentItems while semantic indexing
    * Iterator over the IDs
    * Get ContentItem by ID

This store is NOT intended to be used for search! It is only used for ID based 
lookup.


Implementations:
-----------------------

 * CMS Adapter: An implementation based on the CMS Adapter makes it possible 
to store the Enhancement Results directly within the CMS. Typically this will 
be the same CMS that sends the request to the Contenthub, but this is not a 
requirement.
 * Clerezza based implementation: Clerezza, as an RDF based CMS, provides the 
required functionality to store both the content AND the metadata of the 
ContentItem.
 * File based: Simple file based storage without any external dependencies. 
This could be used as the default and for testing.

Interface:
-------------

The interface will be based on, or replace, the 
[Store](http://svn.apache.org/repos/asf/incubator/stanbol/trunk/contenthub/servicesapi/src/main/java/org/apache/stanbol/contenthub/servicesapi/store/Store.java)
 interface already present in the Contenthub. However, the suggestion is to 
remove "getEnhancementGraph()", as it is not required by use cases (1) and (2) 
mentioned above. In addition, the store interface should be extended with a 
remove method to allow manual deletion of ContentItems.

    /** creates a new ContentItem */
    + create(UriRef id, byte[] content, String contentType) : ContentItem
    + create(UriRef id, InputStream in, String contentType) : ContentItem
    /** stores the passed ContentItem */
    + put(ContentItem ci) : UriRef
    /** Getter for the ContentItem with the passed ID */
    + get(UriRef id) : ContentItem
    
### Revisions

Revisions are used to re-synchronize semantic indexes with the enhanced 
ContentItems managed by this store. Every time the ContentHub indexes an 
enhanced ContentItem, as managed by this store, to a SemanticIndex, it provides 
the highest revision. SemanticIndexes MUST persist such revisions and MUST 
ensure they are still available after a restart, because this number will 
later be used by the ContentHub to apply changes to enhanced ContentItems.

In detail, a revision is defined as a change (add, update, removal) to one or 
more ContentItems managed by the Store. Every such change MUST result in an 
increase of the revision. Revisions MUST only use positive numbers. 
Implementers might use <code>System.currentTimeMillis()</code> as the revision, 
but this is not a requirement.

The store interface provides a method that returns an Iterator over all 
ContentItems that were changed (added, updated, removed) since a given 
revision. 

    /** Iterator over all contentItems added/removed after revision */
    + changes(long revision, int maxEntries) : Changes

    class Changes {
        /** the lowest included revision */
        + from() : long
        /** the id of changed ContentItems */
        + changed() : Map<UriRef>
        /** the highest included revision */
        + to() : long
    }


Calls to changes(..) MUST return only changes with a higher revision than the 
passed number. Changes with the passed revision number MUST be excluded. Note 
that Changes does not provide information about the type of the change. This 
will only be available after a call to Store#get(..).

The store MUST NOT keep a history of changes. Only the revision of the latest 
change to each ContentItem MUST be kept. This ensures that rebuilding a 
semantic index (from revision -1) only performs indexing steps that correspond 
to the current state of the Store, not to its historical states. Note also 
that the revisions do not provide information about the type of the change. 
Whether a ContentItem is still present (added, updated) or was removed is 
indicated by the get(..) method of the store returning a ContentItem instance 
or <code>null</code>.

#### Example:

For example, if ContentItems 1, 2 and 3 are added first, then ContentItem 2 is 
updated and 3 is deleted, and in a third step ContentItems 3 and 4 are added, 
this results in the following revision data:

After step 1: 

    :::text
    1 : urn:contentItem.1 //added
    1 : urn:contentItem.2 //added
    1 : urn:contentItem.3 //added

After step 2: 

    :::text
    1 : urn:contentItem.1 //added
    2 : urn:contentItem.2 //updated
    2 : urn:contentItem.3 //removed

After step 3: 

    :::text
    1 : urn:contentItem.1 //added
    2 : urn:contentItem.2 //updated
    3 : urn:contentItem.3 //added
    3 : urn:contentItem.4 //added



    
> Contenthub: Enhanced ContentItem Store
> --------------------------------------
>
>                 Key: STANBOL-498
>                 URL: https://issues.apache.org/jira/browse/STANBOL-498
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Content Hub
>            Reporter: Rupert Westenthaler
>
