Re: New features for Contenthub

Fabian Christ Thu, 26 Jan 2012 10:36:28 -0800

Hi Ali and Suat,

sorry for the mistake. My mail was meant as a reply to Ali ;)


On 26 January 2012 19:33, Fabian Christ <[email protected]> wrote:
> Hi Suat,
>
> this is a really impressive list of changes and features. Do you have
> plans for documentation, demos or tutorials?
>
> Best,
>  - Fabian
>
> On 26 January 2012 16:46, Ali Anil Sinaci <[email protected]> wrote:
>> Dear Stanbolers,
>>
>> I have committed major changes related to Contenthub. Below you can find
>> an explanation of the changes. I have grouped them under two major Jira
>> issues (STANBOL-469 and STANBOL-470), although there are several
>> sub-issues. Later improvements will be filed under their own specific issues.
>>
>> Contenthub consists of two main parts: store and search. Solr is the
>> back-end for all store and retrieve operations on content items
>> (SolrContentItem extends ContentItem). The major improvements are as follows:
>>
>> - Store maintains a default Solr core (called "contenthub") through the
>> EmbeddedSolrServer. This default core indexes several semantic properties of
>> entities when they are retrieved from the referenced sites. (The current
>> DBpedia index does not include most of these properties; we have a larger
>> index for this.)
>>
>> - LDPath has been integrated into Contenthub.
>>    * Several Solr cores can be managed through Contenthub's
>> LDProgramManager.
>>    * Each LDPath program corresponds to a unique Solr core. LDPath programs
>> (and hence Solr cores) are uniquely identified by their names.
>> LDProgramManager and SolrCoreManager provide the required synchronization
>> between Solr cores and LDPath programs.
>>    * Submitted LDPath programs are saved into separate files and accessed
>> via a simple cache mechanism.
>>    * CRD operations for LDPath programs are provided through
>> LDProgramManager.
>>    * ClerezzaBackend is implemented as an LDPath backend.
>>    * LDProgramManager has a special method (executeProgram) to execute
>> LDPath programs on Clerezza MGraphs.
>>    * REST services are available for the LDProgramManager functionality.
>>    * The Contenthub Store and Search parts (all interfaces and REST APIs)
>> have been adjusted so that they can operate with LDPath programs.
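The LDPath bullets describe a registry in which one name identifies both an LDPath program and its Solr core, programs are persisted as separate files behind a simple cache, and create/read/delete keep programs and cores in sync. The Java below is a minimal sketch of that idea under stated assumptions, not Contenthub's actual code: `ProgramRegistry`, `CoreManager` and `InMemoryCoreManager` are hypothetical names, the core manager only records core names instead of talking to Solr, and the `.ldpath` file extension and the allowed name alphabet are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical stand-in for a Solr core manager; a real one would create Solr cores.
interface CoreManager {
    void createCore(String name) throws IOException;
    void deleteCore(String name) throws IOException;
    boolean coreExists(String name);
}

// Hypothetical test double that only remembers which core names exist.
class InMemoryCoreManager implements CoreManager {
    private final Set<String> cores = new HashSet<>();
    public void createCore(String name) { cores.add(name); }
    public void deleteCore(String name) { cores.remove(name); }
    public boolean coreExists(String name) { return cores.contains(name); }
}

// Hypothetical registry: one name identifies both an LDPath program and its core.
class ProgramRegistry {
    private final Path dir;                                    // one file per program
    private final CoreManager cores;
    private final Map<String, String> cache = new HashMap<>(); // simple read cache

    ProgramRegistry(Path dir, CoreManager cores) {
        this.dir = dir;
        this.cores = cores;
    }

    // Create: reject reused names, write the file, then create the matching core.
    synchronized void submit(String name, String program) throws IOException {
        Path file = fileFor(name);
        if (Files.exists(file) || cores.coreExists(name)) {
            throw new IllegalStateException("Name already in use: " + name);
        }
        Files.write(file, program.getBytes(StandardCharsets.UTF_8));
        try {
            cores.createCore(name);
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(file); // roll back so file and core stay in sync
            throw e;
        }
        cache.put(name, program);
    }

    // Read: serve from the cache, falling back to the program file on a miss.
    synchronized Optional<String> get(String name) throws IOException {
        String cached = cache.get(name);
        if (cached != null) {
            return Optional.of(cached);
        }
        Path file = fileFor(name);
        if (!Files.exists(file)) {
            return Optional.empty();
        }
        String program = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        cache.put(name, program);
        return Optional.of(program);
    }

    // Delete: drop the file, the cache entry and the core together.
    synchronized void delete(String name) throws IOException {
        Files.deleteIfExists(fileFor(name));
        cache.remove(name);
        cores.deleteCore(name);
    }

    // Names double as file and core names, so restrict them to a safe alphabet.
    private Path fileFor(String name) {
        if (!name.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("Invalid program name: " + name);
        }
        return dir.resolve(name + ".ldpath");
    }
}
```

Keeping the name as the single key is what makes the synchronization cheap here: a duplicate name is rejected before anything is written, and a failed core creation removes the file again.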
>>
>> - The Contenthub web GUI currently operates only on the default Solr index
>> ("contenthub"). Enabling other cores (generated through LDPath programs) is
>> on the TODO list.
>>
>> - The search logic has been reimplemented from scratch.
>>    * The search engine pattern has been removed from document search.
>>    * Content items are indexed through Solr cores. Therefore, all searches
>> on content items are performed through Solr indexes.
>>    * The Search interface has been split into three different interfaces:
>> SolrSearch, RelatedKeywordSearch and FeaturedSearch.
>>    * SolrSearch is compatible with SolrJ. That is, clients who have already
>> been using SolrJ can easily switch to Contenthub's SolrSearch API. As a
>> result of the LDPath integration, this interface has additional methods
>> that accept LDPath program names (Solr core names). There is a single
>> implementation of this interface in Contenthub.
>>    * RelatedKeywordSearch exposes a "search engine" pattern, but only to
>> search for related keywords. RelatedKeywordSearchManager manages several
>> implementations of this interface (engines).
>>    * In addition to the search results retrieved from SolrSearch, users can
>> now send their search keywords (query terms) to RelatedKeywordSearchManager
>> to retrieve related keywords from different sources. This can be done
>> independently of SolrSearch.
>>    * RelatedKeywordSearch has been implemented by WordnetSearch,
>> OntologyResourceSearch and ReferencedSiteSearch. As their names indicate,
>> they look for related keywords within their respective resources.
>> (WordnetSearch can be excluded until the license issue is resolved or a new
>> client library is used.)
>>    * FeaturedSearch combines the capabilities of SolrSearch and
>> RelatedKeywordSearch in case a client wants to retrieve all results (content
>> items and related keywords) from Contenthub search.
>>    * FeaturedSearch provides an interface similar to SolrSearch, with
>> additional methods. However, its behaviour is different: the search is
>> "featured" in this implementation.
>>    * FeaturedSearch provides a special method: tokenizeEntities. This method
>> takes a query string and determines whether the query contains any
>> entities. Based on the discovered entities, FeaturedSearch prepares Solr
>> queries in special formats to boost the results related to those entities.
>> However, this method still needs improvement to cover the large number of
>> cases that can occur during keyword searches.
>>    * FeaturedSearch provides special methods to simplify faceted search. The
>> Contenthub web GUI uses this interface to enable faceted search.
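Two parts of the search design lend themselves to a short sketch: a manager that fans one keyword out to several related-keyword engines, and entity-aware query building in the spirit of tokenizeEntities, where recognized entities become boosted phrase clauses. The Java below is a hypothetical illustration, not the Contenthub API: `KeywordEngine`, `KeywordSearchManager` and `EntityQueryBuilder` are invented names, entity recognition is reduced to a greedy, case-insensitive lookup of whitespace-separated tokens against a set of lowercase labels, and the output uses the standard Lucene/Solr query syntax (`"phrase"^boost`, backslash escaping).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical engine contract (cf. WordnetSearch, OntologyResourceSearch,
// ReferencedSiteSearch): keyword in, related keywords out.
interface KeywordEngine {
    List<String> relatedKeywords(String keyword);
}

// Hypothetical manager that fans one keyword out to every registered engine.
class KeywordSearchManager {
    private final Map<String, KeywordEngine> engines = new LinkedHashMap<>();

    void register(String name, KeywordEngine engine) {
        engines.put(name, engine);
    }

    // Results are grouped by engine name, in registration order.
    Map<String, List<String>> search(String keyword) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (Map.Entry<String, KeywordEngine> e : engines.entrySet()) {
            results.put(e.getKey(), e.getValue().relatedKeywords(keyword));
        }
        return results;
    }
}

// Hypothetical entity-aware query builder: known entity labels (lowercase)
// become boosted phrase clauses; all other tokens are escaped plain terms.
final class EntityQueryBuilder {
    private EntityQueryBuilder() {}

    static String build(String query, Set<String> entityLabels, float boost) {
        String[] tokens = query.trim().split("\\s+");
        List<String> clauses = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int end = -1;
            // Greedy longest match: try the longest token span starting at i first.
            for (int j = tokens.length; j > i; j--) {
                String span = String.join(" ", Arrays.copyOfRange(tokens, i, j));
                if (entityLabels.contains(span.toLowerCase(Locale.ROOT))) {
                    end = j;
                    break;
                }
            }
            if (end != -1) {
                String phrase = String.join(" ", Arrays.copyOfRange(tokens, i, end));
                clauses.add("\"" + phrase.replace("\\", "\\\\").replace("\"", "\\\"") + "\"^" + boost);
                i = end;
            } else {
                clauses.add(escape(tokens[i]));
                i++;
            }
        }
        return String.join(" ", clauses);
    }

    // Backslash-escapes characters that the Lucene/Solr query parser treats as syntax.
    private static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&;/".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

For example, with a label set containing "new york", `EntityQueryBuilder.build("hotels in New York", labels, 2.0f)` returns `hotels in "New York"^2.0`. Matching the longest span first keeps a multi-word entity from being split into separate, unboosted terms.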
>>
>> Some minor improvements are as follows:
>>
>> - The Contenthub web resources have been updated according to the latest
>> improvements.
>>
>> - The Contenthub/core bundle has been removed. Refactoring Contenthub has
>> led to a more efficient use of several classes, so there is currently no
>> need for a separate core bundle.
>>
>> - The Contenthub parent POM has been adjusted. All dependencies have been
>> moved into the Stanbol parent POM.
>>
>> - helper/cnn-importer has been repackaged under crawler/cnn.
>>
>> - api has been repackaged under servicesapi.
>>
>> - Sling based unit and integration tests are on the way.
>>
>> All the best,
>> Anil.
>
>
>
> --
> Fabian
> http://twitter.com/fctwitt



-- 
Fabian
http://twitter.com/fctwitt
