Re: New features for Contenthub

Ali Anil Sinaci Thu, 26 Jan 2012 23:52:35 -0800

Hi Fabian,

Web GUI and documentation are the next steps in our plan. Afterwards we will prepare a demo for these features.


Best,
Anil.

On 01/26/2012 08:35 PM, Fabian Christ wrote:

Hi Ali and Suat,

sorry for the mistake. My mail was meant to be addressed in reply to Ali ;)

On 26 January 2012 at 19:33, Fabian Christ <[email protected]> wrote:

Hi Suat,

this is a really impressive list of changes and features. Do you have
plans regarding documentation, demos, tutorials?

Best,
  - Fabian

On 26 January 2012 at 16:46, Ali Anil Sinaci <[email protected]> wrote:

Dear Stanbolers,

I have committed major changes related to Contenthub. Below, you can find
some explanations about the changes. I have grouped them under two major
issues in Jira (STANBOL-469 and STANBOL-470) although there are several
sub-issues. Later improvements will be issued under their specific topics.

Contenthub includes two main parts: store and search. Solr is the back-end
for all store and retrieve operations of content items (SolrContentItem
extends ContentItem). Major improvements are as follows:

- Store maintains a default Solr core (called "contenthub") through the
EmbeddedSolrServer. This default core indexes several semantic properties of
entities when they are retrieved from the referenced sites. (The current
DBpedia index does not include most of these properties; we have a larger
index for this.)

- LDPath has been integrated into Contenthub.
    * Several Solr cores can be managed through LDProgramManager of
Contenthub.
    * Each LDPath program corresponds to a unique Solr core. LDPath programs
(hence Solr cores) are uniquely identified through their names.
LDProgramManager and SolrCoreManager provide the required synchronization
between Solr cores and LDPath programs.
    * Submitted LDPath programs are saved into separate files and accessed
via a simple cache mechanism.
    * CRD operations for LDPath programs are provided through
LDProgramManager
    * ClerezzaBackend is implemented as an LDPath backend.
    * LDProgramManager has a special method (executeProgram) to execute the
LDPath programs on Clerezza MGraphs.
    * REST services are ready for LDProgramManager functionalities.
    * Contenthub Store and Search parts (all interfaces and REST APIs) are
adjusted so that they can operate with LDPath programs.
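The one-to-one mapping between LDPath program names and Solr core names, together with the create/read/delete operations, can be sketched as follows. This is only a minimal illustration under stated assumptions: `ProgramRegistry` and all of its methods are hypothetical names invented for this example and are not the LDProgramManager API. The LDPath program text is treated as an opaque string, and no Solr core is actually created.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Small demo of the hypothetical registry defined below.
class ProgramRegistryDemo {
    public static void main(String[] args) {
        ProgramRegistry registry = new ProgramRegistry();
        // The program text is illustrative and treated as an opaque string.
        registry.submit("places", "title = rdfs:label :: xsd:string;");
        System.out.println(registry.names());
        System.out.println(registry.coreNameFor("places"));
        registry.delete("places");
        System.out.println(registry.names());
    }
}

// Hypothetical sketch only; this is NOT the Stanbol LDProgramManager API.
class ProgramRegistry {
    // Program name -> LDPath program text. The name doubles as the Solr core name.
    private final Map<String, String> programs = new ConcurrentHashMap<>();

    // Create: a duplicate name is rejected so each name maps to exactly one core.
    void submit(String name, String ldPathProgram) {
        Objects.requireNonNull(ldPathProgram, "program text");
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("program name must not be empty");
        }
        if (programs.putIfAbsent(name, ldPathProgram) != null) {
            throw new IllegalStateException("program already exists: " + name);
        }
    }

    // Read: returns the program text, or null if no such program was submitted.
    String get(String name) {
        return programs.get(name);
    }

    // Delete: returns true if a program (and, conceptually, its core) was removed.
    boolean delete(String name) {
        return programs.remove(name) != null;
    }

    // Resolves the core for a program; by convention here it is the same name.
    String coreNameFor(String programName) {
        if (!programs.containsKey(programName)) {
            throw new IllegalArgumentException("unknown program: " + programName);
        }
        return programName;
    }

    // Sorted snapshot of all known program (and therefore core) names.
    Set<String> names() {
        return new TreeSet<>(programs.keySet());
    }
}
```

Rejecting duplicate names at submission time is one simple way to keep the "one program, one core" invariant; the actual synchronization between LDProgramManager and SolrCoreManager may work differently.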

- Web GUI of Contenthub only operates on the default Solr index
("contenthub"). Enabling other cores (generated through LDPath programs) is
in the TODO list.

- Search logic has been implemented from scratch.
    * Search engine pattern has been removed for document search.
    * Content items are indexed through Solr cores. Therefore all searches on
the content items are performed through Solr indexes.
    * Search interface has been split into three different interfaces:
SolrSearch, RelatedKeywordSearch and FeaturedSearch.
    * SolrSearch is compatible with SolrJ. That is, clients who have already
been using SolrJ can easily switch to SolrSearch API of Contenthub. As a
result of LDPath integration, additional methods exist in this interface to
accept LDPath program names (Solr core names). There is a single
implementation of this interface in Contenthub.
    * RelatedKeywordSearch exposes a "search engine" pattern, but only to
search for related keywords. RelatedKeywordSearchManager is the manager to
handle several implementations of this interface (engines).
    * In addition to the search results retrieved from SolrSearch, users can
now send their search keywords (query terms) to RelatedKeywordSearchManager
to retrieve related keywords from different sources. This can be performed
as a separate process from SolrSearch.
    * RelatedKeywordSearch has been implemented by WordnetSearch,
OntologyResourceSearch and ReferencedSiteSearch. As their names indicate,
they look for related keywords within their resources. (WordnetSearch can be
excluded until the license issue is resolved or a new client library is
used)
    * FeaturedSearch combines the capabilities of SolrSearch and
RelatedKeywordSearch in case a client wants to retrieve all results (content
items and related keywords) from Contenthub search.
    * FeaturedSearch provides a similar interface to SolrSearch with
additional methods. However, its behaviour is different: the search is
"featured" in this implementation.
    * FeaturedSearch provides a special method: tokenizeEntities. This method
takes a query string and finds out whether any entities exist in the
query. Based on the discovered entities, FeaturedSearch prepares Solr
queries in special formats to boost the results related to the entities.
However, this method should be improved to cover the massive number of
possible cases which can occur during keyword searches.
    * FeaturedSearch provides special methods to ease the faceted search. Web
GUI of Contenthub makes use of this interface to enable faceted search.
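Two of the ideas above can be sketched in a few lines: a manager that forwards a keyword to several related-keyword engines, and a query rewrite that boosts recognized entities. This is a rough illustration, not the Contenthub code. `KeywordEngine`, `StaticEngine`, `KeywordEngineManager` and `EntityBoost` are hypothetical names, the entity lookup is a plain set of lowercase tokens, and the only Solr-specific assumption is the standard `term^boost` query syntax.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Small demo of the hypothetical classes defined below.
class SearchSketchDemo {
    public static void main(String[] args) {
        KeywordEngineManager manager = new KeywordEngineManager();
        manager.register(new StaticEngine("thesaurus", Map.of("car", List.of("automobile"))));
        manager.register(new StaticEngine("ontology", Map.of("car", List.of("vehicle"))));
        System.out.println(manager.search("car"));
        System.out.println(EntityBoost.boostQuery("hotels in paris", Set.of("paris"), 3));
    }
}

// Hypothetical stand-in for a related-keyword engine; not the Stanbol interface.
interface KeywordEngine {
    String name();
    List<String> relatedKeywords(String keyword);
}

// Toy engine backed by a fixed map, used only to make the sketch runnable.
class StaticEngine implements KeywordEngine {
    private final String name;
    private final Map<String, List<String>> related;

    StaticEngine(String name, Map<String, List<String>> related) {
        this.name = name;
        this.related = related;
    }

    public String name() { return name; }

    public List<String> relatedKeywords(String keyword) {
        return related.getOrDefault(keyword, List.of());
    }
}

// Hypothetical manager that fans a keyword out to every registered engine.
class KeywordEngineManager {
    private final List<KeywordEngine> engines = new ArrayList<>();

    void register(KeywordEngine engine) { engines.add(engine); }

    // Returns related keywords grouped by the engine (source) that produced them.
    Map<String, List<String>> search(String keyword) {
        Map<String, List<String>> bySource = new LinkedHashMap<>();
        for (KeywordEngine engine : engines) {
            bySource.put(engine.name(), engine.relatedKeywords(keyword));
        }
        return bySource;
    }
}

// Hypothetical helper: appends ^boost to every token that is a known entity.
class EntityBoost {
    static String boostQuery(String query, Set<String> knownEntities, int boost) {
        StringBuilder out = new StringBuilder();
        for (String token : query.trim().split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            out.append(token);
            if (knownEntities.contains(token.toLowerCase(Locale.ROOT))) {
                out.append('^').append(boost);
            }
        }
        return out.toString();
    }
}
```

Matching entities token by token is the simplest possible approach and misses multi-word entities, which is one example of the cases that tokenizeEntities would still need to cover.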

Some minor improvements are as follows:

- Web resources of Contenthub have been adjusted according to the latest
improvements.

- Contenthub/core bundle has been removed. Refactoring Contenthub has led
to a more efficient use of several classes, hence currently there is no need
for a separate core bundle.

- Contenthub parent pom has been adjusted. All dependencies have been moved
into Stanbol parent.

- helper/cnn-importer has been repackaged under crawler/cnn.

- api has been repackaged under servicesapi.

- Sling based unit and integration tests are on the way.

All the best,
Anil.



--
Fabian
http://twitter.com/fctwitt
