Hi everyone,
I have just committed the initial implementation of the Index part of
the 2-layered structure of Contenthub, so we now have initial
implementations of both the Store and the Index layers. Currently, this
work is being carried out on the "contenthub-two-layered-structure" branch.
To try out the new structure, build the contenthub module from this
branch.
I would be very glad to hear your feedback. Below, you can see the log
of the commit:
Best,
Suat
Logs:
Initial version of the default implementation of the SemanticIndex
interface, which is defined in STANBOL-499.
SemanticIndex is one part of the 2-layered structure of Contenthub. The
other part is the Store, which is defined in STANBOL-498.
The default implementation of the SemanticIndex interface
(LDPathSemanticIndex) is based on the LDPath language. A new
LDPathSemanticIndex can be created by providing a name, a description and
an LDPath program. Within LDPathSemanticIndex, the provided LDPath
program is used in two ways, which are explained later in this log.
Each instance of this implementation checks the Store for changes at
regular intervals in a separate thread; the interval length is
configurable. After processing the changes in the Store, the index
persists the last processed revision. In this way, when the index is
restarted, it checks for changes starting from the latest persisted
revision. However, when the LDPath program is changed, the
LDPathSemanticIndex indexes all ContentItems from scratch. During this
period the index is in the REINDEXING state and does not accept other
index or remove operations. After reindexing is completed, the state of
the index becomes ACTIVE again.
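The polling and revision bookkeeping described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual LDPathSemanticIndex code. `Change`, `ChangeSource`, `PollingIndex` and the plain-text revision file are hypothetical stand-ins for the Store API and its persistence, and a `Set` of URIs stands in for the Solr core.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical change reported by the Store: revision, ContentItem URI, removal flag.
final class Change {
    final long revision;
    final String uri;
    final boolean removed;
    Change(long revision, String uri, boolean removed) {
        this.revision = revision; this.uri = uri; this.removed = removed;
    }
}

// Hypothetical Store view: all changes with a revision greater than the argument.
interface ChangeSource {
    List<Change> changesSince(long revision);
}

enum IndexState { ACTIVE, REINDEXING }

final class PollingIndex {
    private final ChangeSource store;
    private final Path revisionFile;                                   // persisted last revision
    private final Set<String> indexed = ConcurrentHashMap.newKeySet(); // stand-in for Solr
    private volatile IndexState state = IndexState.ACTIVE;
    private long lastRevision;

    PollingIndex(ChangeSource store, Path revisionFile) {
        this.store = store;
        this.revisionFile = revisionFile;
        this.lastRevision = readRevision(); // on restart, resume from the persisted revision
    }

    // Runs pollOnce() in a separate thread at a configurable interval.
    ScheduledExecutorService start(long intervalSeconds) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleWithFixedDelay(this::pollOnce, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
        return exec;
    }

    // Applies the changes made since the last processed revision, then persists that revision.
    synchronized void pollOnce() {
        for (Change c : store.changesSince(lastRevision)) apply(c);
        writeRevision();
    }

    // Called when the LDPath program changes: rebuilds the whole index from revision 0.
    synchronized void reindex() {
        state = IndexState.REINDEXING;
        try {
            indexed.clear();
            lastRevision = 0L;
            for (Change c : store.changesSince(0L)) apply(c);
            writeRevision();
        } finally {
            state = IndexState.ACTIVE;
        }
    }

    // External index/remove requests are rejected while the index is REINDEXING.
    void index(String uri) { requireActive(); indexed.add(uri); }
    void remove(String uri) { requireActive(); indexed.remove(uri); }

    boolean contains(String uri) { return indexed.contains(uri); }
    IndexState state() { return state; }
    synchronized long lastRevision() { return lastRevision; }

    private void requireActive() {
        if (state == IndexState.REINDEXING) throw new IllegalStateException("index is REINDEXING");
    }

    private void apply(Change c) {
        if (c.removed) indexed.remove(c.uri); else indexed.add(c.uri);
        lastRevision = Math.max(lastRevision, c.revision);
    }

    private long readRevision() {
        try {
            if (!Files.exists(revisionFile)) return 0L;
            String s = new String(Files.readAllBytes(revisionFile), StandardCharsets.UTF_8).trim();
            return s.isEmpty() ? 0L : Long.parseLong(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void writeRevision() {
        try {
            Files.write(revisionFile, Long.toString(lastRevision).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this sketch, `pollOnce()` and `reindex()` share one lock, so a scheduled poll simply waits for a running reindex to finish. External `index()`/`remove()` calls are not blocked. Instead they fail fast while the state is REINDEXING, which mirrors the behaviour described in the log.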
LDPath usages in LDPathSemanticIndex
====================================
a) It is used to configure the underlying Solr core. The LDPath program
determines the index fields, and Solr-specific properties such as
"multiValued" and "termVectors" can be configured for them.
b) When a ContentItem is being indexed, each named entity contained in
the enhancements of the ContentItem is queried through the Entityhub.
The values obtained from the Entityhub are then indexed along with the
actual content as additional metadata. Since the same LDPath program
also defines the fields of the underlying Solr core, this additional
metadata is fully compatible with it. This ability to create customized
indexes makes it possible to adapt the index to different domains or
use cases.
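The indexing step in b) can be pictured with the small sketch below. It builds one Solr-style document (field name to values) from a ContentItem and the entities found in its enhancements. `EntityFieldLookup` is a hypothetical stand-in for evaluating the LDPath program on an entity fetched from the Entityhub. The field names `id` and `content` are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for running the LDPath program on one Entityhub entity.
interface EntityFieldLookup {
    Map<String, List<String>> fieldsFor(String entityUri);
}

final class SemanticDocumentBuilder {
    // Builds a field -> values document: the content itself plus the LDPath field values
    // of every named entity found in the ContentItem's enhancements.
    static Map<String, List<String>> build(String itemUri, String content,
                                           List<String> entityUris, EntityFieldLookup lookup) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("id", new ArrayList<>(List.of(itemUri)));       // assumed field name
        doc.put("content", new ArrayList<>(List.of(content)));  // assumed field name
        for (String entity : entityUris) {
            // Values of the same field from several entities accumulate (a multi-valued field).
            lookup.fieldsFor(entity).forEach((field, values) ->
                    doc.computeIfAbsent(field, k -> new ArrayList<>()).addAll(values));
        }
        return doc;
    }
}
```

Because the fields come from the same LDPath program that configured the Solr core, every key in such a document corresponds to a field the core already knows about.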
Creating and retrieving LDPathSemanticIndex instances
=====================================================
The {stanbol_host}/index endpoint can be used to retrieve the already
registered SemanticIndexes. An LDPathSemanticIndex can be created either
through the RESTful service, i.e. {stanbol_host}/index/ldpath, or through
the Felix Web Console by configuring an "Apache Stanbol Contenthub LDPath
Based Semantic Index" component.
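Calling the REST endpoints from Java might look roughly like the sketch below, which uses the JDK 11 `java.net.http` client. The request is only built here, not sent. The form parameter names `name`, `description` and `program` are assumptions and have not been checked against the actual {stanbol_host}/index/ldpath resource. `LdpathIndexRequests` is a hypothetical helper.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class LdpathIndexRequests {
    // Builds (but does not send) a form POST that creates a new LDPathSemanticIndex.
    // ASSUMPTION: the parameter names "name", "description" and "program".
    static HttpRequest create(String stanbolHost, String name, String description, String ldpath) {
        Map<String, String> form = new LinkedHashMap<>();
        form.put("name", name);
        form.put("description", description);
        form.put("program", ldpath);
        return HttpRequest.newBuilder(URI.create(stanbolHost + "/index/ldpath"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(form), StandardCharsets.UTF_8))
                .build();
    }

    // GET request listing the registered SemanticIndexes.
    static HttpRequest list(String stanbolHost) {
        return HttpRequest.newBuilder(URI.create(stanbolHost + "/index")).GET().build();
    }

    // application/x-www-form-urlencoded body, fields in insertion order.
    static String encodeForm(Map<String, String> form) {
        return form.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }
}
```

Such a request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.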
Each instance of LDPathSemanticIndex is registered as an OSGi component,
so instances can be obtained through a ServiceTracker or an @Reference.
The name (Semantic-Index-Name) and description (Semantic-Index-Name)
properties can be used to retrieve specific instances of
LDPathSemanticIndex from the OSGi environment. In addition, the
SemanticIndexManager service provides retrieval of indexes by their
names and EndpointTypes.
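To select one specific index through the OSGi service registry, one option is a service filter on the name property mentioned above. The sketch below builds such a filter string and escapes the characters that are special in OSGi filter values (`\`, `*`, `(`, `)`). The fully qualified interface name in `SEMANTIC_INDEX_CLASS` is an assumption, and `SemanticIndexFilters` is a hypothetical helper.

```java
final class SemanticIndexFilters {
    // ASSUMPTION: fully qualified name of the SemanticIndex service interface on this branch.
    static final String SEMANTIC_INDEX_CLASS =
            "org.apache.stanbol.contenthub.servicesapi.index.SemanticIndex";
    // Service property carrying the index name, as given in the commit log.
    static final String NAME_PROPERTY = "Semantic-Index-Name";

    // Escapes the characters that must be preceded by a backslash in filter values.
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char ch : value.toCharArray()) {
            if (ch == '\\' || ch == '*' || ch == '(' || ch == ')') sb.append('\\');
            sb.append(ch);
        }
        return sb.toString();
    }

    // Filter matching the SemanticIndex service registered under the given name.
    static String byName(String indexName) {
        return "(&(objectClass=" + SEMANTIC_INDEX_CLASS + ")("
                + NAME_PROPERTY + "=" + escape(indexName) + "))";
    }
}
```

The resulting string can be passed to `BundleContext.createFilter(...)` and then to a `ServiceTracker`, to `BundleContext.getServiceReferences(String, String)`, or used as the `target` of an `@Reference`.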
Search over the LDPathSemanticIndex
===================================
The existing search functionality of the Contenthub has not changed. It
is exposed through two types of endpoints: 1) RESTful endpoints and 2)
OSGi-based Java endpoints. There are two RESTful endpoints: SOLR and
CONTENTHUB. The SOLR endpoint can be used to query the underlying Solr
core directly. The CONTENTHUB endpoint offers a search whose results
contain additional information besides the matching documents: facets
over those documents and keywords related to the original query term.
This endpoint is more experimental and open to changes.
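The difference between the two RESTful endpoints can be illustrated with the rough shape of a CONTENTHUB result below. The SOLR endpoint would return only the documents. The CONTENTHUB result adds facet counts computed over those documents and related keywords for the query term. `ContenthubResult` and the `relatedKeywordSource` function are hypothetical. The real endpoint's response format and keyword source are not described here.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical shape of a CONTENTHUB search result.
final class ContenthubResult {
    final List<Map<String, List<String>>> documents;  // what the SOLR endpoint alone would return
    final Map<String, Map<String, Integer>> facets;   // field -> (value -> number of documents)
    final List<String> relatedKeywords;

    private ContenthubResult(List<Map<String, List<String>>> documents,
                             Map<String, Map<String, Integer>> facets, List<String> relatedKeywords) {
        this.documents = documents; this.facets = facets; this.relatedKeywords = relatedKeywords;
    }

    static ContenthubResult of(String queryTerm, List<Map<String, List<String>>> documents,
                               List<String> facetFields,
                               Function<String, List<String>> relatedKeywordSource) {
        Map<String, Map<String, Integer>> facets = new TreeMap<>();
        for (String field : facetFields) {
            Map<String, Integer> counts = new TreeMap<>();
            for (Map<String, List<String>> doc : documents) {
                // Count each distinct value at most once per document.
                for (String v : new LinkedHashSet<>(doc.getOrDefault(field, List.of()))) {
                    counts.merge(v, 1, Integer::sum);
                }
            }
            facets.put(field, counts);
        }
        return new ContenthubResult(documents, facets, relatedKeywordSource.apply(queryTerm));
    }
}
```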