** Faceted search
There is also the "One document equals one entity" model, which
might be more appropriate for faceted search. It uses one Lucene
document for multiple triples and returns the subject URI.
From the documentation:
"""
When using this integration model, text:query returns the subject URI
for the document
"""
There then needs to be a facet property function. Would someone like to
sketch one out as a GH issue?
** Elasticsearch - if we can negotiate the licensing issues, then this
could be done, and/or Solr. The client libs are OSS, but testing them
needs a server, so it impacts the build; there may be a
testcontainers.io way round this, or optional tests. We need the build
to be clean as well as the produced binaries. It does need someone, or
several people, to take an interest in this, both now and for keeping
the code maintained, especially if any security issues arise.
Andy
On 15/06/2023 12:49, Adrian Gschwend wrote:
On 14.06.23 14:45, Øyvind Gjesdal wrote:
Hi Øyvind,
Facet/aggregation was not implemented as extension functions in SPARQL,
and I believe that it also used the same abstraction described in the
jena-text docs:
One Jena *triple* equals one Lucene *document*
which means aggregations/facets are not available or usable from the
Elasticsearch APIs either.
yes I saw that and I also thought that's probably not ideal. I don't
know much about Elastic in practice, I mainly read tutorials &
documentation. What I had in mind was that we could define, for example
via a SHACL shape (or something comparable), what a "document" contains.
So it's shapes that would define how we see the document, and we could
use this abstraction for search. The integration would take SHACL
shapes, create a "document" out of them that is consumable by Elastic,
and then we could use this for search.
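
A minimal sketch of that idea, assuming a much simplified stand-in for a
SHACL shape: a plain mapping from property IRIs to field names. The IRIs,
the field names and the `build_documents` helper are all hypothetical and
not part of jena-text or any existing integration. It only illustrates
grouping triples into one document per subject:

```python
from collections import defaultdict

# Hypothetical stand-in for a SHACL shape: which properties belong in a
# "document", and under which field name they are indexed.
RECORD_SHAPE = {
    "http://example.org/title": "title",
    "http://example.org/level": "level",
}


def build_documents(triples, shape):
    """Group (subject, predicate, object) triples into one document per subject."""
    docs = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        field = shape.get(p)
        if field is not None:  # properties outside the shape are ignored
            docs[s][field].append(o)
    # Plain dicts, with the subject IRI kept as the document id.
    return {s: {"id": s, **dict(fields)} for s, fields in docs.items()}


triples = [
    ("http://example.org/r1", "http://example.org/title", "Letter"),
    ("http://example.org/r1", "http://example.org/level", "item"),
    ("http://example.org/r1", "http://example.org/other", "ignored"),
]
documents = build_documents(triples, RECORD_SHAPE)
```

Each resulting document could then be handed to a search index, with the
subject IRI as the link back to the RDF data.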
The second thing is that I'm mainly interested in an integration where
we don't have to update the Elastic index ourselves. I guess that the
Fuseki integration takes care of that so it's "in sync" all the time. I
would want the Elastic API available as well, as this is easier to use
for the facet use-cases than pure SPARQL. Paging is not trivial in
SPARQL for use-cases like this; the Elastic API, however, is built for
that.
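
To illustrate the kind of request meant here, a rough sketch of an
Elasticsearch search body that combines paging (`from`/`size`) with a
`terms` aggregation for facet counts. The helper name and the `title` and
`level` field names are assumptions for the example only, and the facet
field would need to be mapped as a keyword-type field:

```python
def facet_search_body(text, text_field, facet_field, page, page_size=20):
    """Build an Elasticsearch search request body with paging and one facet."""
    return {
        "from": page * page_size,  # offset of the first hit on this page
        "size": page_size,         # number of hits per page
        "query": {"match": {text_field: text}},
        # Bucket counts per distinct value of the facet field.
        "aggs": {"by_facet": {"terms": {"field": facet_field}}},
    }


# Third page (zero-based page 2) with 10 hits per page.
body = facet_search_body("letter", "title", "level", page=2, page_size=10)
```

The same dict can be sent as JSON to the `_search` endpoint of an index.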
We switched to jena-text with Lucene after some weeks, which didn't have
aggregations either, but there was much more activity and usage for the
module, and the options for configuring from the assembler files were
much richer.
ok, any example of what you configure in there? I don't think I saw much
in the documentation for that so far. Aggregations are definitely
something I would like to have. One example is archival records, where
we have a hierarchy in the data. I need to be able to show that
hierarchy per record (which has its own IRI) and to browse by hierarchy
levels as well. This is super easy to represent in RDF but super hard to
query efficiently.
At the moment I'm unsure if I inspected and looked at the Elasticsearch
APIs directly to check the structure of the documents in the index
itself, after indexing.
Which versions of Elastic did you work with?
regards
Adrian