The Maven Central artifact com.marklogic:marklogic-jena is at 3.0.6, but our code depends on 3.1.0 - what code is it using?

On 13/09/2019 01:18, Alex To wrote:
I created a small program to try out Lucene with MarkLogic Jena here

https://github.com/AlexTo/jena-lab/blob/master/src/main/java/com/company/MainMarkLogic.java


My observations are as follows (see my comments at lines 54 & 56)

1. If the model reads a small file with 2 triples, loading finishes quickly
2. If the model reads a slightly larger file (1.5MB), loading takes forever,
so I have to terminate it

Pure speculation, but points 1 & 2 sound like the data load is not going to MarkLogic as a single transaction but as "autocommit" - one transaction for each triple added.
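
To make the speculation concrete, here is a toy sketch in plain Java. It makes no Jena or MarkLogic calls: ToyStore and every method on it are made-up names standing in for a remote store, and the only thing it shows is how the number of commits grows with the number of triples under autocommit. In Jena terms the second variant would correspond to wrapping the load in one write transaction (dataset.begin(ReadWrite.WRITE) ... dataset.commit()); whether the MarkLogic adapter batches the load that way is not confirmed here.

```java
import java.util.ArrayList;
import java.util.List;

class AutocommitSketch {

    // Hypothetical in-memory stand-in for a remote triple store.
    // Not part of Jena or the MarkLogic adapter.
    static class ToyStore {
        private final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();
        private boolean inTransaction = false;
        private int commitCount = 0;

        void begin() {
            if (inTransaction) throw new IllegalStateException("already in a transaction");
            inTransaction = true;
        }

        void add(String triple) {
            pending.add(triple);
            // Outside an explicit transaction every add is committed on its own.
            if (!inTransaction) flush();
        }

        void commit() {
            if (!inTransaction) throw new IllegalStateException("no transaction to commit");
            flush();
            inTransaction = false;
        }

        private void flush() {
            committed.addAll(pending);
            pending.clear();
            commitCount++; // in a real store this is where the round trip would happen
        }

        int commitCount() { return commitCount; }
        int size() { return committed.size(); }
    }

    // Load n triples with no enclosing transaction: one commit per triple.
    static ToyStore loadAutocommit(int n) {
        ToyStore store = new ToyStore();
        for (int i = 0; i < n; i++) {
            store.add("<s" + i + "> <p> <o" + i + "> .");
        }
        return store;
    }

    // Load n triples inside one transaction: a single commit at the end.
    static ToyStore loadInOneTransaction(int n) {
        ToyStore store = new ToyStore();
        store.begin();
        for (int i = 0; i < n; i++) {
            store.add("<s" + i + "> <p> <o" + i + "> .");
        }
        store.commit();
        return store;
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.println("autocommit commits: " + loadAutocommit(n).commitCount());
        System.out.println("single-transaction commits: " + loadInOneTransaction(n).commitCount());
    }
}
```

If each commit is a network round trip, the first variant would explain why 2 triples load quickly and a 1.5MB file appears to hang.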

    Andy


3. After loading the small file, searching the Lucene index directly shows
that the triples are indexed
4. After loading the small file, running a SPARQL query with "text:query"
never finishes

For now I have created 2 separate implementations in my program to support
full-text search with Jena or MarkLogic, but I look forward to learning
whether it is still possible to use Jena Elastic indexing with TextDataset,
because then I can provide a single UI for users to configure their search
regardless of the back end. :)


On Fri, Sep 13, 2019 at 1:07 AM Dan Davis <[email protected]> wrote:

I was incorrect, and I apologize. Virtuoso's Jena 3 driver includes an
implementation of Dataset, so while the application only uses
virtuoso.jena.driver.VirtGraph and
virtuoso.jena.driver.VirtuosoQueryExecution (and its factory), a more flexible
integration is possible. I look forward to experimenting with it and seeing
what I can do on the backend.

On Thu, Sep 12, 2019 at 10:19 AM Dan Davis <[email protected]> wrote:

Virtuoso's Jena driver implements the Model interface, rather than the
DatasetGraph API. It translates the SPARQL query into its own JDBC
interface. You can see the architecture at

http://docs.openlinksw.com/virtuoso/rdfnativestorageprovidersjena/#rdfnativestorageprovidersjenawhatisv.

However, Virtuoso has its own full-text indexing, which can be effective.
Its rules for translating words into queries are not as flexible as
lucene/solr/elastic, but it does allow you to specify what should be
indexed - e.g. which objects from which data properties in which graphs.

I use Virtuoso behind virt_jena and virt_jdbc.  You can see the code at
https://github.com/HHS/lodestar, which is run underneath
https://github.com/HHS/meshrdf.   You will see that
https://github.com/HHS/lodestar is a fork from EBI, but the NLM copy has
been updated to Jena 3. The EBI version is ahead on UI features however.

I cannot speak to MarkLogic, Stardog, etc.





EBI's lodestar still uses Jena 2, but the fork at HHS has been updated to
Jena 3.

Virtuoso has its own full-text indexing, which is not as flexible in how
it indexes as Elastic/Solr/Lucene.   It still works.




On Thu, Sep 12, 2019 at 7:03 AM Andy Seaborne <[email protected]> wrote:

Yes, probably - but.

The Jena text index will work in conjunction with any (Jena)
DatasetGraph API implementation. 3rd party systems are not tested in the
build.

The "but" is efficiency. Both those systems have their own built-in text
indexing which executes as part of the native query engine. This may be a
factor for you, it may not.

Let us know how you get on trying it.

----

There is a SPARQL 1.2 issue about standardizing text query.

Issue 40 : SPARQL 1.2 Community Group:
https://github.com/w3c/sparql-12/issues/40

      Andy

On 12/09/2019 02:53, Alex To wrote:
Hi

I have so far been happy with Jena + Lucene / Elastic. Just trying to get a
quick answer whether it can work with other Jena-based APIs like Virtuoso /
MarkLogic.

If I wrap a MarkLogic Dataset in a Jena TextDataset, can it work as
expected?

Given that a MarkLogic / Virtuoso Dataset implements the Jena Dataset
interface, it may work, but I am not sure because the "text:query" seems to
be more Jena specific.

I will try it out myself in the next couple of days to see if it works, but
if there is a quick answer it may save me a couple of hours :)

Thanks a lot

Regards





