I created a small program to try out Lucene with MarkLogic Jena here

https://github.com/AlexTo/jena-lab/blob/master/src/main/java/com/company/MainMarkLogic.java


My observations are as follows (see my comments at lines 54 and 56):

1. If the model reads a small file with 2 triples, loading finishes
quickly.
2. If the model reads a slightly larger file (1.5MB), loading runs for so
long that I have to terminate it.
3. After loading the small file, searching the Lucene index directly shows
that the triples are indexed.
4. After loading the small file, running a SPARQL query with "text:query"
does not finish.
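
To reproduce observations 2 and 4 without killing the process by hand, a
time-bounded wrapper may help. This is only a minimal sketch using
java.util.concurrent, not code from the linked repository: the class name
BoundedRun and the method runWithTimeout are hypothetical, and the lambdas
in main are stand-ins for the real model.read(...) call or query execution.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a task on a worker thread and gives up after a timeout.
class BoundedRun {

    /**
     * Returns the task's result, or Optional.empty() if the task did not
     * finish within timeoutMillis (a task that returns null also yields empty).
     */
    public static <T> Optional<T> runWithTimeout(Callable<T> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-run");
            t.setDaemon(true); // a stuck task must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the worker; blocking I/O may ignore this
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            return Optional.empty();
        } catch (ExecutionException e) {
            throw new RuntimeException("Task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a load that completes (e.g. the 2-triple file).
        Optional<String> fast = runWithTimeout(() -> "loaded", 2_000);
        System.out.println("fast task: " + fast.orElse("timed out"));

        // Stand-in for a load or text:query call that does not return.
        Optional<String> slow = runWithTimeout(() -> {
            Thread.sleep(10_000);
            return "loaded";
        }, 200);
        System.out.println("slow task: " + slow.orElse("timed out"));
    }
}
```

Note that cancelling only interrupts the worker thread. If the underlying
call is blocked on network I/O it may keep running in the background, which
is why the worker is marked as a daemon thread here.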

For now I have created 2 separate implementations in my program to support
full-text search with either Jena or MarkLogic, but I would like to know
whether it is still possible to use Jena Elastic indexing with TextDataset,
because then I could provide a single UI for users to configure their
search regardless of the back end. :)


On Fri, Sep 13, 2019 at 1:07 AM Dan Davis <dansm...@gmail.com> wrote:

> I am incorrect, and apologize. Virtuoso's Jena 3 driver includes an
> implementation of Dataset, and so while my application is only using the
> virtuoso.jena.driver.VirtGraph and
> virtuoso.jena.driver.VirtuosoQueryExecution (and factory), a more flexible
> integration is possible. I look forward to experimenting with it and seeing
> what I can do on the backend.
>
> On Thu, Sep 12, 2019 at 10:19 AM Dan Davis <dansm...@gmail.com> wrote:
>
> > Virtuoso's Jena driver implements the model interface, rather than the
> > DatasetGraphAPI. It is translating the SPARQL query into its own JDBC
> > interface. You can see the architecture at
> > http://docs.openlinksw.com/virtuoso/rdfnativestorageprovidersjena/#rdfnativestorageprovidersjenawhatisv.
> > However, Virtuoso has its own full-text indexing, which can be
> > effective. Its rules for translating words into queries are not as
> > flexible as lucene/solr/elastic, but it does allow you to specify what
> > should be indexed - e.g. which objects from which data properties in
> > which graphs.
> >
> > I use Virtuoso behind virt_jena and virt_jdbc.  You can see the code at
> > https://github.com/HHS/lodestar, which is run underneath
> > https://github.com/HHS/meshrdf.   You will see that
> > https://github.com/HHS/lodestar is a fork from EBI, but the NLM copy has
> > been updated to Jena 3. The EBI version is ahead on UI features however.
> >
> > I cannot speak to MarkLogic, Stardog, etc.
> >
> >
> >
> >
> >
> > EBI's lodestar still uses Jena 2, but the fork at HHS has been updated to
> > Jena 3.
> >
> > Virtuoso has its own full-text indexing, which is not as flexible in how
> > it indexes as Elastic/Solr/Lucene.   It still works.
> >
> >
> >
> >
> > On Thu, Sep 12, 2019 at 7:03 AM Andy Seaborne <a...@apache.org> wrote:
> >
> >> Yes, probably - but.
> >>
> >> The Jena text index will work in conjunction with any (Jena)
> >> DatasetGraphAPI implementation. 3rd party systems are not tested in the
> >> build.
> >>
> >> The "but" is efficiency. Both those systems have their own built-in text
> >> indexing which executes as part of the native query engine. This may be a
> >> factor for you, it may not.
> >>
> >> Let us know how you get on trying it.
> >>
> >> ----
> >>
> >> There is a SPARQL 1.2 issue about standardizing text query.
> >>
> >> Issue 40 : SPARQL 1.2 Community Group:
> >> https://github.com/w3c/sparql-12/issues/40
> >>
> >>      Andy
> >>
> >> On 12/09/2019 02:53, Alex To wrote:
> >> > Hi
> >> >
> >> > I have so far been happy with Jena + Lucene / Elastic. Just trying to
> >> > get a quick answer whether it can work with other Jena based APIs
> >> > like Virtuoso / MarkLogic.
> >> >
> >> > If I wrap a MarkLogic Dataset in a Jena TextDataset, can it work as
> >> > expected?
> >> >
> >> > Given that a MarkLogic / Virtuoso Dataset implements the Jena Dataset
> >> > interface, it may work but I am not sure because the "text:query"
> >> > seems to be more Jena specific.
> >> >
> >> > I will try it out myself in the next couple of days to see if it
> >> > works, but if there is a quick answer it may save me a couple of
> >> > hours :)
> >> >
> >> > Thanks a lot
> >> >
> >> > Regards
> >> >
> >>
> >
>


-- 

Alex To
PhD Candidate
School of Computer Science
Knowledge Discovery and Management Research Group
Faculty of Engineering & IT
THE UNIVERSITY OF SYDNEY | NSW | 2006
Desk 4e69 | Building J12 | 1 Cleveland Street
M. +61423330656
