Re: Can Jena Full Text search work with other Jena based API like Virtuoso Jena or MarkLogic Jena ?

Alex To Mon, 16 Sep 2019 16:59:42 -0700

Hi Andy

I ended up creating separate implementations for Jena and MarkLogic full
text search for now due to the project's time constraints. I will
investigate further at a later time.


Thank you

Best Regards

On Sun, Sep 15, 2019 at 6:53 PM Andy Seaborne <a...@apache.org> wrote:

> Alex,
>
> I can't try it out - I don't have a MarkLogic system.
>
> Can you see in the server logs what is happening?
>
>  > Pure speculation, but parts 1 & 2 sound like the data load is not going
>  > to MarkLogic as a single transaction but as "autocommit" - one
>  > transaction for each triple added.
>
>      Andy
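If Andy's guess is right, the client-side fix is to wrap the whole load in one write transaction rather than letting each added triple commit on its own. Below is a minimal sketch against the Jena 3.x API (`Txn.executeWrite`, `RDFDataMgr.read`). The class name `BulkLoad` is hypothetical, and it is an assumption, not something confirmed in this thread, that the marklogic-jena `Dataset` maps Jena's begin/commit onto a single MarkLogic transaction; check its documentation.

```java
import java.io.InputStream;

import org.apache.jena.query.Dataset;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.system.Txn;

// Hypothetical helper: loads a whole stream inside ONE write transaction
// instead of relying on per-triple "autocommit".
public final class BulkLoad {

    private BulkLoad() {}

    // Every triple parsed from 'in' is added between a single begin/commit.
    // Assumption: the underlying Dataset (e.g. a MarkLogic-backed one) honours
    // Jena transactions; if it does not, each add may still commit separately.
    public static void loadInOneTransaction(Dataset dataset, InputStream in, Lang lang) {
        Txn.executeWrite(dataset, () -> RDFDataMgr.read(dataset, in, lang));
    }
}
```

To load a local file instead of a stream, `RDFDataMgr.read(dataset, "data.ttl")` can go inside the same `Txn.executeWrite` call.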
>
> On 13/09/2019 23:04, Andy Seaborne wrote:
> > The Maven Central artifact com.marklogic:marklogic-jena is 3.0.6 but our
> > code depends on 3.1.0 - what code is it using?
> >
> > On 13/09/2019 01:18, Alex To wrote:
> >> I created a small program to try out Lucene with MarkLogic Jena here:
> >>
> >> https://github.com/AlexTo/jena-lab/blob/master/src/main/java/com/company/MainMarkLogic.java
> >>
> >> My observations are as follows (see my comments at lines 54 & 56):
> >>
> >> 1. If the model reads a small file with 2 triples, the loading can
> >> finish quickly.
> >> 2. If the model reads a slightly larger file (1.5MB), the loading takes
> >> forever, so I have to terminate it.
> >
> > Pure speculation, but parts 1 & 2 sound like the data load is not going
> > to MarkLogic as a single transaction but as "autocommit" - one
> > transaction for each triple added.
> >
> >      Andy
> >
> >
> >> 3. After loading the small file, searching the Lucene index directly shows
> >> that the triples are indexed.
> >> 4. After loading the small file, a SPARQL query with "text:query" never
> >> finishes.
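When a `text:query` query never finishes, bounding it with a timeout at least turns the hang into a visible error. Below is a minimal sketch for Jena 3.x, where `QueryExecution.setTimeout(long)` takes milliseconds and an expired query raises `QueryCancelledException` (newer Jena versions configure timeouts on the query-execution builder instead). It assumes `dataset` has already been wrapped with a Jena text index on `rdfs:label`. `TextSearch` and `search` are hypothetical names, and whether a remote MarkLogic call can actually be interrupted by this timeout is untested here.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.jena.query.Dataset;
import org.apache.jena.query.ParameterizedSparqlString;
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.system.Txn;

public final class TextSearch {

    private TextSearch() {}

    // Runs text:query over rdfs:label and returns the matching subject URIs.
    // Throws QueryCancelledException if execution exceeds timeoutMs.
    public static List<String> search(Dataset dataset, String term, long timeoutMs) {
        ParameterizedSparqlString pss = new ParameterizedSparqlString(
            "PREFIX text: <http://jena.apache.org/text#>\n"
            + "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
            + "SELECT ?s WHERE { ?s text:query (rdfs:label ?term) }");
        pss.setLiteral("term", term); // substituted as an escaped string literal
        Query query = pss.asQuery();

        return Txn.calculateRead(dataset, () -> {
            List<String> uris = new ArrayList<>();
            try (QueryExecution qexec = QueryExecutionFactory.create(query, dataset)) {
                qexec.setTimeout(timeoutMs); // Jena 3.x: overall timeout in ms
                ResultSet rs = qexec.execSelect();
                while (rs.hasNext()) {
                    uris.add(rs.next().getResource("s").getURI());
                }
            }
            return uris;
        });
    }
}
```

Usage: `TextSearch.search(textDataset, "hello", 10_000L)` returns the subjects whose label matches, or fails after ten seconds instead of hanging.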
> >>
> >> For now I have created 2 separate implementations in my program to support
> >> full-text search with Jena or MarkLogic, but I look forward to knowing
> >> whether it is still possible to use Jena Elastic indexing with TextDataset,
> >> because then I can provide a single UI for users to configure their search
> >> regardless of the back end. :)
> >>
> >>
> >> On Fri, Sep 13, 2019 at 1:07 AM Dan Davis <dansm...@gmail.com> wrote:
> >>
> >>> I was incorrect, and I apologize. Virtuoso's Jena 3 driver includes an
> >>> implementation of Dataset, so while the application is only using
> >>> virtuoso.jena.driver.VirtGraph and
> >>> virtuoso.jena.driver.VirtuosoQueryExecution (and its factory), a more
> >>> flexible integration is possible. I look forward to experimenting with it
> >>> and seeing what I can do on the backend.
> >>>
> >>> On Thu, Sep 12, 2019 at 10:19 AM Dan Davis <dansm...@gmail.com> wrote:
> >>>
> >>>> Virtuoso's Jena driver implements the Model interface rather than the
> >>>> DatasetGraph API. It translates the SPARQL query into its own JDBC
> >>>> interface. You can see the architecture at
> >>>>
> >>>> http://docs.openlinksw.com/virtuoso/rdfnativestorageprovidersjena/#rdfnativestorageprovidersjenawhatisv
> >>>>
> >>>> However, Virtuoso has its own full-text indexing, which can be effective.
> >>>> Its rules for translating words into queries are not as flexible as
> >>>> Lucene/Solr/Elastic, but it does allow you to specify what should be
> >>>> indexed, e.g. which objects of which data properties in which graphs.
> >>>>
> >>>> I use Virtuoso behind virt_jena and virt_jdbc. You can see the code at
> >>>> https://github.com/HHS/lodestar, which runs underneath
> >>>> https://github.com/HHS/meshrdf. You will see that
> >>>> https://github.com/HHS/lodestar is a fork of the EBI project, but the
> >>>> NLM copy has been updated to Jena 3. The EBI version is ahead on UI
> >>>> features, however.
> >>>>
> >>>> I cannot speak to MarkLogic, Stardog, etc.
> >>>>
> >>>> EBI's lodestar still uses Jena 2, but the fork at HHS has been updated to
> >>>> Jena 3.
> >>>>
> >>>> Virtuoso has its own full-text indexing, which is not as flexible in how
> >>>> it indexes as Elastic/Solr/Lucene. It still works.
> >>>>
> >>>> On Thu, Sep 12, 2019 at 7:03 AM Andy Seaborne <a...@apache.org> wrote:
> >>>>
> >>>>> Yes, probably - but.
> >>>>>
> >>>>> The Jena text index will work in conjunction with any (Jena)
> >>>>> DatasetGraph API implementation. Third-party systems are not tested
> >>>>> in the build.
> >>>>>
> >>>>> The "but" is efficiency. Both of those systems have their own built-in
> >>>>> text indexing, which executes as part of the native query engine. This
> >>>>> may or may not be a factor for you.
> >>>>>
> >>>>> Let us know how you get on trying it.
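Andy's point that the text index layers on top of any `DatasetGraph` implementation can be sketched with the Jena 3.x jena-text API: build an `EntityDefinition` for the predicate to index, then wrap the base dataset with `TextDatasetFactory.createLucene`. The class name `TextWrap`, the field names `uri`/`text`, the choice of `rdfs:label`, and the in-memory Lucene `RAMDirectory` (present in Lucene 7 and 8) are illustrative assumptions. Passing a MarkLogic or Virtuoso `Dataset` as `base` is exactly the untested combination discussed in this thread.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.text.EntityDefinition;
import org.apache.jena.query.text.TextDatasetFactory;
import org.apache.jena.query.text.TextIndexConfig;
import org.apache.jena.vocabulary.RDFS;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Hypothetical helper: wraps any Jena Dataset with a Lucene text index.
public final class TextWrap {

    private TextWrap() {}

    public static Dataset withLuceneIndex(Dataset base) {
        // "uri" stores the subject, "text" is the default search field,
        // and literals of rdfs:label are what gets indexed.
        EntityDefinition entDef = new EntityDefinition("uri", "text", RDFS.label);
        TextIndexConfig config = new TextIndexConfig(entDef);
        // In-memory index for the sketch; a persistent index would use FSDirectory.
        Directory dir = new RAMDirectory();
        return TextDatasetFactory.createLucene(base, dir, config);
    }
}
```

Triples must be added through the returned dataset, not through `base`, because only changes made via the wrapper reach the Lucene index. Roughly, Lucene answers the `text:query` part and the rest of the query pattern is then matched against the base store, so a remote store may pay extra round trips; this is one hedged reading of the efficiency caveat above.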
> >>>>>
> >>>>> ----
> >>>>>
> >>>>> There is a SPARQL 1.2 issue about standardizing text query.
> >>>>>
> >>>>> Issue 40 : SPARQL 1.2 Community Group:
> >>>>> https://github.com/w3c/sparql-12/issues/40
> >>>>>
> >>>>>       Andy
> >>>>>
> >>>>> On 12/09/2019 02:53, Alex To wrote:
> >>>>>> Hi
> >>>>>>
> >>>>>> I have so far been happy with Jena + Lucene / Elastic. I'm just trying to
> >>>>>> get a quick answer on whether it can work with other Jena-based APIs
> >>>>>> like Virtuoso / MarkLogic.
> >>>>>>
> >>>>>> If I wrap a MarkLogic Dataset in a Jena TextDataset, will it work as
> >>>>>> expected?
> >>>>>>
> >>>>>> Given that a MarkLogic / Virtuoso Dataset implements the Jena Dataset
> >>>>>> interface, it may work, but I am not sure because "text:query" seems
> >>>>>> to be more Jena-specific.
> >>>>>>
> >>>>>> I will try it out myself in the next couple of days to see if it works,
> >>>>>> but if there is a quick answer it may save me a couple of hours :)
> >>>>>>
> >>>>>> Thanks a lot
> >>>>>>
> >>>>>> Regards
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
