Re: Can Jena Full Text search work with other Jena based API like Virtuoso Jena or MarkLogic Jena ?

Dan Davis Wed, 18 Sep 2019 17:55:14 -0700

dbpedia is not actually that large.  Make sure you test with RDF datasets
that really represent your data.


On Wed, Sep 18, 2019 at 8:14 PM Alex To <tonhud...@gmail.com> wrote:

> Update: I switched from Lucene to Elasticsearch 6.4.3 and Kibana. Both Jena
> and MarkLogic Jena work with indexing; I haven't tried querying MarkLogic
> with text:query though.
>
> Using Kibana, I could see the number of documents increasing while
> importing data with MarkLogic; however, it is very slow.
>
> Importing dbpedia.owl (2.5MB) with MarkLogic Jena takes less than a minute
> without indexing.
>
> With a TextDataset wrapping the MarkLogic dataset, it takes 13 minutes, so
> I guess the MarkLogic dataset does not send triples in batches when used
> with TextDataset.
>
>
>
> On Tue, Sep 17, 2019 at 9:58 AM Alex To <tonhud...@gmail.com> wrote:
>
> > Hi Andy
> >
> > I ended up creating separate implementations of full text search for Jena
> > and MarkLogic for now, due to the time constraints of the project. I will
> > investigate further at a later time.
> >
> > Thank you
> >
> > Best Regards
> >
> > On Sun, Sep 15, 2019 at 6:53 PM Andy Seaborne <a...@apache.org> wrote:
> >
> >> Alex,
> >>
> >> I can't try it out - I don't have a Marklogic system.
> >>
> >> Can you see in the server logs what is happening?
> >>
> >>  > Pure speculation but parts 1 & 2 sounds like the data load is not going
> >>  > to MarkLogic as a single transaction but as "autocommit" - one
> >>  > transaction for each triple added.
> >>
> >>      Andy
> >>
> >> On 13/09/2019 23:04, Andy Seaborne wrote:
> >> > The maven central artifact com.marklogic:marklogic-jena is 3.0.6 but our
> >> > code depends on 3.1.0 - what code is it using?
> >> >
> >> > On 13/09/2019 01:18, Alex To wrote:
> >> >> I created a small program to try out Lucene with MarkLogic Jena here:
> >> >>
> >> >> https://github.com/AlexTo/jena-lab/blob/master/src/main/java/com/company/MainMarkLogic.java
> >> >>
> >> >>
> >> >> My observations are as follows (see my comments at lines 54 & 56)
> >> >>
> >> >> 1. If the model reads a small file with 2 triples, the loading finishes
> >> >> quickly
> >> >> 2. If the model reads a slightly larger file (1.5MB), the loading takes
> >> >> forever so I have to terminate it
> >> >
> >> > Pure speculation but parts 1 & 2 sounds like the data load is not going
> >> > to MarkLogic as a single transaction but as "autocommit" - one
> >> > transaction for each triple added.
> >> >
> >> >      Andy
> >> >
> >> >
> >> >> 3. After loading the small file, searching the Lucene index directly
> >> >> shows that the triples are indexed
> >> >> 4. After loading the small file, running a SPARQL query with "text:query"
> >> >> does not finish
> >> >>
> >> >> For now I created 2 separate implementations in my program to support
> >> >> full text search with Jena or MarkLogic, but I would like to know
> >> >> whether it is still possible to use Jena Elastic indexing with
> >> >> TextDataset, because then I can provide a single UI for users to
> >> >> configure their search regardless of the back end. :)
> >> >>
> >> >>
> >> >> On Fri, Sep 13, 2019 at 1:07 AM Dan Davis <dansm...@gmail.com> wrote:
> >> >>
> >> >>> I was incorrect, and apologize. Virtuoso's Jena 3 driver includes an
> >> >>> implementation of Dataset, so while my application only uses
> >> >>> virtuoso.jena.driver.VirtGraph and
> >> >>> virtuoso.jena.driver.VirtuosoQueryExecution (and its factory), a more
> >> >>> flexible integration is possible. I look forward to experimenting with
> >> >>> it and seeing what I can do on the backend.
> >> >>>
> >> >>> On Thu, Sep 12, 2019 at 10:19 AM Dan Davis <dansm...@gmail.com> wrote:
> >> >>>
> >> >>>> Virtuoso's Jena driver implements the Model interface, rather than the
> >> >>>> DatasetGraph API. It translates the SPARQL query into its own JDBC
> >> >>>> interface. You can see the architecture at
> >> >>>> http://docs.openlinksw.com/virtuoso/rdfnativestorageprovidersjena/#rdfnativestorageprovidersjenawhatisv
> >> >>>>
> >> >>>> However, Virtuoso has its own full-text indexing, which can be
> >> >>>> effective. Its rules for translating words into queries are not as
> >> >>>> flexible as lucene/solr/elastic, but it does allow you to specify what
> >> >>>> should be indexed - e.g. which objects from which data properties in
> >> >>>> which graphs.
> >> >>>>
> >> >>>> I use Virtuoso behind virt_jena and virt_jdbc. You can see the code at
> >> >>>> https://github.com/HHS/lodestar, which is run underneath
> >> >>>> https://github.com/HHS/meshrdf. You will see that
> >> >>>> https://github.com/HHS/lodestar is a fork from EBI, but the NLM copy
> >> >>>> has been updated to Jena 3. The EBI version is ahead on UI features,
> >> >>>> however.
> >> >>>>
> >> >>>> I cannot speak to MarkLogic, Stardog, etc.
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> EBI's lodestar still uses Jena 2, but the fork at HHS has been
> >> >>>> updated to Jena 3.
> >> >>>>
> >> >>>> Virtuoso has its own full-text indexing, which is not as flexible in
> >> >>>> how it indexes as Elastic/Solr/Lucene. It still works.
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> On Thu, Sep 12, 2019 at 7:03 AM Andy Seaborne <a...@apache.org> wrote:
> >> >>>>
> >> >>>>> Yes, probably - but.
> >> >>>>>
> >> >>>>> The Jena text index will work in conjunction with any (Jena)
> >> >>>>> DatasetGraph API implementation. 3rd party systems are not tested in
> >> >>>>> the build.
> >> >>>>>
> >> >>>>> The "but" is efficiency. Both those systems have their own built-in
> >> >>>>> text indexing, which executes as part of the native query engine. This
> >> >>>>> may be a factor for you, it may not.
> >> >>>>>
> >> >>>>> Let us know how you get on trying it.
> >> >>>>>
> >> >>>>> ----
> >> >>>>>
> >> >>>>> There is a SPARQL 1.2 issue about standardizing text query.
> >> >>>>>
> >> >>>>> Issue 40 : SPARQL 1.2 Community Group:
> >> >>>>> https://github.com/w3c/sparql-12/issues/40
> >> >>>>>
> >> >>>>>       Andy
> >> >>>>>
> >> >>>>> On 12/09/2019 02:53, Alex To wrote:
> >> >>>>>> Hi
> >> >>>>>>
> >> >>>>>> I have so far been happy with Jena + Lucene / Elastic. Just trying to
> >> >>>>>> get a quick answer on whether it can work with other Jena based APIs
> >> >>>>>> like Virtuoso / MarkLogic.
> >> >>>>>>
> >> >>>>>> If I wrap a MarkLogic Dataset in a Jena TextDataset, can it work as
> >> >>>>>> expected?
> >> >>>>>>
> >> >>>>>> Given that a MarkLogic / Virtuoso Dataset implements the Jena Dataset
> >> >>>>>> interface, it may work, but I am not sure because "text:query" seems
> >> >>>>>> to be more Jena specific.
> >> >>>>>>
> >> >>>>>> I will try it out myself in the next couple of days to see if it
> >> >>>>>> works, but if there is a quick answer it may save me a couple of
> >> >>>>>> hours :)
> >> >>>>>>
> >> >>>>>> Thanks a lot
> >> >>>>>>
> >> >>>>>> Regards
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >> >>
> >
> >
>
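
Andy's speculation in the thread above, that the load may reach MarkLogic as one transaction per triple ("autocommit") rather than as a single transaction, can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyStore, loadAutocommit and loadSingleTransaction are hypothetical names invented here, not Jena or MarkLogic classes, and each commit simply stands in for one round trip to a remote server.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitCountDemo {

    /** Hypothetical stand-in for a remote triple store; not a Jena or MarkLogic class. */
    static final class ToyStore {
        private final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();
        private boolean inTransaction = false;
        private int commits = 0;

        /** Start an explicit transaction; adds are buffered until commit(). */
        void begin() {
            inTransaction = true;
        }

        void add(String triple) {
            pending.add(triple);
            // Outside an explicit transaction every add commits on its own.
            if (!inTransaction) {
                flush();
            }
        }

        /** End the explicit transaction and write everything buffered so far. */
        void commit() {
            flush();
            inTransaction = false;
        }

        private void flush() {
            committed.addAll(pending);
            pending.clear();
            commits++; // one commit models one round trip to the server
        }

        int commits() {
            return commits;
        }

        int size() {
            return committed.size();
        }
    }

    /** Adds n triples with no surrounding transaction; returns the commit count. */
    static int loadAutocommit(int n) {
        ToyStore store = new ToyStore();
        for (int i = 0; i < n; i++) {
            store.add("triple-" + i);
        }
        return store.commits();
    }

    /** Adds n triples inside one explicit transaction; returns the commit count. */
    static int loadSingleTransaction(int n) {
        ToyStore store = new ToyStore();
        store.begin();
        for (int i = 0; i < n; i++) {
            store.add("triple-" + i);
        }
        store.commit();
        return store.commits();
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.println("autocommit commits: " + loadAutocommit(n));
        System.out.println("single-transaction commits: " + loadSingleTransaction(n));
    }
}
```

In this model the number of commits grows with the number of triples in the autocommit case and stays at one in the explicit-transaction case. If the real load behaves like the first case, that would be consistent with the slow imports reported above, but the thread does not confirm it. Whether wrapping the load in an explicit Jena write transaction changes what the MarkLogic adapter sends is likewise not established here.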
