Hi Dan
Thanks for your suggestion, but I am not trying to load a large dataset yet.

I am trying to see whether I can use Jena full-text search with other
Jena-based APIs such as MarkLogic or Virtuoso, but it doesn't seem to work as
expected. This is not a Jena problem, though. My setup is:

1. Input file: dbpedia.owl (2.5MB)
2. Import using MarkLogic Jena without TextDataset: 1 minute
3. Import using MarkLogic Jena with TextDataset wrapped around it: 13
minutes
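
One possible reading of the gap between steps 2 and 3 is the "autocommit" guess Andy made earlier in this thread: one transaction per triple instead of one for the whole load. The toy Java sketch below only illustrates that idea. ToyStore, loadAutocommit and loadBatched are hypothetical names, not MarkLogic or Jena classes, and the sketch counts commits rather than measuring real time. Whether MarkLogic Jena actually behaves this way under TextDataset is unconfirmed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: nothing here is a real MarkLogic or Jena class.
class CommitCountSketch {

    /** Hypothetical store that counts how many commits it receives. */
    static class ToyStore {
        private final List<String> pending = new ArrayList<>();
        private int stored = 0;
        private int commits = 0;

        void add(String triple) {
            pending.add(triple);
        }

        /** Flush everything added since the last commit. */
        void commit() {
            stored += pending.size();
            pending.clear();
            commits++;
        }

        int commits() { return commits; }
        int stored() { return stored; }
    }

    /** Autocommit style: one commit per triple added. */
    static int loadAutocommit(ToyStore store, List<String> triples) {
        for (String t : triples) {
            store.add(t);
            store.commit();
        }
        return store.commits();
    }

    /** Single-transaction style: one commit for the whole load. */
    static int loadBatched(ToyStore store, List<String> triples) {
        for (String t : triples) {
            store.add(t);
        }
        store.commit();
        return store.commits();
    }

    public static void main(String[] args) {
        List<String> triples = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            triples.add("<s" + i + "> <p> \"o" + i + "\" .");
        }
        System.out.println("autocommit commits: " + loadAutocommit(new ToyStore(), triples));
        System.out.println("batched commits:    " + loadBatched(new ToyStore(), triples));
    }
}
```

With 1000 triples the autocommit path issues 1000 commits and the batched path issues 1. If each commit were a network round trip to the server, that difference alone could dominate the load time.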

Regards

On Thu, Sep 19, 2019 at 10:54 AM Dan Davis <[email protected]> wrote:

> dbpedia is not actually that large.  Make sure you test with RDF datasets
> that really represent your data.
>
> On Wed, Sep 18, 2019 at 8:14 PM Alex To <[email protected]> wrote:
>
> > Update: I switched from Lucene to Elasticsearch 6.4.3 and Kibana. Both Jena
> > and MarkLogic Jena work with indexing, though I haven't tried querying
> > MarkLogic with text:query yet.
> >
> > Using Kibana, I could see the number of documents increasing while
> > importing data with MarkLogic; however, it is very slow.
> >
> > Importing dbpedia.owl (2.5MB) with MarkLogic Jena takes less than a
> > minute without indexing.
> >
> > With TextDataset wrapped around the MarkLogic dataset, it takes 13
> > minutes, so I guess the MarkLogic dataset does not send triples in
> > batches when used with TextDataset.
> >
> >
> >
> > On Tue, Sep 17, 2019 at 9:58 AM Alex To <[email protected]> wrote:
> >
> > > Hi Andy
> > >
> > > I ended up creating separate implementations for Jena and MarkLogic
> > > full-text search for now, due to the time constraints of the project.
> > > I will investigate further at a later time.
> > >
> > > Thank you
> > >
> > > Best Regards
> > >
> > > On Sun, Sep 15, 2019 at 6:53 PM Andy Seaborne <[email protected]> wrote:
> > >
> > >> Alex,
> > >>
> > >> I can't try it out - I don't have a Marklogic system.
> > >>
> > >> Can you see in the server logs what is happening?
> > >>
> > >>  > Pure speculation but parts 1 & 2 sounds like the data load is
> > >>  > not going to MarkLogic as a single transaction but as
> > >>  > "autocommit" - one transaction for each triple added.
> > >>
> > >>      Andy
> > >>
> > >> On 13/09/2019 23:04, Andy Seaborne wrote:
> > >> > The maven central artifact com.marklogic:marklogic-jena is 3.0.6 but
> > >> > our code depends on 3.1.0 - what code is it using?
> > >> >
> > >> > On 13/09/2019 01:18, Alex To wrote:
> > >> >> I created a small program to try out Lucene with MarkLogic Jena here
> > >> >>
> > >> >> https://github.com/AlexTo/jena-lab/blob/master/src/main/java/com/company/MainMarkLogic.java
> > >> >>
> > >> >>
> > >> >>
> > >> >> My observation is as follows (see my comment at line 54 & 56)
> > >> >>
> > >> >> 1. If the model reads a small file with 2 triples, the loading can
> > >> >> finish quickly
> > >> >> 2. If the model reads a slightly larger file (1.5MB), the loading
> > >> >> takes forever so I have to terminate it
> > >> >
> > >> > Pure speculation but parts 1 & 2 sounds like the data load is not
> > >> > going to MarkLogic as a single transaction but as "autocommit" - one
> > >> > transaction for each triple added.
> > >> >
> > >> >      Andy
> > >> >
> > >> >
> > >> >> 3. After loading the small file, searching the Lucene index directly
> > >> >> shows that the triples are indexed
> > >> >> 4. After loading the small file, running a SPARQL query with
> > >> >> "text:query" won't finish
> > >> >>
> > >> >> For now I created 2 separate implementations in my program to
> > >> >> support full-text search with Jena or MarkLogic, but I look forward
> > >> >> to knowing whether it is still possible to use Jena Elastic indexing
> > >> >> with TextDataset, because then I can provide a single UI to users to
> > >> >> configure their search regardless of the back end. :)
> > >> >>
> > >> >>
> > >> >> On Fri, Sep 13, 2019 at 1:07 AM Dan Davis <[email protected]> wrote:
> > >> >>
> > >> >>> I am incorrect, and apologize. Virtuoso's Jena 3 driver includes an
> > >> >>> implementation of Dataset, and so while my application is only
> > >> >>> using the virtuoso.jena.driver.VirtGraph and
> > >> >>> virtuoso.jena.driver.VirtuosoQueryExecution (and factory), a more
> > >> >>> flexible integration is possible. I look forward to experimenting
> > >> >>> with it and seeing what I can do on the backend.
> > >> >>>
> > >> >>> On Thu, Sep 12, 2019 at 10:19 AM Dan Davis <[email protected]> wrote:
> > >> >>>
> > >> >>>> Virtuoso's Jena driver implements the model interface, rather than
> > >> >>>> the DatasetGraphAPI. It is translating the SPARQL query into its
> > >> >>>> own JDBC interface. You can see the architecture at
> > >> >>>> http://docs.openlinksw.com/virtuoso/rdfnativestorageprovidersjena/#rdfnativestorageprovidersjenawhatisv
> > >> >>>> However, Virtuoso has its own full-text indexing, which can be
> > >> >>>> effective. Its rules for translating words into queries are not as
> > >> >>>> flexible as lucene/solr/elastic, but it does allow you to specify
> > >> >>>> what should be indexed - e.g. which objects from which data
> > >> >>>> properties in which graphs.
> > >> >>>>
> > >> >>>> I use Virtuoso behind virt_jena and virt_jdbc. You can see the code
> > >> >>>> at https://github.com/HHS/lodestar, which is run underneath
> > >> >>>> https://github.com/HHS/meshrdf. You will see that
> > >> >>>> https://github.com/HHS/lodestar is a fork from EBI, but the NLM copy
> > >> >>>> has been updated to Jena 3. The EBI version is ahead on UI features,
> > >> >>>> however.
> > >> >>>>
> > >> >>>> I cannot speak to MarkLogic, Stardog, etc.
> > >> >>>>
> > >> >>>>
> > >> >>>>
> > >> >>>>
> > >> >>>>
> > >> >>>> EBI's lodestar still uses Jena 2, but the fork at HHS has been
> > >> >>>> updated to Jena 3.
> > >> >>>>
> > >> >>>> Virtuoso has its own full-text indexing, which is not as flexible in
> > >> >>>> how it indexes as Elastic/Solr/Lucene. It still works.
> > >> >>>>
> > >> >>>>
> > >> >>>>
> > >> >>>>
> > >> >>>> On Thu, Sep 12, 2019 at 7:03 AM Andy Seaborne <[email protected]> wrote:
> > >> >>>>
> > >> >>>>> Yes, probably - but.
> > >> >>>>>
> > >> >>>>> The Jena text index will work in conjunction with any (Jena)
> > >> >>>>> DatasetGraphAPI implementation. 3rd party systems are not tested in
> > >> >>>>> the build.
> > >> >>>>>
> > >> >>>>> The "but" is efficiency. Both those systems have their own built-in
> > >> >>>>> text indexing which executes as part of the native query engine.
> > >> >>>>> This may be a factor for you, it may not.
> > >> >>>>>
> > >> >>>>> Let us know how you get on trying it.
> > >> >>>>>
> > >> >>>>> ----
> > >> >>>>>
> > >> >>>>> There is a SPARQL 1.2 issue about standardizing text query.
> > >> >>>>>
> > >> >>>>> Issue 40 : SPARQL 1.2 Community Group:
> > >> >>>>> https://github.com/w3c/sparql-12/issues/40
> > >> >>>>>
> > >> >>>>>       Andy
> > >> >>>>>
> > >> >>>>> On 12/09/2019 02:53, Alex To wrote:
> > >> >>>>>> Hi
> > >> >>>>>>
> > >> >>>>>> I have so far been happy with Jena + Lucene / Elastic. Just trying
> > >> >>>>>> to get a quick answer whether it can work with other Jena based
> > >> >>>>>> API like Virtuoso / MarkLogic.
> > >> >>>>>>
> > >> >>>>>> If I wrap a MarkLogic Dataset in a Jena TextDataset, can it work as
> > >> >>>>>> expected?
> > >> >>>>>>
> > >> >>>>>> Given that a MarkLogic / Virtuoso Dataset implements the Jena
> > >> >>>>>> Dataset interface, it may work, but I am not sure because
> > >> >>>>>> "text:query" seems to be more Jena specific.
> > >> >>>>>>
> > >> >>>>>> I will try it out myself in the next couple of days to see if it
> > >> >>>>>> works, but if there is a quick answer it may save me a couple of
> > >> >>>>>> hours :)
> > >> >>>>>>
> > >> >>>>>> Thanks a lot
> > >> >>>>>>
> > >> >>>>>> Regards
> > >> >>>>>>
> > >> >>>>>
> > >> >>>>
> > >> >>>
> > >> >>
> > >> >>
> > >
> > >
> >
>


-- 

Alex To

PhD Candidate

School of Computer Science

Knowledge Discovery and Management Research Group

Faculty of Engineering & IT

THE UNIVERSITY OF SYDNEY | NSW | 2006

Desk 4e69 | Building J12| 1 Cleveland Street

M. +61423330656
