dbpedia is not actually that large. Make sure you test with RDF datasets that really represent your data.
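
On the "autocommit" speculation quoted below (one transaction per triple rather than one for the whole load), here is a toy sketch of why that would hurt. It is not MarkLogic or Jena code: `ToyStore`, `loadWithAutocommit` and `loadInOneTransaction` are hypothetical names invented for illustration, and a "commit" here just stands in for whatever round trip the real back end performs.

```java
import java.util.ArrayList;
import java.util.List;

public class AutocommitSketch {

    /** Hypothetical stand-in for a remote triple store; it only counts commits. */
    static final class ToyStore {
        private final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();
        private boolean inTransaction = false;
        private int commits = 0;

        void begin() {
            inTransaction = true;
        }

        void add(String triple) {
            pending.add(triple);
            if (!inTransaction) {
                flush(); // autocommit: every add outside a transaction is its own commit
            }
        }

        void commit() {
            flush();
            inTransaction = false;
        }

        private void flush() {
            committed.addAll(pending);
            pending.clear();
            commits++;
        }

        int commits() {
            return commits;
        }

        int size() {
            return committed.size();
        }
    }

    /** Adds n triples with no enclosing transaction and returns the commit count. */
    static int loadWithAutocommit(int n) {
        ToyStore store = new ToyStore();
        for (int i = 0; i < n; i++) {
            store.add("triple-" + i);
        }
        return store.commits();
    }

    /** Adds n triples inside one explicit transaction and returns the commit count. */
    static int loadInOneTransaction(int n) {
        ToyStore store = new ToyStore();
        store.begin();
        for (int i = 0; i < n; i++) {
            store.add("triple-" + i);
        }
        store.commit();
        return store.commits();
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.println("autocommit commits: " + loadWithAutocommit(n));
        System.out.println("single-transaction commits: " + loadInOneTransaction(n));
    }
}
```

If each commit is a network round trip, the first pattern scales with the number of triples and the second does not, which would be consistent with a 2-triple file loading quickly and a 1.5MB file appearing to hang. Whether the MarkLogic dataset actually behaves this way under TextDataset is still unconfirmed; the server logs should tell.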

On Wed, Sep 18, 2019 at 8:14 PM Alex To <tonhud...@gmail.com> wrote:

> Update: I switched from Lucene to Elasticsearch 6.4.3 and Kibana. Both Jena
> and MarkLogic Jena works with indexing, I haven't tried querying MarkLogic
> with text:query though.
>
> Using Kibana, I could see the number of documents increasing while
> importing data with MarkLogic however it is very slow.
>
> Importing dbpedia.owl (2.5MB) with MarkLogic Jena takes less than a minute
> without indexing.
>
> With TextDataset wrapping around MarkLogic dataset, it takes 13 minutes so
> I guess MarkLogic dataset does not seem to send triples in batch when using
> with TextDataset.
>
> On Tue, Sep 17, 2019 at 9:58 AM Alex To <tonhud...@gmail.com> wrote:
>
> > Hi Andy
> >
> > I ended up creating separate implementation for Jena and MarkLogic full
> > text search for now due to time constraints of the project. I will
> > investigate further at a later time.
> >
> > Thank you
> >
> > Best Regards
> >
> > On Sun, Sep 15, 2019 at 6:53 PM Andy Seaborne <a...@apache.org> wrote:
> >
> >> Alex,
> >>
> >> I can't try it out - I don't have a Marklogic system.
> >>
> >> Can you see in the server logs what is happening?
> >>
> >> > Pure speculation but parts 1 & 2 sounds like the data load is not going
> >> > to MarkLogic as a single transaction but as "autocommit" - one
> >> > transaction for each triple added.
> >>
> >> Andy
> >>
> >> On 13/09/2019 23:04, Andy Seaborne wrote:
> >> > The maven central artifact com.marklogic:marklogic-jena is 3.0.6 but
> >> > our code depends on 3.1.0 - what code is it using?
> >> >
> >> > On 13/09/2019 01:18, Alex To wrote:
> >> >> I created a small program to try out Lucene with MarkLogic Jena here
> >> >>
> >> >> https://github.com/AlexTo/jena-lab/blob/master/src/main/java/com/company/MainMarkLogic.java
> >> >>
> >> >> My observation is as follows (see my comment at line 54 & 56)
> >> >>
> >> >> 1. If the model reads a small file with 2 triples, the loading can
> >> >> finish quickly
> >> >> 2. If the model reads a slightly larger file (1.5MB), the loading
> >> >> takes forever so I have to terminate it
> >> >
> >> > Pure speculation but parts 1 & 2 sounds like the data load is not going
> >> > to MarkLogic as a single transaction but as "autocommit" - one
> >> > transaction for each triple added.
> >> >
> >> > Andy
> >> >
> >> >> 3. After loading the small file, searching the Lucene index direct
> >> >> shows that the triples are indexed
> >> >> 4. After loading the small file, run SPARQL query with "text:query"
> >> >> won't finish
> >> >>
> >> >> For now I created 2 separate implementation in my program to support
> >> >> Full Text search with Jena or MarkLogic but I look forward to know
> >> >> more whether it is still possible to use Jena Elastic indexing with
> >> >> TextDataset because then I can provide a single UI to users to
> >> >> configure their search regardless of the back end. :)
> >> >>
> >> >> On Fri, Sep 13, 2019 at 1:07 AM Dan Davis <dansm...@gmail.com> wrote:
> >> >>
> >> >>> I am incorrect, and apologize. Virtuoso's Jena 3 driver includes an
> >> >>> implementation of Dataset, and so while application is only using the
> >> >>> virtuoso.jena.driver.VirtGraph and
> >> >>> virtuoso.jena.driver.VirtuosoQueryExecution (and factory), a more
> >> >>> flexible integration is possible. I look forward to experimenting
> >> >>> with it and seeing what I can do on the backend.
> >> >>>
> >> >>> On Thu, Sep 12, 2019 at 10:19 AM Dan Davis <dansm...@gmail.com> wrote:
> >> >>>
> >> >>>> Virtuoso's Jena driver implements the model interface, rather than
> >> >>>> the DatasetGraphAPI. is translating the SPARQL query into its own
> >> >>>> JDBC interface. You can see the architecture at
> >> >>>> http://docs.openlinksw.com/virtuoso/rdfnativestorageprovidersjena/#rdfnativestorageprovidersjenawhatisv
> >> >>>> .
> >> >>>>
> >> >>>> However, Virtuoso has its own full-text indexing, which can be
> >> >>>> effective. Its rules for translating words into queries is not as
> >> >>>> flexible as lucene/solr/elastic, but it does allow you to specify
> >> >>>> what should be indexed - e.g. which objects from which which data
> >> >>>> properties in which graphs.
> >> >>>>
> >> >>>> I use Virtuoso behind virt_jena and virt_jdbc. You can see the code
> >> >>>> at https://github.com/HHS/lodestar, which is run underneath
> >> >>>> https://github.com/HHS/meshrdf. You will see that
> >> >>>> https://github.com/HHS/lodestar is a fork from EBI, but the NLM copy
> >> >>>> has been updated to Jena 3. The EBI version is ahead on UI features
> >> >>>> however.
> >> >>>>
> >> >>>> I cannot speak to MarkLogic, Stardog, etc.
> >> >>>>
> >> >>>> EBI's lodestar still uses Jena 2, but the fork at HHS has been
> >> >>>> updated to Jena 3.
> >> >>>>
> >> >>>> Virtuoso has its own full-text indexing, which is not as flexible in
> >> >>>> how it indexes as Elastic/Solr/Lucene. It still works.
> >> >>>>
> >> >>>> On Thu, Sep 12, 2019 at 7:03 AM Andy Seaborne <a...@apache.org> wrote:
> >> >>>>
> >> >>>>> Yes, probably - but.
> >> >>>>>
> >> >>>>> The Jena text index will work in conjunction with any (Jena)
> >> >>>>> DatasetGraphAPI implementation. 3rd party systems are not tested in
> >> >>>>> the build.
> >> >>>>>
> >> >>>>> The "but" is efficiency. Both those systems have their own built-in
> >> >>>>> text indexing which execute as part of the native query engine.
> >> >>>>> This may be a factor for you, it may not.
> >> >>>>>
> >> >>>>> Let us know how you get on trying it.
> >> >>>>>
> >> >>>>> ----
> >> >>>>>
> >> >>>>> There is a SPARQL 1.2 issue about standardizing text query.
> >> >>>>>
> >> >>>>> Issue 40 : SPARQL 1.2 Community Group:
> >> >>>>> https://github.com/w3c/sparql-12/issues/40
> >> >>>>>
> >> >>>>> Andy
> >> >>>>>
> >> >>>>> On 12/09/2019 02:53, Alex To wrote:
> >> >>>>>> Hi
> >> >>>>>>
> >> >>>>>> I have so far been happy with Jena + Lucene / Elastic. Just trying
> >> >>>>>> to get a quick answer whether it can work with other Jena based
> >> >>>>>> API like Virtuoso / MarkLogic.
> >> >>>>>>
> >> >>>>>> If I wrap a MarkLogic Dataset in a Jena TextDataset, can it work
> >> >>>>>> as expected ?
> >> >>>>>>
> >> >>>>>> Given that a MarkLogic / Virtuoso Dataset implements Jena Dataset
> >> >>>>>> interface, it may work but I am not sure because the "text:query"
> >> >>>>>> seems to be more Jena specific.
> >> >>>>>>
> >> >>>>>> I will try out myself in the next couple of days to see if it
> >> >>>>>> works but if there is a quick answer it may save me a couple of
> >> >>>>>> hours :)
> >> >>>>>>
> >> >>>>>> Thank a lot
> >> >>>>>>
> >> >>>>>> Regards