It would be of tremendous value to my project if this works; I wish I had time to try it also.

On Wed, Sep 18, 2019, 10:03 PM Alex To <tonhud...@gmail.com> wrote:

> Hi Dan
> Thanks for your suggestion but I am not trying to load a large dataset yet.
>
> I am trying to see if I can use Jena Full text search with other Jena based
> API such as MarkLogic or Virtuoso but it seems like it doesn't work as
> expected. Not a Jena problem though. My set up is
>
> 1. Input file: dbpedia.owl (2.5MB)
> 2. Import using MarkLogic Jena without TextDataset: 1 minute
> 3. Import using MarkLogic Jena with TextDataset wrapping around it: 13
> minutes
>
> Regards
>
> On Thu, Sep 19, 2019 at 10:54 AM Dan Davis <dansm...@gmail.com> wrote:
>
> > dbpedia is not actually that large. Make sure you test with RDF datasets
> > that really represent your data.
> >
> > On Wed, Sep 18, 2019 at 8:14 PM Alex To <tonhud...@gmail.com> wrote:
> >
> > > Update: I switched from Lucene to Elasticsearch 6.4.3 and Kibana. Both Jena
> > > and MarkLogic Jena work with indexing; I haven't tried querying MarkLogic
> > > with text:query though.
> > >
> > > Using Kibana, I could see the number of documents increasing while
> > > importing data with MarkLogic, however it is very slow.
> > >
> > > Importing dbpedia.owl (2.5MB) with MarkLogic Jena takes less than a minute
> > > without indexing.
> > >
> > > With TextDataset wrapping around MarkLogic dataset, it takes 13 minutes so
> > > I guess MarkLogic dataset does not seem to send triples in batches when used
> > > with TextDataset.
> > >
> > > On Tue, Sep 17, 2019 at 9:58 AM Alex To <tonhud...@gmail.com> wrote:
> > >
> > > > Hi Andy
> > > >
> > > > I ended up creating separate implementations for Jena and MarkLogic full
> > > > text search for now due to time constraints of the project. I will
> > > > investigate further at a later time.
> > > >
> > > > Thank you
> > > >
> > > > Best Regards
> > > >
> > > > On Sun, Sep 15, 2019 at 6:53 PM Andy Seaborne <a...@apache.org> wrote:
> > > >
> > > >> Alex,
> > > >>
> > > >> I can't try it out - I don't have a MarkLogic system.
> > > >>
> > > >> Can you see in the server logs what is happening?
> > > >>
> > > >> > Pure speculation but parts 1 & 2 sound like the data load is not going
> > > >> > to MarkLogic as a single transaction but as "autocommit" - one
> > > >> > transaction for each triple added.
> > > >>
> > > >> Andy
> > > >>
> > > >> On 13/09/2019 23:04, Andy Seaborne wrote:
> > > >> > The maven central artifact com.marklogic:marklogic-jena is 3.0.6 but our
> > > >> > code depends on 3.1.0 - what code is it using?
> > > >> >
> > > >> > On 13/09/2019 01:18, Alex To wrote:
> > > >> >> I created a small program to try out Lucene with MarkLogic Jena here
> > > >> >>
> > > >> >> https://github.com/AlexTo/jena-lab/blob/master/src/main/java/com/company/MainMarkLogic.java
> > > >> >>
> > > >> >> My observation is as follows (see my comment at line 54 & 56)
> > > >> >>
> > > >> >> 1. If the model reads a small file with 2 triples, the loading can finish
> > > >> >> quickly
> > > >> >> 2. If the model reads a slightly larger file (1.5MB), the loading takes
> > > >> >> forever so I have to terminate it
> > > >> >
> > > >> > Pure speculation but parts 1 & 2 sound like the data load is not going
> > > >> > to MarkLogic as a single transaction but as "autocommit" - one
> > > >> > transaction for each triple added.
> > > >> >
> > > >> > Andy
> > > >> >
> > > >> >> 3. After loading the small file, searching the Lucene index directly shows
> > > >> >> that the triples are indexed
> > > >> >> 4. After loading the small file, running a SPARQL query with "text:query"
> > > >> >> won't finish
> > > >> >>
> > > >> >> For now I created 2 separate implementations in my program to support Full
> > > >> >> Text search with Jena or MarkLogic but I look forward to knowing more about
> > > >> >> whether it is still possible to use Jena Elastic indexing with TextDataset
> > > >> >> because then I can provide a single UI to users to configure their search
> > > >> >> regardless of the back end. :)
> > > >> >>
> > > >> >> On Fri, Sep 13, 2019 at 1:07 AM Dan Davis <dansm...@gmail.com> wrote:
> > > >> >>
> > > >> >>> I am incorrect, and apologize. Virtuoso's Jena 3 driver includes an
> > > >> >>> implementation of Dataset, and so while the application is only using the
> > > >> >>> virtuoso.jena.driver.VirtGraph and
> > > >> >>> virtuoso.jena.driver.VirtuosoQueryExecution (and factory), a more flexible
> > > >> >>> integration is possible. I look forward to experimenting with it and seeing
> > > >> >>> what I can do on the backend.
> > > >> >>>
> > > >> >>> On Thu, Sep 12, 2019 at 10:19 AM Dan Davis <dansm...@gmail.com> wrote:
> > > >> >>>
> > > >> >>>> Virtuoso's Jena driver implements the model interface, rather than the
> > > >> >>>> DatasetGraphAPI. It translates the SPARQL query into its own JDBC
> > > >> >>>> interface. You can see the architecture at
> > > >> >>>> http://docs.openlinksw.com/virtuoso/rdfnativestorageprovidersjena/#rdfnativestorageprovidersjenawhatisv
> > > >> >>>> .
> > > >> >>>>
> > > >> >>>> However, Virtuoso has its own full-text indexing, which can be effective.
> > > >> >>>> Its rules for translating words into queries are not as flexible as
> > > >> >>>> lucene/solr/elastic, but it does allow you to specify what should be
> > > >> >>>> indexed - e.g. which objects from which data properties in which
> > > >> >>>> graphs.
> > > >> >>>>
> > > >> >>>> I use Virtuoso behind virt_jena and virt_jdbc. You can see the code at
> > > >> >>>> https://github.com/HHS/lodestar, which is run underneath
> > > >> >>>> https://github.com/HHS/meshrdf. You will see that
> > > >> >>>> https://github.com/HHS/lodestar is a fork from EBI, but the NLM copy
> > > >> >>>> has been updated to Jena 3. The EBI version is ahead on UI features
> > > >> >>>> however.
> > > >> >>>>
> > > >> >>>> I cannot speak to MarkLogic, Stardog, etc.
> > > >> >>>>
> > > >> >>>> EBI's lodestar still uses Jena 2, but the fork at HHS has been
> > > >> >>>> updated to Jena 3.
> > > >> >>>>
> > > >> >>>> Virtuoso has its own full-text indexing, which is not as flexible in
> > > >> >>>> how it indexes as Elastic/Solr/Lucene. It still works.
> > > >> >>>>
> > > >> >>>> On Thu, Sep 12, 2019 at 7:03 AM Andy Seaborne <a...@apache.org> wrote:
> > > >> >>>>
> > > >> >>>>> Yes, probably - but.
> > > >> >>>>>
> > > >> >>>>> The Jena text index will work in conjunction with any (Jena)
> > > >> >>>>> DatasetGraphAPI implementation. 3rd party systems are not tested in
> > > >> >>>>> the build.
> > > >> >>>>>
> > > >> >>>>> The "but" is efficiency. Both those systems have their own built-in
> > > >> >>>>> text indexing which executes as part of the native query engine. This
> > > >> >>>>> may be a factor for you, it may not.
> > > >> >>>>>
> > > >> >>>>> Let us know how you get on trying it.
> > > >> >>>>>
> > > >> >>>>> ----
> > > >> >>>>>
> > > >> >>>>> There is a SPARQL 1.2 issue about standardizing text query.
> > > >> >>>>>
> > > >> >>>>> Issue 40 : SPARQL 1.2 Community Group:
> > > >> >>>>> https://github.com/w3c/sparql-12/issues/40
> > > >> >>>>>
> > > >> >>>>> Andy
> > > >> >>>>>
> > > >> >>>>> On 12/09/2019 02:53, Alex To wrote:
> > > >> >>>>>> Hi
> > > >> >>>>>>
> > > >> >>>>>> I have so far been happy with Jena + Lucene / Elastic. Just trying to
> > > >> >>>>>> get a quick answer whether it can work with other Jena based API like
> > > >> >>>>>> Virtuoso / MarkLogic.
> > > >> >>>>>>
> > > >> >>>>>> If I wrap a MarkLogic Dataset in a Jena TextDataset, can it work as
> > > >> >>>>>> expected?
> > > >> >>>>>>
> > > >> >>>>>> Given that a MarkLogic / Virtuoso Dataset implements Jena Dataset
> > > >> >>>>>> interface, it may work but I am not sure because the "text:query"
> > > >> >>>>>> seems to be more Jena specific.
> > > >> >>>>>>
> > > >> >>>>>> I will try it out myself in the next couple of days to see if it works
> > > >> >>>>>> but if there is a quick answer it may save me a couple of hours :)
> > > >> >>>>>>
> > > >> >>>>>> Thanks a lot
> > > >> >>>>>>
> > > >> >>>>>> Regards
> >
> > --
> > Alex To
> > PhD Candidate
> > School of Computer Science
> > Knowledge Discovery and Management Research Group
> > Faculty of Engineering & IT
> > THE UNIVERSITY OF SYDNEY | NSW | 2006
> > Desk 4e69 | Building J12 | 1 Cleveland Street
> > M. +61423330656