Re: Multiword Jena text queries

Øyvind Gjesdal Thu, 08 Oct 2020 03:56:15 -0700

I have a working setup (on Fuseki 3.14) where AND and OR give different
results, and where the "~" fuzzy operator also works.


The main difference from your config seems to be that I haven't configured
much for the index, only the directory and the entity map. Would it help to
first test whether a minimal config works, and then rebuild the index from
the command line with a little more configuration each time until it breaks?

    <#text_index> a text:TextIndexLucene ;
        text:directory </var/fuseki/databases/place-name-data/Lucene> ;
        text:entityMap <#entMap> ;
        .
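
As a rough sketch of what the first incremental step could look like for
your setup (this is an assumption on my part, I have not run it against your
data): keep only the directory and entity map from your index description
and add back a single setting, here text:storeValues. The fragment assumes
the <#entMap> definition from your config stays unchanged.

```turtle
@prefix text: <http://jena.apache.org/text#> .

# Step 1 of an incremental test: minimal index plus ONE setting taken
# from the original config. <#entMap> is assumed to be defined as in
# the quoted config below.
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:/home/text/tools/jena_text_index/> ;
    text:entityMap <#entMap> ;
    text:storeValues true ;    # the only addition in this step
    .
```

If AND/OR still behave after rebuilding the index with this, the remaining
settings from your config (text:analyzer, text:queryAnalyzer,
text:queryParser, text:multilingualSupport) could be added back one at a
time in the same way.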

Best regards,

Øyvind



On Thu, 8 Oct 2020 at 10:23, Mikael Pesonen <[email protected]> wrote:

> Anyone got any idea how to fix this? I'm out of ideas.
>
> On Mon, 5 Oct 2020 at 14:33, Mikael Pesonen <[email protected]>
> wrote:
>
> >
> > Sorry, correction: "language AND <any other words here>" and "language
> > OR <any other words here>" return the same results as "language <any other
> > words here>", and the same results as "language".
> >
> > On 5.10.2020 14:27, Mikael Pesonen wrote:
> > >
> > > Hi,
> > >
> > > I forgot to mention that AND and OR in the query also return no results.
> > > I'm somewhat familiar with Lucene syntax, but it seems that none of the
> > > syntax works with my setup.
> > > There are no errors in the Jena log, only the warning about
> > > AnalyzingQueryParser.
> > >
> > >
> > >
> > > On 5.10.2020 13:49, Lorenz Buehmann wrote:
> > >> It's Lucene syntax so a look into its documentation[1] could help.
> > >>
> > >> Regarding multiple words, the default Boolean operator is "OR", i.e.
> > >>
> > >> "language <any other words here>" is equivalent to "language OR <any
> > >> other words here>". Obviously the result will contain at least all
> > >> documents with "language". Use the AND operator if it must contain both.
> > >>
> > >> Fuzzy queries and proximity queries are also explained in the Lucene
> > >> docs[1].
> > >>
> > >>
> > >>
> > >> [1] https://lucene.apache.org/core/8_6_2/queryparser/index.html
> > >>
> > >> On 05.10.20 11:22, Mikael Pesonen wrote:
> > >>> I'm having trouble making anything other than one-word queries.
> > >>>
> > >>> For example, "language <any other words here>" gives the same result
> > >>> regardless of the other words.
> > >>>
> > >>> Using quotes "\"some query\"" returns no results.
> > >>>
> > >>>
> > >>>
> > >>> So I would like to make "fuzzy" multiword queries where for example
> > >>>
> > >>> "language technology" returns different results than "language
> > >>> management"
> > >>>
> > >>> And also to query "\"language technology\"" which should return exact
> > >>> matches.
> > >>>
> > >>>
> > >>>
> > >>> I'm using the latest Jena with AnalyzingQueryParser, which gives this warning:
> > >>>
> > >>>   WARN  TextIndexLucene :: Deprecated query parser type 'AnalyzingQueryParser'. Defaulting to standard QueryParser
> > >>>
> > >>> I also tried other parsers.
> > >>>
> > >>>
> > >>> Config:
> > >>>
> > >>> @prefix :<http://localhost/jena_example/#>  .
> > >>> @prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>  .
> > >>> @prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>  .
> > >>> @prefix tdb:<http://jena.hpl.hp.com/2008/tdb#>  .
> > >>> @prefix ja:<http://jena.hpl.hp.com/2005/11/Assembler#> .
> > >>> @prefix text:<http://jena.apache.org/text#>  .
> > >>> @prefix skos:<http://www.w3.org/2004/02/skos/core#> .
> > >>> @prefix fuseki:<http://jena.apache.org/fuseki#>  .
> > >>> @prefix vcard:<http://www.w3.org/2006/vcard/ns#> .
> > >>> @prefix dcterms:<http://purl.org/dc/terms/> .
> > >>>
> > >>> @prefix lsrm:<https://resource.lingsoft.fi/ns/resource_meta#> .
> > >>>
> > >>> ## Example of a TDB dataset and text index
> > >>> ## Initialize TDB
> > >>> [] ja:loadClass "org.apache.jena.tdb.TDB" .
> > >>> tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
> > >>> tdb:GraphTDB    rdfs:subClassOf  ja:Model .
> > >>>
> > >>> ## Initialize text query
> > >>> [] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
> > >>> # A TextDataset is a regular dataset with a text index.
> > >>> text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
> > >>> # Lucene index
> > >>> text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
> > >>>
> > >>>
> > >>> :text_dataset rdf:type     text:TextDataset ;
> > >>>       text:dataset   :my_dataset ;
> > >>>       text:index     <#indexLucene> ;
> > >>>       .
> > >>>
> > >>> # A TDB dataset used for RDF storage
> > >>> :my_dataset rdf:type      tdb:DatasetTDB ;
> > >>>       tdb:location "/home/text/tools/jena_data/" ;
> > >>> #    tdb:unionDefaultGraph true ; # Optional
> > >>>       .
> > >>>
> > >>> # Text index description
> > >>> <#indexLucene> a text:TextIndexLucene ;
> > >>>       text:directory <file:/home/text/tools/jena_text_index/> ;
> > >>>       text:entityMap <#entMap> ;
> > >>>       text:storeValues true ;
> > >>>       text:analyzer [ a text:StandardAnalyzer ] ;
> > >>>       text:queryAnalyzer [ a text:KeywordAnalyzer ] ;
> > >>>       text:queryParser text:AnalyzingQueryParser ;
> > >>>       text:multilingualSupport true ;
> > >>>    .
> > >>>
> > >>> <#entMap> a text:EntityMap ;
> > >>>       text:defaultField     "vcard_fn" ;
> > >>>       text:entityField      "uri" ;
> > >>>       text:uidField         "uid" ;
> > >>>       text:langField        "lang" ;
> > >>>       text:graphField       "graph" ;
> > >>>       text:map (
> > >>>            [ text:field "vcard_fn" ; text:predicate vcard:fn ]
> > >>>            [ text:field "skos_prefLabel" ; text:predicate skos:prefLabel ]
> > >>>            [ text:field "skos_altLabel" ; text:predicate skos:altLabel ]
> > >>>            [ text:field "lsrm_content" ; text:predicate lsrm:content ]
> > >>>            [ text:field "dcterms_title" ; text:predicate dcterms:title ]
> > >>>            [ text:field "dcterms_description" ; text:predicate dcterms:description ]
> > >>>            ) .
> > >>>
> > >>> <#service> rdf:type fuseki:Service ;
> > >>>       fuseki:name                     "/ds" ;   # http://host:port/ds-ro
> > >>>       fuseki:serviceQuery             "query" ;    # SPARQL query service
> > >>>       fuseki:serviceQuery             "sparql" ;   # SPARQL query service
> > >>>       fuseki:serviceUpdate            "update" ;   # SPARQL update service
> > >>>       fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service
> > >>>       fuseki:serviceReadWriteGraphStore "data" ;     # SPARQL Graph store protocol (read and write)
> > >>>       fuseki:dataset           :text_dataset ;
> > >>>       .
> > >
> >
> >
>
