LowerCaseKeywordAnalyzer

Todd Detwiler Wed, 10 Jun 2015 16:32:47 -0700

I'm having difficulty getting the text indexer to use the LowerCaseKeywordAnalyzer. I was hoping someone might be able to suggest what I am doing wrong. Here are my details:


1. My dataset is in TDB
2. I am building a Lucene text index
3. I index based on multiple ontology properties
4. I am serving both via Fuseki
5. I am connecting from a remote application to the Fuseki service to answer SPARQL queries.

TDB and Fuseki are running fine and accessible. The index exists and it will answer text queries (and the index appears to cover all of the properties that I included). But the results seem consistent with the StandardAnalyzer, not a keyword analyzer. So, first, let me tell you what I am expecting and what I am seeing:

Classes in my ontology have multiple label fields. I am indexing on all of them. Here is an example, a value from one of the fields indexed: "Anterior superficial cortex proper of left lens". The relevant portion of my query looks like this: ?s text:query (?prop "cor*"). To me that should match results that start with "cor". The standard analyzer would divide the value into individual words, "anterior", "superficial", ... Because one of those tokens matches (cortex), I would expect a search hit. But if I use a keyword analyzer, it should consider the entire label as a single token. And therefore it should NOT match (since it does not start with "cor"). But that isn't what I am seeing: the query still returns this label as a hit.
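
To make the expectation above concrete, here is a minimal sketch in plain Python. It is not Lucene or Jena code: the function names are hypothetical, and the "standard-like" tokenizer is only a rough stand-in (whitespace split plus lowercasing) for what StandardAnalyzer does. It only illustrates why a "cor*" query should hit under word tokenization and miss when the whole label is one lowercase token.

```python
import fnmatch

LABEL = "Anterior superficial cortex proper of left lens"


def standard_like_tokens(text):
    # Rough stand-in for word-level analysis: split on whitespace, lowercase.
    return [token.lower() for token in text.split()]


def lowercase_keyword_tokens(text):
    # Rough stand-in for a lowercase keyword analyzer: the whole value,
    # lowercased, is kept as one single token.
    return [text.lower()]


def wildcard_query_hits(tokens, pattern):
    # A wildcard query such as "cor*" hits if any indexed token matches it.
    return any(fnmatch.fnmatchcase(token, pattern) for token in tokens)


# Word-level tokens include "cortex", so "cor*" matches.
print(wildcard_query_hits(standard_like_tokens(LABEL), "cor*"))
# The single keyword token starts with "anterior", so "cor*" does not match.
print(wildcard_query_hits(lowercase_keyword_tokens(LABEL), "cor*"))
```

Under these assumptions the first call reports a hit and the second does not, which is the difference I expected to see between the two analyzers.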


Am I misunderstanding how the keyword analyzer is supposed to work?

I build my index like this:

java -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --desc=fuseki-assembler.ttl


and my assembler looks like this:

@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:     <http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:      <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:    <http://jena.apache.org/text#> .
@prefix fuseki:  <http://jena.apache.org/fuseki#> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix fma:      <http://purl.org/sig/ont/fma/> .
@prefix :        <http://localhost/jena_example/#> .

[] rdf:type fuseki:Server ;
   fuseki:services (
     :service_text_tdb
   ) .

## Example of a TDB dataset and text index
## Initialize TDB
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

## Initialize text query
[] ja:loadClass       "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
# Lucene index
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .

## ---------------------------------------------------------------
## This URI must be fixed - it's used to assemble the text dataset.

:text_dataset rdf:type     text:TextDataset ;
    text:dataset   :dataset ;
    text:index     :indexLucene ;
    .

# A TDB dataset used for RDF storage
:dataset rdf:type      tdb:DatasetTDB ;
    tdb:location "/usr/local/tdb/fma" ;
    .

<#graph1> rdf:type tdb:GraphTDB ;
    tdb:dataset <#dataset> ;
    tdb:graphName <http://purl.org/sig/ont/fma.owl> ;
    .

# Text index description
:indexLucene a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap :entMap ;
    .

# Mapping in the index
# URI stored in field "uri"
# fma:preferred_name, fma:synonym and fma:non-English_equivalent are mapped to field "text"
:entMap a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:map (
         [ text:field "text" ;
           text:predicate fma:preferred_name ;
           text:analyzer [
               a text:LowerCaseKeywordAnalyzer
           ]
         ]
         [ text:field "text" ;
           text:predicate fma:synonym ;
           text:analyzer [
               a text:LowerCaseKeywordAnalyzer
           ]
         ]
         [ text:field "text" ;
           text:predicate fma:non-English_equivalent ;
           text:analyzer [
               a text:LowerCaseKeywordAnalyzer
           ]
         ]
         ) ;
    text:queryAnalyzer [
        a text:LowerCaseKeywordAnalyzer
    ] .

:service_text_tdb rdf:type fuseki:Service ;
    rdfs:label                      "TDB/text service" ;
    fuseki:name                     "sig" ;
    fuseki:serviceQuery             "query" ;
    fuseki:serviceQuery             "sparql" ;
    fuseki:serviceUpdate            "update" ;
    fuseki:serviceUpload            "upload" ;
    fuseki:serviceReadGraphStore    "get" ;
    fuseki:serviceReadWriteGraphStore    "data" ;
    fuseki:dataset                  :text_dataset ;
    .



If anyone can spot what I am doing wrong, I'd really appreciate a heads-up.

Thanks,
Todd


--
Landon Todd Detwiler
Structural Informatics Group (SIG)
University of Washington
