On 30/09/2020 15:12, Mikael Pesonen wrote:

Okay got the index done:

/usr/bin/java -cp ./fuseki-server.jar jena.textindexer --desc=fuseki_config.ttl 16:51:57 INFO  textindexer     :: 159657 (15965 per second)properties indexed (15965 per second overall) 16:52:07 INFO  textindexer     :: 349257 (18960 per second)properties indexed (17462 per second overall) 16:52:17 INFO  textindexer     :: 539238 (18998 per second)properties indexed (17974 per second overall) 16:52:27 INFO  textindexer     :: 708454 (16921 per second)properties indexed (17711 per second overall) 16:52:37 INFO  textindexer     :: 888469 (18001 per second)properties indexed (17769 per second overall) 16:52:46 INFO  textindexer     :: 928952 (15744 per second) properties indexed

but I'm getting no results. Tried (with data that should return matches)

(?s ?score ?content) text:query (vcard:fn "Some Person"  )
and
?s text:query "something"  .


On startup Jena now says

2020-09-30 16:47:48,396 main ERROR Reconfiguration failed: No configuration found for '5bc2b487' at 'null' in 'null'

if that is somehow related.

looks likely.

Earier you showed:

select * where
{
 graph ?g {
    ?s ?p ?o filter(regex(str(?s), "[\x00-\x7F]"))
  }
}

so also may be a namedgraph
    Andy




On 30.9.2020 15:18, Andy Seaborne wrote:
https://issues.apache.org/jira/browse/JENA-1890 and 1892

are fixed in 3.16.0

Its a dcode error - the TDB database is intact.

On 30/09/2020 12:31, Mikael Pesonen wrote:

I figured out the regexp. Seems that we have external data having non Ascii URLs that can't be altered. Is there any workaround, for example adding text index to selected graphs only?

On 30.9.2020 13:57, Mikael Pesonen wrote:

Ah, thanks. Is it possible to find such URis with SPARQL query? SPARQL seems not to support \x -notation

select * where
{
 graph ?g {
    ?s ?p ?o filter(regex(str(?s), "[\x00-\x7F]"))
  }
}



On 30.9.2020 13:29, Andy Seaborne wrote:
In the data (probbaly in a URI) - it's reading the database.

On 30/09/2020 10:36, Mikael Pesonen wrote:

I couldn't find any non Ascii characters in the config file ([^\x00-\x7F]+)...

On 30.9.2020 0:48, Andy Seaborne wrote:
Looks like

https://issues.apache.org/jira/browse/JENA-1892 , 1890

    Andy

On 29/09/2020 15:13, Mikael Pesonen wrote:

Hi

I'm building a new text index with following command and getting java error.

/usr/bin/java -cp ./fuseki-server.jar jena.textindexer --desc=fuseki_config.ttl

After the command I get 4 files in /home/text/tools/jena_text_index/

_0.fdt
_0.fdx
segments_1
write.lock

Any idea what could case this?


Error is:

java.lang.StringIndexOutOfBoundsException: String index out of range: 59          at java.base/java.lang.StringLatin1.charAt(StringLatin1.java:48)
         at java.base/java.lang.String.charAt(String.java:711)
         at org.apache.jena.atlas.lib.StrUtils.decodeHex(StrUtils.java:212)          at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:121)          at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:120)          at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:97)          at org.apache.jena.tdb.store.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:182)          at org.apache.jena.tdb.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:108)          at org.apache.jena.tdb.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:67)          at org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:128)          at org.apache.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:82)          at org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:50)          at org.apache.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)          at org.apache.jena.tdb.lib.TupleLib.quad(TupleLib.java:126)          at org.apache.jena.tdb.lib.TupleLib.quad(TupleLib.java:120)          at org.apache.jena.tdb.lib.TupleLib.lambda$convertToQuads$3(TupleLib.java:59)          at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:352)          at org.apache.jena.atlas.iterator.IteratorCons.next(IteratorCons.java:104)
         at jena.textindexer.exec(textindexer.java:130)
         at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
         at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
         at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
         at jena.textindexer.main(textindexer.java:52)
mikael@insight-dev:/home/text/tools/apache-jena-fuseki-3.14.0$ /usr/bin/java -cp ./fuseki-server.jar jena.textindexer --desc=fuseki_config.ttl java.lang.StringIndexOutOfBoundsException: String index out of range: 59          at java.base/java.lang.StringLatin1.charAt(StringLatin1.java:48)
         at java.base/java.lang.String.charAt(String.java:711)
         at org.apache.jena.atlas.lib.StrUtils.decodeHex(StrUtils.java:212)          at org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:121)          at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:120)          at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:97)          at org.apache.jena.tdb.store.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:182)          at org.apache.jena.tdb.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:108)          at org.apache.jena.tdb.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:67)          at org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:128)          at org.apache.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:82)          at org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:50)          at org.apache.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)          at org.apache.jena.tdb.lib.TupleLib.quad(TupleLib.java:126)          at org.apache.jena.tdb.lib.TupleLib.quad(TupleLib.java:120)          at org.apache.jena.tdb.lib.TupleLib.lambda$convertToQuads$3(TupleLib.java:59)          at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:352)          at org.apache.jena.atlas.iterator.IteratorCons.next(IteratorCons.java:104)
         at jena.textindexer.exec(textindexer.java:130)
         at jena.cmd.CmdMain.mainMethod(CmdMain.java:93)
         at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
         at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
         at jena.textindexer.main(textindexer.java:52)


config:

@prefix :<http://localhost/jena_example/#>  .
@prefix rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#> .
@prefix tdb:<http://jena.hpl.hp.com/2008/tdb#> .
@prefix ja:<http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix text:<http://jena.apache.org/text#>  .
@prefix skos:<http://www.w3.org/2004/02/skos/core#> .
@prefix fuseki:<http://jena.apache.org/fuseki#> .
@prefix vcard:<http://www.w3.org/2006/vcard/ns#> .

## Example of a TDB dataset and text index
## Initialize TDB
[] ja:loadClass "org.apache.jena.tdb.TDB" .
tdb:DatasetTDB  rdfs:subClassOf  ja:RDFDataset .
tdb:GraphTDB    rdfs:subClassOf  ja:Model .

## Initialize text query
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
# A TextDataset is a regular dataset with a text index.
text:TextDataset      rdfs:subClassOf   ja:RDFDataset .
# Lucene index
text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .


## ---------------------------------------------------------------
# build: java -cp ./fuseki-server.jar jena.textindexer --desc=fuseki_config.ttl

:text_dataset rdf:type     text:TextDataset ;
      text:dataset   :my_dataset ;
      text:index     <#indexLucene> ;
      .

# A TDB dataset used for RDF storage
:my_dataset rdf:type      tdb:DatasetTDB ;
      tdb:location "/home/text/tools/jena_data/" ;
#    tdb:unionDefaultGraph true ; # Optional
      .

# Text index description
<#indexLucene> a text:TextIndexLucene ;
      text:directory <file:/home/text/tools/jena_text_index/> ;
      text:entityMap <#entMap> ;
      text:storeValues true ;
      text:analyzer [ a text:StandardAnalyzer ] ;
      text:queryAnalyzer [ a text:KeywordAnalyzer ] ;
      text:queryParser text:AnalyzingQueryParser ;
      text:multilingualSupport true ;
   .

<#entMap> a text:EntityMap ;
      text:defaultField     "vcard_fn" ;
      text:entityField      "uri" ;
      text:uidField         "uid" ;
      text:langField        "lang" ;
      text:graphField       "graph" ;
      text:map (
           [ text:field "vcard_fn" ; text:predicate vcard:fn ]
           [ text:field "altLabel"  ; text:predicate skos:altLabel ]
           ) .

<#service> rdf:type fuseki:Service ;
      fuseki:name                     "/ds" ;   # http://host:port/ds-ro       fuseki:serviceQuery             "query" ;    # SPARQL query service       fuseki:serviceQuery             "sparql" ;   # SPARQL query service       fuseki:serviceUpdate            "update" ;   # SPARQL update service       fuseki:serviceUpload            "upload" ;   # Non-SPARQL upload service       fuseki:serviceReadWriteGraphStore "data" ;     # SPARQL Graph store protocol (read and write)
      fuseki:dataset           :text_dataset ;
      .





Reply via email to