Re: Upgrading 3.0.1 to 3.13.1: ReaderRIOT

2020-03-26 Thread Martynas Jusevičius
Oh right, this is the code I tried - got the exception both ways:

Model parsed = ModelFactory.createDefaultModel();
try (StringReader reader = new StringReader(validRDFPost))
{
RDFDataMgr.read(parsed, reader, "http://base", RDFLanguages.RDFPOST);
}


Model parsed = ModelFactory.createDefaultModel();
try (StringReader reader = new StringReader(rdfPost))
{
parsed.read(reader, "http://base", "RDF/POST");
}


Re: Upgrading 3.0.1 to 3.13.1: ReaderRIOT

2020-03-26 Thread Martynas Jusevičius
Thanks.

I'm trying 3.14.0 for now. Refactored RDFPostReader without much
trouble, but now I get

java.lang.NoClassDefFoundError: org/apache/http/client/HttpClient
at org.apache.jena.riot.RDFParser.create(RDFParser.java:114)

because RDFParser now has a field of type HttpClient :/ I guess it
didn't in 3.0.1.

I had excluded HTTP Client because Core is using Jersey Client:


<dependency>
    <groupId>org.apache.jena</groupId>
    <artifactId>jena-arq</artifactId>
    <version>3.14.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient-cache</artifactId>
        </exclusion>
    </exclusions>
</dependency>


This makes me wonder why the parsers/writers are dealing directly with
HTTP. I already had this problem with the JsonLDWriter:
https://mail-archives.apache.org/mod_mbox/jena-users/201712.mbox/%3ccae35vmygk-biq1fayp3zooyimikzhvja8dtuged3jayr98u...@mail.gmail.com%3e

Not sure how to proceed.

On Thu, Mar 26, 2020 at 10:34 PM Andy Seaborne  wrote:
>
>
>
> On 26/03/2020 18:57, Martynas Jusevičius wrote:
> > Need to match SPIN RDF API which is on 3.13.1...
> > https://github.com/spinrdf/spinrdf/blob/master/pom.xml#L73
>
> spinrdf does not have any binary artifacts and is built locally.
>
> Changing the version before building would be possible.
>
>  Andy
>
> >
> > On Thu, 26 Mar 2020 at 17.32, Andy Seaborne  wrote:
> >
> >> After 3.14.0, use of "IRI" got wrapped up to limit the places it is used
> >> directly.
> >>
> >> Why 3.13.1 and not 3.14.0?
> >> Or 3.15.0-SNAPSHOT because of JENA-1838.
> >>
> >>
> >> On 26/03/2020 13:36, Martynas Jusevičius wrote:
> >>> Hi,
> >>>
> >>> I'm working on a long overdue upgrade of Jena.
> >>>
> >>> So far the area where I can see most changes will be needed is the
> >>> implementation of ReaderRIOT streaming parser for RDF/POST:
> >>>
> >> https://github.com/AtomGraph/Core/blob/master/src/main/java/com/atomgraph/core/riot/lang/RDFPostReader.java
> >>>
> >>> Is LangEngine the recommended base class for such parsers these days?
> >>>
> >> https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/riot/lang/LangEngine.java
> >>
> >> Or LangBase
> >>
> >>>
> >>> Currently it's extending ReaderRIOTBase.
> >>
> >> ReaderRIOT is the API used by RDFParser.
> >> You can implement that (and the companion factory) directly if you want.
> >>
> >> LangBase etc are implementation helpers.
> >>
> >>> Also can't figure out what to replace
> >>> ParserProfile.getPrologue().setBaseURI() calls with. I can see the
> >>> latest LangTurtleBase uses ParserProfile.setBaseIRI(), but I can't
> >>> find such method in 3.13.1.
> >>
> >> 3.15.0-dev:
> >> ParserProfile::setBaseIRI(String)
> >>
> >>>
> >>> Thanks.
> >>>
> >>> Martynas
> >>>
> >>
> >


Re: Upgrading 3.0.1 to 3.13.1: ReaderRIOT

2020-03-26 Thread Andy Seaborne




On 26/03/2020 18:57, Martynas Jusevičius wrote:

Need to match SPIN RDF API which is on 3.13.1...
https://github.com/spinrdf/spinrdf/blob/master/pom.xml#L73


spinrdf does not have any binary artifacts and is built locally.

Changing the version before building would be possible.

Andy



On Thu, 26 Mar 2020 at 17.32, Andy Seaborne  wrote:


After 3.14.0, use of "IRI" got wrapped up to limit the places it is used
directly.

Why 3.13.1 and not 3.14.0?
Or 3.15.0-SNAPSHOT because of JENA-1838.


On 26/03/2020 13:36, Martynas Jusevičius wrote:

Hi,

I'm working on a long overdue upgrade of Jena.

So far the area where I can see most changes will be needed is the
implementation of ReaderRIOT streaming parser for RDF/POST:


https://github.com/AtomGraph/Core/blob/master/src/main/java/com/atomgraph/core/riot/lang/RDFPostReader.java


Is LangEngine the recommended base class for such parsers these days?


https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/riot/lang/LangEngine.java

Or LangBase



Currently it's extending ReaderRIOTBase.


ReaderRIOT is the API used by RDFParser.
You can implement that (and the companion factory) directly if you want.

LangBase etc are implementation helpers.


Also can't figure out what to replace
ParserProfile.getPrologue().setBaseURI() calls with. I can see the
latest LangTurtleBase uses ParserProfile.setBaseIRI(), but I can't
find such method in 3.13.1.


3.15.0-dev:
ParserProfile::setBaseIRI(String)



Thanks.

Martynas







Re: Upgrading 3.0.1 to 3.13.1: ReaderRIOT

2020-03-26 Thread Martynas Jusevičius
Need to match SPIN RDF API which is on 3.13.1...
https://github.com/spinrdf/spinrdf/blob/master/pom.xml#L73

On Thu, 26 Mar 2020 at 17.32, Andy Seaborne  wrote:

> After 3.14.0, use of "IRI" got wrapped up to limit the places it is used
> directly.
>
> Why 3.13.1 and not 3.14.0?
> Or 3.15.0-SNAPSHOT because of JENA-1838.
>
>
> On 26/03/2020 13:36, Martynas Jusevičius wrote:
> > Hi,
> >
> > I'm working on a long overdue upgrade of Jena.
> >
> > So far the area where I can see most changes will be needed is the
> > implementation of ReaderRIOT streaming parser for RDF/POST:
> >
> https://github.com/AtomGraph/Core/blob/master/src/main/java/com/atomgraph/core/riot/lang/RDFPostReader.java
> >
> > Is LangEngine the recommended base class for such parsers these days?
> >
> https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/riot/lang/LangEngine.java
>
> Or LangBase
>
> >
> > Currently it's extending ReaderRIOTBase.
>
> ReaderRIOT is the API used by RDFParser.
> You can implement that (and the companion factory) directly if you want.
>
> LangBase etc are implementation helpers.
>
> > Also can't figure out what to replace
> > ParserProfile.getPrologue().setBaseURI() calls with. I can see the
> > latest LangTurtleBase uses ParserProfile.setBaseIRI(), but I can't
> > find such method in 3.13.1.
>
> 3.15.0-dev:
> ParserProfile::setBaseIRI(String)
>
> >
> > Thanks.
> >
> > Martynas
> >
>


Re: Apache Jena Fuseki with text indexing

2020-03-26 Thread Zhenya Antić
Andy,

I think I figured out what the issue is. It seems that I have two datasets with 
the same name, and one was started with the config file I sent (and has no data 
in it - and hence it is not indexed), and the other was started without a 
config file (like this: fuseki-server --port 3030 --loc="db" /biology), and it 
has the data.

How do I transfer the data from one to the other?

Thanks,
Zhenya


On Thu, Mar 26, 2020, at 12:22 PM, Chris Tomlinson wrote:
> Zhenya,
> 
> Do you see any content in the directory:
> 
> > text:directory  ;
> 
> like the following partial listing:
> 
> > fuseki@foo :~/base/lucene-test$ ls -l
> > total 3608108
> > -rw-rw 1 fuseki fuseki 7772 Jan 29 21:15 _19a_5x.liv
> > -rw-r- 1 fuseki fuseki 299 Jan 21 15:53 _19a.cfe
> > -rw-r- 1 fuseki fuseki 36547721 Jan 21 15:53 _19a.cfs
> > -rw-r- 1 fuseki fuseki 443 Jan 21 15:53 _19a.si
> > -rw-r- 1 fuseki fuseki 23621 Jan 21 15:53 _24_17n.liv
> > -rw-r- 1 fuseki fuseki 22718569 Jan 21 15:53 _24.fdt
> > -rw-r- 1 fuseki fuseki 9184 Jan 21 15:53 _24.fdx
> > -rw-r- 1 fuseki fuseki 12975 Jan 21 15:53 _24.fnm
> > -rw-r- 1 fuseki fuseki 7009762 Jan 21 15:53 _24_Lucene50_0.doc
> > -rw-r- 1 fuseki fuseki 3804794 Jan 21 15:53 _24_Lucene50_0.pos
> > -rw-r- 1 fuseki fuseki 16186474 Jan 21 15:53 _24_Lucene50_0.tim
> > -rw-r- 1 fuseki fuseki 103945 Jan 21 15:53 _24_Lucene50_0.tip
> > -rw-r- 1 fuseki fuseki 667296 Jan 21 15:53 _24.nvd
> > -rw-r- 1 fuseki fuseki 4027 Jan 21 15:53 _24.nvm
> > -rw-r- 1 fuseki fuseki 540 Jan 21 15:53 _24.si
> 
> Also, if you don’t have text:storeValues true, then queries like:
> 
>  (?s ?score ?lit) text:query “ribosome”
> 
> won’t bind anything to ?lit. text:storeValues is set like:
> 
> > # Text index description
> > :test_lucene_index a text:TextIndexLucene ;
> > text:directory  ;
> > text:storeValues true ;
> > text:entityMap :test_entmap ;
> 
> 
> Also you need to reload the data if you change the configuration so that the 
> indexing will be done according to the configuration.
> 
> ciao,
> Chris
> 
> 
> > On Mar 26, 2020, at 10:33 AM, Zhenya Antić  wrote:
> > 
> > @prefix :  .
> > @prefix tdb2:  .
> > @prefix rdf:  .
> > @prefix ja:  .
> > @prefix rdfs:  .
> > @prefix fuseki:  .
> > @prefix text:  .
> > 
> > 
> > rdfs:subClassOf ja:RDFDataset .
> > 
> > ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .
> > 
> > tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
> > 
> > tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
> > 
> > 
> > rdfs:subClassOf ja:Model .
> > 
> > ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .
> > 
> > ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .
> > 
> > 
> > rdfs:subClassOf ja:RDFDataset .
> > 
> > :service_tdb_all a fuseki:Service ;
> > rdfs:label "TDB biology" ;
> > fuseki:dataset :tdb_dataset_readwrite ;
> > fuseki:name "biology" ;
> > fuseki:serviceQuery "query" , "" , "sparql" ;
> > fuseki:serviceReadGraphStore "get" ;
> > fuseki:serviceReadQuads "" ;
> > fuseki:serviceReadWriteGraphStore
> > "data" ;
> > fuseki:serviceReadWriteQuads "" ;
> > fuseki:serviceUpdate "" , "update" ;
> > fuseki:serviceUpload "upload" .
> > 
> > :tdb_dataset_readwrite
> > a tdb2:DatasetTDB2 ;
> > tdb2:location "db" .
> > 
> > 
> > rdfs:subClassOf ja:Model .
> > 
> > ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .
> > 
> > ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .
> > 
> > 
> > rdfs:subClassOf ja:RDFDataset .
> > 
> > <#dataset> rdf:type tdb2:DatasetTDB2 ;
> > tdb2:location "db" ; #path to TDB;
> > .
> > 
> > # Text index description
> > :text_dataset rdf:type text:TextDataset ;
> > text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
> > text:index <#indexLucene> ;
> > .
> > 
> > <#indexLucene> a text:TextIndexLucene ;
> > text:directory  ;
> > text:entityMap <#entMap> ;
> > .
> > 
> > <#entMap> a text:EntityMap ;
> > text:defaultField "text" ;
> > text:entityField "uri" ;
> > text:map (
> > #RDF label abstracts
> > [ text:field "text" ;
> > text:predicate  ;
> > text:analyzer [
> > a text:StandardAnalyzer
> > ] 
> > ]
> > [ text:field "text" ;
> > text:predicate  ;
> > text:analyzer [
> > a text:StandardAnalyzer
> > ] 
> > ]
> > ) .
> > 
> > 
> > 
> > <#service_text_tdb> rdf:type fuseki:Service ;
> > fuseki:name "ds" ;
> > fuseki:serviceQuery "query" ;
> > fuseki:serviceQuery "sparql" ;
> > fuseki:serviceUpdate "update" ;
> > fuseki:serviceUpload "upload" ;
> > 

Re: Upgrading 3.0.1 to 3.13.1: ReaderRIOT

2020-03-26 Thread Andy Seaborne
After 3.14.0, use of "IRI" got wrapped up to limit the places it is used 
directly.


Why 3.13.1 and not 3.14.0?
Or 3.15.0-SNAPSHOT because of JENA-1838.


On 26/03/2020 13:36, Martynas Jusevičius wrote:

Hi,

I'm working on a long overdue upgrade of Jena.

So far the area where I can see most changes will be needed is the
implementation of ReaderRIOT streaming parser for RDF/POST:
https://github.com/AtomGraph/Core/blob/master/src/main/java/com/atomgraph/core/riot/lang/RDFPostReader.java

Is LangEngine the recommended base class for such parsers these days?
https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/riot/lang/LangEngine.java


Or LangBase



Currently it's extending ReaderRIOTBase.


ReaderRIOT is the API used by RDFParser.
You can implement that (and the companion factory) directly if you want.

LangBase etc are implementation helpers.
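Implementing ReaderRIOT directly, as suggested, has roughly this shape in the 3.14-era API. This is a sketch only: the class name and the elided RDF/POST tokenizing are illustrative, not the actual AtomGraph code.

```java
import java.io.InputStream;
import java.io.Reader;

import org.apache.jena.atlas.web.ContentType;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.ReaderRIOT;
import org.apache.jena.riot.ReaderRIOTFactory;
import org.apache.jena.riot.system.ParserProfile;
import org.apache.jena.riot.system.StreamRDF;
import org.apache.jena.sparql.util.Context;

/** Sketch of a ReaderRIOT for RDF/POST (name illustrative). */
public class RDFPostReaderRIOT implements ReaderRIOT {
    private final ParserProfile profile;

    public RDFPostReaderRIOT(ParserProfile profile) { this.profile = profile; }

    @Override
    public void read(InputStream in, String baseURI, ContentType ct,
                     StreamRDF output, Context context) {
        output.start();
        // ... tokenize the RDF/POST body and emit output.triple(...) here ...
        output.finish();
    }

    @Override
    public void read(Reader reader, String baseURI, ContentType ct,
                     StreamRDF output, Context context) {
        output.start();
        // ... same, reading from a java.io.Reader ...
        output.finish();
    }

    /** Companion factory so RDFParser / RDFParserRegistry can create the reader. */
    public static class Factory implements ReaderRIOTFactory {
        @Override
        public ReaderRIOT create(Lang language, ParserProfile profile) {
            return new RDFPostReaderRIOT(profile);
        }
    }
}
```

The factory would then be registered with something like `RDFParserRegistry.registerLangTriples(RDFPOST_LANG, new RDFPostReaderRIOT.Factory());`, where `RDFPOST_LANG` is the registered Lang for RDF/POST.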


Also can't figure out what to replace
ParserProfile.getPrologue().setBaseURI() calls with. I can see the
latest LangTurtleBase uses ParserProfile.setBaseIRI(), but I can't
find such method in 3.13.1.


3.15.0-dev:
ParserProfile::setBaseIRI(String)



Thanks.

Martynas



Re: Apache Jena Fuseki with text indexing

2020-03-26 Thread Chris Tomlinson
Zhenya,

Do you see any content in the directory:

> text:directory  ;

like the following partial listing:

> fuseki@foo :~/base/lucene-test$ ls -l
> total 3608108
> -rw-rw 1 fuseki fuseki   7772 Jan 29 21:15 _19a_5x.liv
> -rw-r- 1 fuseki fuseki299 Jan 21 15:53 _19a.cfe
> -rw-r- 1 fuseki fuseki   36547721 Jan 21 15:53 _19a.cfs
> -rw-r- 1 fuseki fuseki443 Jan 21 15:53 _19a.si
> -rw-r- 1 fuseki fuseki  23621 Jan 21 15:53 _24_17n.liv
> -rw-r- 1 fuseki fuseki   22718569 Jan 21 15:53 _24.fdt
> -rw-r- 1 fuseki fuseki   9184 Jan 21 15:53 _24.fdx
> -rw-r- 1 fuseki fuseki  12975 Jan 21 15:53 _24.fnm
> -rw-r- 1 fuseki fuseki7009762 Jan 21 15:53 _24_Lucene50_0.doc
> -rw-r- 1 fuseki fuseki3804794 Jan 21 15:53 _24_Lucene50_0.pos
> -rw-r- 1 fuseki fuseki   16186474 Jan 21 15:53 _24_Lucene50_0.tim
> -rw-r- 1 fuseki fuseki 103945 Jan 21 15:53 _24_Lucene50_0.tip
> -rw-r- 1 fuseki fuseki 667296 Jan 21 15:53 _24.nvd
> -rw-r- 1 fuseki fuseki   4027 Jan 21 15:53 _24.nvm
> -rw-r- 1 fuseki fuseki540 Jan 21 15:53 _24.si

Also, if you don’t have text:storeValues true, then queries like:

(?s ?score ?lit) text:query “ribosome”

won’t bind anything to ?lit. text:storeValues is set like:

> # Text index description
> :test_lucene_index a text:TextIndexLucene ;
> text:directory  ;
> text:storeValues true ;
> text:entityMap :test_entmap ;
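The query shape quoted above, written out as a full query (the text: prefix is standard; the default field comes from the entity map, and ?lit is only bound when the index stores values):

```sparql
PREFIX text: <http://jena.apache.org/text#>

SELECT ?s ?score ?lit WHERE {
  # ?lit stays unbound unless the index was built with text:storeValues true
  (?s ?score ?lit) text:query "ribosome" .
}
```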


Also you need to reload the data if you change the configuration so that the 
indexing will be done according to the configuration.

ciao,
Chris


> On Mar 26, 2020, at 10:33 AM, Zhenya Antić  wrote:
> 
> @prefix :  .
> @prefix tdb2:  .
> @prefix rdf:  .
> @prefix ja:  .
> @prefix rdfs:  .
> @prefix fuseki:  .
> @prefix text:  .
> 
> 
> rdfs:subClassOf ja:RDFDataset .
> 
> ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .
> 
> tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
> 
> tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
> 
> 
> rdfs:subClassOf ja:Model .
> 
> ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .
> 
> ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .
> 
> 
> rdfs:subClassOf ja:RDFDataset .
> 
> :service_tdb_all a fuseki:Service ;
> rdfs:label "TDB biology" ;
> fuseki:dataset :tdb_dataset_readwrite ;
> fuseki:name "biology" ;
> fuseki:serviceQuery "query" , "" , "sparql" ;
> fuseki:serviceReadGraphStore "get" ;
> fuseki:serviceReadQuads "" ;
> fuseki:serviceReadWriteGraphStore
> "data" ;
> fuseki:serviceReadWriteQuads "" ;
> fuseki:serviceUpdate "" , "update" ;
> fuseki:serviceUpload "upload" .
> 
> :tdb_dataset_readwrite
> a tdb2:DatasetTDB2 ;
> tdb2:location "db" .
> 
> 
> rdfs:subClassOf ja:Model .
> 
> ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .
> 
> ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .
> 
> 
> rdfs:subClassOf ja:RDFDataset .
> 
> <#dataset> rdf:type tdb2:DatasetTDB2 ;
> tdb2:location "db" ; #path to TDB;
> .
> 
> # Text index description
> :text_dataset rdf:type text:TextDataset ;
> text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
> text:index <#indexLucene> ;
> .
> 
> <#indexLucene> a text:TextIndexLucene ;
> text:directory  ;
> text:entityMap <#entMap> ;
> .
> 
> <#entMap> a text:EntityMap ;
> text:defaultField "text" ;
> text:entityField "uri" ;
> text:map (
> #RDF label abstracts
> [ text:field "text" ;
> text:predicate  ;
> text:analyzer [
> a text:StandardAnalyzer
> ] 
> ]
> [ text:field "text" ;
> text:predicate  ;
> text:analyzer [
> a text:StandardAnalyzer
> ] 
> ]
> ) .
> 
> 
> 
> <#service_text_tdb> rdf:type fuseki:Service ;
> fuseki:name "ds" ;
> fuseki:serviceQuery "query" ;
> fuseki:serviceQuery "sparql" ;
> fuseki:serviceUpdate "update" ;
> fuseki:serviceUpload "upload" ;
> fuseki:serviceReadGraphStore "get" ;
> fuseki:serviceReadWriteGraphStore "data" ;
> fuseki:dataset :text_dataset ;
> .
> 
> 
> 
> On Thu, Mar 26, 2020, at 11:31 AM, Zhenya Antić wrote:
>> Hi Andy,
>> 
>> Thanks. So I think I have all the lines you listed in the .ttl file 
>> (attached). I also checked, the data file contains the relevant data. But I 
>> have 0 properties indexed.
>> 
>> Thanks,
>> Zhenya
>> 
>> 
>> 
>> On Wed, Mar 25, 2020, at 4:41 AM, Andy Seaborne wrote:
>>> 
>>> 
>>> On 24/03/2020 15:11, Zhenya Antić wrote:
 Hi Andy,
 
> Did you load the data before attaching the text index?
 
 How do I do it (or not do it, wasn't sure from your post)?
>>> 
>>> 

Re: Graceful shutdown of FusekiServer

2020-03-26 Thread Andy Seaborne
Because servers have to survive disgraceful shutdown anyway (crash, DOS, 
server switched off, casters on the rack broken [1]), there isn't a 
graceful procedure.  "kill -9"


TDB has to be able to recover in any circumstances, so that is what it 
does. There is no graceful shutdown currently.


What resources of your custom dataset are there?

Andy

[1]
https://cloud.google.com/blog/products/management-tools/sre-keeps-digging-to-prevent-problems

On 26/03/2020 10:46, Nouwt, B. (Barry) wrote:

Hi everybody,

We are using embedded FusekiServer in our Java project and have extended Jena's 
DatasetImpl class with additional features that require some cleanup when the 
FusekiServer shuts down. This dataset is loaded into FusekiServer using a 
config.ttl file, so we do not have a direct reference to this custom Dataset 
object in our application.

To release these resources of our custom dataset, we implemented the Dataset.close() function, but it is not being called when we shutdown the FusekiServer using FusekiServer.stop(). 


Fuseki works on DatasetGraphs (DSGs). Dataset is (usually) a wrapper 
that gives a different API.


As SPARQL goes down to the DSG, Dataset does not do much for SPARQL.



I did find that the FusekiServer calls the DatasetGraph.close() function 
instead, which is encapsulated by our KnowledgeDataset.


At what point does it call DatasetGraph.close()?



What is the best way to release the resources of our custom dataset 
implementation? For now, I've added a shutdownhook to the JVM, but in my 
opinion this is not very elegant.


Agreed. Even if there isn't an elegant way now (there may be, there may 
not), let's at least understand what it would take to add one.
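The shutdown-hook workaround mentioned above can be sketched like this (plain JVM API; `cleanup` stands in for whatever releases the custom dataset's resources):

```java
// Minimal sketch of the JVM shutdown-hook workaround described above.
// The Runnable passed in is whatever releases the custom dataset's resources.
public class DatasetShutdownHook {
    public static Thread register(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "dataset-cleanup");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

If FusekiServer.stop() is the normal shutdown path, the hook can also be removed with Runtime.getRuntime().removeShutdownHook(hook) and the cleanup run explicitly at that point instead.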




Thanks in advance!

Regards, Barry


Re: Apache Jena Fuseki with text indexing

2020-03-26 Thread Zhenya Antić
@prefix :  .
@prefix tdb2:  .
@prefix rdf:  .
@prefix ja:  .
@prefix rdfs:  .
@prefix fuseki:  .
@prefix text:  .


 rdfs:subClassOf ja:RDFDataset .

ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .

tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .

tdb2:GraphTDB2 rdfs:subClassOf ja:Model .


 rdfs:subClassOf ja:Model .

ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .

ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .


 rdfs:subClassOf ja:RDFDataset .

:service_tdb_all a fuseki:Service ;
 rdfs:label "TDB biology" ;
 fuseki:dataset :tdb_dataset_readwrite ;
 fuseki:name "biology" ;
 fuseki:serviceQuery "query" , "" , "sparql" ;
 fuseki:serviceReadGraphStore "get" ;
 fuseki:serviceReadQuads "" ;
 fuseki:serviceReadWriteGraphStore
 "data" ;
 fuseki:serviceReadWriteQuads "" ;
 fuseki:serviceUpdate "" , "update" ;
 fuseki:serviceUpload "upload" .

:tdb_dataset_readwrite
 a tdb2:DatasetTDB2 ;
 tdb2:location "db" .


 rdfs:subClassOf ja:Model .

ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .

ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .


 rdfs:subClassOf ja:RDFDataset .

<#dataset> rdf:type tdb2:DatasetTDB2 ;
tdb2:location "db" ; #path to TDB;
.

# Text index description
:text_dataset rdf:type text:TextDataset ;
 text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
 text:index <#indexLucene> ;
.

<#indexLucene> a text:TextIndexLucene ;
 text:directory  ;
 text:entityMap <#entMap> ;
 .

<#entMap> a text:EntityMap ;
 text:defaultField "text" ;
 text:entityField "uri" ;
 text:map (
 #RDF label abstracts
 [ text:field "text" ;
 text:predicate  ;
 text:analyzer [
 a text:StandardAnalyzer
 ] 
 ]
 [ text:field "text" ;
 text:predicate  ;
 text:analyzer [
 a text:StandardAnalyzer
 ] 
 ]
 ) .



<#service_text_tdb> rdf:type fuseki:Service ;
 fuseki:name "ds" ;
 fuseki:serviceQuery "query" ;
 fuseki:serviceQuery "sparql" ;
 fuseki:serviceUpdate "update" ;
 fuseki:serviceUpload "upload" ;
 fuseki:serviceReadGraphStore "get" ;
 fuseki:serviceReadWriteGraphStore "data" ;
 fuseki:dataset :text_dataset ;
 .



On Thu, Mar 26, 2020, at 11:31 AM, Zhenya Antić wrote:
> Hi Andy,
> 
> Thanks. So I think I have all the lines you listed in the .ttl file 
> (attached). I also checked, the data file contains the relevant data. But I 
> have 0 properties indexed.
> 
> Thanks,
> Zhenya
> 
> 
> 
> On Wed, Mar 25, 2020, at 4:41 AM, Andy Seaborne wrote:
>> 
>> 
>> On 24/03/2020 15:11, Zhenya Antić wrote:
>> > Hi Andy,
>> > 
>> >> Did you load the data before attaching the text index?
>> > 
>> > How do I do it (or not do it, wasn't sure from your post)?
>> 
>> Set up the Fuseki system, with the text index as the Fuseki service dataset:
>> 
>>  fuseki:name "biology" ;
>>  fuseki:dataset :text_dataset ;
>> ...
>> 
>> :text_dataset rdf:type text:TextDataset ;
>>  text:dataset <#dataset> ;
>> 
>> 
>> 
>> <#dataset> rdf:type tdb2:DatasetTDB2 ;
>> tdb2:location "db" ; #path to TDB;
>> .
>> 
>> then send the data to /biology/data (which is the SPARQL GSP write 
>> endpoint) or however you want to push the data to the server (SPARQL 
>> Update, or the UI).
>> 
>> For very large data:
>> 
>> Load the TDB2 dataset offline
>> Then run the "jena.textindexer" utility
>> 
>> https://jena.apache.org/documentation/query/text-query.html#configuration
>> 
>> The first way is easier.
>> 
>>  Andy
>> 
>> > 
>> > Thanks,
>> > Zhenya
>> > 
>> > 
>> > 
>> > On Sun, Mar 22, 2020, at 9:18 AM, Andy Seaborne wrote:
>> >> Just checking one point:
>> >>
>> >> Did you load the data before attaching the text index?
>> >>
>> >> The text index is calculated as data is added, so if you first load the
>> >> dataset and then set up a text index, it will miss indexing the data.
>> >>
>> >> Andy
>> >>
>> >> On 21/03/2020 07:55, Lorenz Buehmann wrote:
>> >>> Hi,
>> >>>
>> >>> welcome to Semantic Web and Apache Jena.
>> >>>
>> >>> Comments inline:
>> >>>
>> >>> On 20.03.20 15:36, Zhenya Antić wrote:
>>  Hello,
>> 
>>  I am a beginner with Fuseki, knowledge graphs and SPARQL, so please 
>>  forgive me if the questions seem obvious, the learning curve for this 
>>  turned out to be quite steep.
>> >>> No problem, nothing is simple in the beginning,
>> 
>>  I am trying to get text indexing to work with my Fuseki knowledge graph.
>> >>> Which DBpedia dataset did you load? I mean, which files?
>> 
>>  For starters, I tried using a regular expression, but that didn't work:
>> 
>> 

Re: TDB blocked

2020-03-26 Thread Andy Seaborne

Fork - what sort of fork? Java's fork/join, or an OS process fork?

Doesn't forking (no exec) a JVM bypass all JVM-related initialization, 
e.g. classes? That will break a lot of things, TDB included. I don't 
know if OS fork clones everything anyway (e.g. threads, and that means 
ThreadLocal might be broken). [The Scala doc is a little parsimonious]


---

There is no need to have one shared dataset/connection object anyway.

TDBFactory.create* is cheap after the dataset is first used in the JVM 
process. It's little more than a cache lookup.


A test can be

   TDBFactory.create*
   

and be fast enough, i.e. push the dataset connect/create into the test or 
test suite.


Bonus: create an in-memory TDB dataset so it leaves no on-disk 
footprint. Or the equivalent of a JUnit temp folder, but the in-memory 
create has lower start-up costs.


Andy

The in-memory TDB dataset is not designed for production use - it is 
very careful to be true TDB, just without persistent storage. It runs 
its own copy-in/copy-out block cache, so there is a lot of data copying 
to get proper isolation of storage and usage.


Use DatasetFactory.createTxnMem() for a production in-memory database.
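A minimal sketch of the production in-memory option, assuming Jena 3.x on the classpath (class and method names of the sketch itself are illustrative):

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.vocabulary.RDFS;

public class TxnMemSketch {
    // Create a transactional in-memory dataset, add one triple, return the count.
    // For tests that must exercise TDB itself, TDBFactory.createDataset()
    // with no location gives the in-memory TDB variant instead.
    public static long addOneTriple() {
        Dataset ds = DatasetFactory.createTxnMem();
        ds.begin(ReadWrite.WRITE);
        try {
            ds.getDefaultModel().add(
                ds.getDefaultModel().createResource("http://example/s"),
                RDFS.label, "example");
            ds.commit();
        } finally {
            ds.end();
        }
        ds.begin(ReadWrite.READ);
        try {
            return ds.getDefaultModel().size();
        } finally {
            ds.end();
        }
    }

    public static void main(String[] args) {
        System.out.println(addOneTriple());
    }
}
```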

On 26/03/2020 08:24, Jean-Marc Vanel wrote:

Le mer. 25 mars 2020 à 17:17, Andy Seaborne  a écrit :



On 25/03/2020 14:36, Jean-Marc Vanel wrote:

Le mer. 25 mars 2020 à 13:07, Andy Seaborne  a écrit :






...





What's the full stack trace?
(What's the blocking thread?)

I was looking for the other thread that is blocking this one at
being(write).  Same JVM.



See details below. Tell me if you still want it.



Yes, another unended W transaction in the same process might be the cause.



I thought rather, another unended W transaction in *another* process, 
since the tests' execution was parallel.


If I understand correctly,

parallelExecution in Test := false

means parallel in the same JVM.



Yes indeed, that's for taking advantage of multicore CPUs.
The parameter that actually created the freeze problem in the original post
was

fork = true

and it is clear that calling TDBFactory.createDataset() and close(), and
other calls, randomly from different JVMs on the same TDB directory is
bound to fail.

On the other hand, the combination

fork := false
parallelExecution in Test := true

is not good for this kind of test. My understanding is that TDB is made
for multi-threaded access in reading and writing (and it has worked
perfectly in my Play Framework application for years), but not for
creating and closing.
All tests pass but one, which fails with this stack (complete!):

[info] TimeSeriesTestJena:
[info] - notifyDataEvent + getTimeSeries
[info] deductions.runtime.semlogs.TimeSeriesTestJena *** ABORTED ***
[info]   java.util.*ConcurrentModificationException*: Reader = 0, Writer = 2
[info]   at
org.apache.jena.tdb.sys.DatasetControlMRSW.policyError(DatasetControlMRSW.java:147)
[info]   at
org.apache.jena.tdb.sys.DatasetControlMRSW.policyError(DatasetControlMRSW.java:143)
[info]   at
org.apache.jena.tdb.sys.DatasetControlMRSW.checkConcurrency(DatasetControlMRSW.java:75)
[info]   at
org.apache.jena.tdb.sys.DatasetControlMRSW.startUpdate(DatasetControlMRSW.java:57)
[info]   at
org.apache.jena.tdb.store.nodetupletable.NodeTupleTableConcrete.startWrite(NodeTupleTableConcrete.java:65)
[info]   at
org.apache.jena.tdb.store.nodetupletable.NodeTupleTableConcrete.sync(NodeTupleTableConcrete.java:251)
[info]   at org.apache.jena.tdb.store.TableBase.sync(TableBase.java:51)
[info]   at
org.apache.jena.tdb.store.DatasetGraphTDB.sync(DatasetGraphTDB.java:253)
[info]   at
org.apache.jena.tdb.transaction.DatasetGraphTransaction.close(DatasetGraphTransaction.java:296)
[info]   at
org.apache.jena.sparql.core.DatasetImpl.close(DatasetImpl.java:231)



  Andy




Test code:


https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/test/scala/deductions/runtime/sparql_cache/TestRDFCache.scala

=
Thread [pool-1-thread-1-ScalaTest-running-TestRDFCache] (Suspended)
Unsafe.park(boolean, long) line: not available [native method]
LockSupport.park(Object) line: 175
Semaphore$FairSync(AbstractQueuedSynchronizer).parkAndCheckInterrupt()
line: 836


Semaphore$FairSync(AbstractQueuedSynchronizer).doAcquireSharedInterruptibly(int)

line: 997


Semaphore$FairSync(AbstractQueuedSynchronizer).acquireSharedInterruptibly(int)

line: 1304
Semaphore.acquire() line: 312
TransactionManager.acquireWriterLock(boolean) line: 616
TransactionManager.beginInternal(TxnType, TxnType, String) line: 358
TransactionManager.begin(TxnType, String) line: 343
StoreConnection.begin(TxnType, String) line: 128
StoreConnection.begin(TxnType) line: 108
DatasetGraphTransaction.begin(TxnType) line: 169
DatasetGraphTransaction.begin(ReadWrite) line: 162
DatasetImpl.begin(ReadWrite) line: 122
JenaDatasetStore.$anonfun$rw$1(Dataset, Function0) line: 23
52027729.apply() line: not available
Try$.apply(Function0) line: 213
JenaDatasetStore.rw(Dataset, Function0) line: 22

Re: TDB blocked

2020-03-26 Thread Jean-Marc Vanel
Le mer. 25 mars 2020 à 17:17, Andy Seaborne  a écrit :

>
> On 25/03/2020 14:36, Jean-Marc Vanel wrote:
> > Le mer. 25 mars 2020 à 13:07, Andy Seaborne  a écrit :
>


> ...
>


> >> What's the full stack trace?
> >> (What's the blocking thread?)
> I was looking for the other thread that is blocking this one at
> being(write).  Same JVM.
>

See details below. Tell me if you still want it.


> >> Yes, another unended W transaction in the same process might be the
> cause.
> >>
> > I thought rather, another unended W transaction in *another* process,
> since
> > the tests execution was parallel.
>
> If I understand correctly,
>
> parallelExecution in Test := false
>
> means parallel in the same JVM.
>

Yes indeed, that's for taking advantage of multicore CPUs.
The parameter that actually created the freeze problem in the original post
was

fork = true

and it is clear that calling TDBFactory.createDataset() and close(), and
other calls, randomly from different JVMs on the same TDB directory is
bound to fail.

On the other hand, the combination

fork := false
parallelExecution in Test := true

is not good for this kind of test. My understanding is that TDB is made
for multi-threaded access in reading and writing (and it has worked
perfectly in my Play Framework application for years), but not for
creating and closing.
All tests pass but one, which fails with this stack (complete!):

[info] TimeSeriesTestJena:
[info] - notifyDataEvent + getTimeSeries
[info] deductions.runtime.semlogs.TimeSeriesTestJena *** ABORTED ***
[info]   java.util.*ConcurrentModificationException*: Reader = 0, Writer = 2
[info]   at
org.apache.jena.tdb.sys.DatasetControlMRSW.policyError(DatasetControlMRSW.java:147)
[info]   at
org.apache.jena.tdb.sys.DatasetControlMRSW.policyError(DatasetControlMRSW.java:143)
[info]   at
org.apache.jena.tdb.sys.DatasetControlMRSW.checkConcurrency(DatasetControlMRSW.java:75)
[info]   at
org.apache.jena.tdb.sys.DatasetControlMRSW.startUpdate(DatasetControlMRSW.java:57)
[info]   at
org.apache.jena.tdb.store.nodetupletable.NodeTupleTableConcrete.startWrite(NodeTupleTableConcrete.java:65)
[info]   at
org.apache.jena.tdb.store.nodetupletable.NodeTupleTableConcrete.sync(NodeTupleTableConcrete.java:251)
[info]   at org.apache.jena.tdb.store.TableBase.sync(TableBase.java:51)
[info]   at
org.apache.jena.tdb.store.DatasetGraphTDB.sync(DatasetGraphTDB.java:253)
[info]   at
org.apache.jena.tdb.transaction.DatasetGraphTransaction.close(DatasetGraphTransaction.java:296)
[info]   at
org.apache.jena.sparql.core.DatasetImpl.close(DatasetImpl.java:231)


>  Andy
>
>
> >
> > Test code:
> >
> https://github.com/jmvanel/semantic_forms/blob/master/scala/forms/src/test/scala/deductions/runtime/sparql_cache/TestRDFCache.scala
> > =
> > Thread [pool-1-thread-1-ScalaTest-running-TestRDFCache] (Suspended)
> > Unsafe.park(boolean, long) line: not available [native method]
> > LockSupport.park(Object) line: 175
> > Semaphore$FairSync(AbstractQueuedSynchronizer).parkAndCheckInterrupt()
> > line: 836
> >
> Semaphore$FairSync(AbstractQueuedSynchronizer).doAcquireSharedInterruptibly(int)
> > line: 997
> >
> Semaphore$FairSync(AbstractQueuedSynchronizer).acquireSharedInterruptibly(int)
> > line: 1304
> > Semaphore.acquire() line: 312
> > TransactionManager.acquireWriterLock(boolean) line: 616
> > TransactionManager.beginInternal(TxnType, TxnType, String) line: 358
> > TransactionManager.begin(TxnType, String) line: 343
> > StoreConnection.begin(TxnType, String) line: 128
> > StoreConnection.begin(TxnType) line: 108
> > DatasetGraphTransaction.begin(TxnType) line: 169
> > DatasetGraphTransaction.begin(ReadWrite) line: 162
> > DatasetImpl.begin(ReadWrite) line: 122
> > JenaDatasetStore.$anonfun$rw$1(Dataset, Function0) line: 23
> > 52027729.apply() line: not available
> > Try$.apply(Function0) line: 213
> > JenaDatasetStore.rw(Dataset, Function0) line: 22
> > JenaDatasetStore.rw(Object, Function0) line: 10
> > TestRDFCache(RDFCacheAlgo).readStoreURI(Object, Object,
> > DATASET) line: 368
> > RDFCacheAlgo.readStoreURI$(RDFCacheAlgo, Object, Object,
> > Object) line: 366
> > TestRDFCache.readStoreURI(Object, Object, Object) line: 10
> > TestRDFCache(RDFCacheAlgo).readStoreURIinOwnGraph(Object)
> > line: 334
> > TestRDFCache(RDFCacheAlgo).readStoreUriInNamedGraph(Object)
> > line: 326
> > RDFCacheAlgo.readStoreUriInNamedGraph$(RDFCacheAlgo, Object)
> > line: 325
> > TestRDFCache.readStoreUriInNamedGraph(Object) line: 10
> > TestRDFCache.$anonfun$new$4(TestRDFCache) line: 63
> > 1708361043.apply$mcV$sp() line: not available
> > 1708361043(JFunction0$mcV$sp).apply() line: 23
> > OutcomeOf$(OutcomeOf).outcomeOf(Function0) line: 85
> > OutcomeOf.outcomeOf$(OutcomeOf, Function0) line: 83
> > OutcomeOf$.outcomeOf(Function0) line: 104
> > Transformer.apply() line: 22
> > Transformer.apply() line: 20
> > AnyFunSuiteLike$$anon$1.apply() line: 189
> > TestRDFCache(TestSuite).withFixture(TestSuite$NoArgTest) line: 196
> > TestSuite.withFixture$(TestSuite,