RIOTous questions

2020-07-22 Thread Chris Tomlinson
Hello,

Well, the first question is really a TriG question: when is the new-style BASE 
to be preferred over the old-style @base in TriG files?
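
To make the question concrete, here are the two forms side by side, with a 
made-up base IRI (not our real one):

    @base <http://example.org/base/> .   # old style: Turtle directive, trailing dot
    BASE  <http://example.org/base/>     # new style: SPARQL-style, no trailing dot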

The RIOT question concerns Jena 3.16.0 and earlier. I try:

riot --count C1.trig

and get the result:

C1.trig : Quads = 43

which is correct. But when I try:

riot --out=Turtle C1.trig > tmp.ttl

all I get is the prefix map:

> @base   .
> @prefix :   .
> @prefix adm:    .
> @prefix bda:    .
> @prefix bdg:    .
> @prefix bdr:    .
> @prefix owl:    .
> @prefix rdf:    .
> @prefix rdfs:   .
> @prefix skos:   .
> @prefix vcard:  .
> @prefix xsd:    .

The result I get doesn't depend on the particular TriG file; I see the same 
behavior with any valid TriG.

The result doesn't appear to depend on whether I use the riot or trig commands, 
or on --out=Turtle versus --out=TTL.

The trig file, with the above prefixes, is:

> bdg:C1 {
> bda:C1  a   adm:AdminData ;
> :isRoot true ;
> adm:adminAbout  bdr:C1 ;
> adm:facetIndex  10 ;
> adm:gitPath "1a/C1.trig" ;
> adm:gitRepo bda:GR0001 ;
> adm:graphId bdg:C1 ;
> adm:logEntry bda:LG3EE3DD277F0A15A6 , bda:LGE00E23B3D66F3F4C ;
> adm:metadataLegal   bda:LD_BDRC_CC0 ;
> adm:status  bda:StatusReleased .
> 
> bda:LG3EE3DD277F0A15A6
> a   adm:LogEntry ;
> adm:logDate "2016-04-07T14:21:14.309Z"^^xsd:dateTime ;
> adm:logWho  bdr:U4 .
> 
> bda:LGE00E23B3D66F3F4C
> a   adm:LogEntry ;
> adm:logDate "2010-03-18T14:58:49.484Z"^^xsd:dateTime ;
> adm:logWho  bdr:U00019 .
> 
> bdr:C1  a   :Corporation ;
> rdfs:comment "yab gzhis family" ;
> skos:prefLabel  "glang mdun (yab gzhis)"@bo-x-ewts ;
> :corporationHasMember  bdr:CM2B0540E276FB5DC4 , 
> bdr:CM8166D8F547C8220F , bdr:CMAD95D8DC86690FCB , bdr:CMB71EF278AA02337E , 
> bdr:CMDCD0949DAFDA27BD , bdr:CME4512919AE9C9D8E ;
> :corporationRegion  bdr:G1390 ;
> :isRoot true ;
> :note   bdr:NT7DD9D48BDB5A3582 .
> 
> bdr:CM2B0540E276FB5DC4
> a   :CorporationMemberUnknown ;
> :corporationMember  bdr:P7067 .
> 
> bdr:CM8166D8F547C8220F
> a   :CorporationMemberBlood ;
> :corporationMember  bdr:P6682 .
> 
> bdr:CMAD95D8DC86690FCB
> a   :CorporationMemberBlood ;
> :corporationMember  bdr:P197 .
> 
> bdr:CMB71EF278AA02337E
> a   :CorporationMemberBlood ;
> :corporationMember  bdr:P6681 .
> 
> bdr:CMDCD0949DAFDA27BD
> a   :CorporationMemberNotSpecified ;
> :corporationMember  bdr:P2LS149 .
> 
> bdr:CME4512919AE9C9D8E
> a   :CorporationMemberBlood ;
> :corporationMember  bdr:P6680 .
> 
> bdr:NT7DD9D48BDB5A3582
> a   :Note ;
> :noteText   "This is the family of the 13th Dalai Lama.  
> ..."@en .
> }
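
In case it helps anyone reproduce this, here is a workaround sketch I can use, 
assuming (my guess) that the quads all sit in the named graph and only the 
empty default graph reaches the Turtle writer: load the file as a dataset, 
merge the named graphs into one model, and write that as Turtle.

    import org.apache.jena.query.Dataset;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.riot.Lang;
    import org.apache.jena.riot.RDFDataMgr;

    public class TrigToTurtle {
        public static void main(String[] args) {
            // Load the TriG file as a dataset; the quads land in named graphs.
            Dataset ds = RDFDataMgr.loadDataset("C1.trig");
            // Merge every named graph into one model (the default graph is empty here).
            Model merged = ModelFactory.createDefaultModel();
            ds.listNames().forEachRemaining(name -> merged.add(ds.getNamedModel(name)));
            // Carry over the prefix map for readable output.
            merged.setNsPrefixes(ds.getDefaultModel());
            RDFDataMgr.write(System.out, merged, Lang.TURTLE);
        }
    }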


I'm sure I'm doing something dumb here. I'm prompted to ask since the little 
tool I've been using (rdf2rdf-1.0.1-2.3.1.jar and rdf2rdf-1.0.2-2.3.1.jar) 
doesn't like the new-style BASE.

Thanks,
Chris




Re: misleading parse exception message in Shacl.

2020-07-17 Thread Chris Tomlinson
Hi Andy,

I haven’t looked into SHACLC. We do use features such as sh:group, sh:order, 
dash:editor and so on; as well as a few annotations of our own that are 
relevant to editing and some validation controls. Off-hand it isn’t clear how 
to use SHACLC and weave these other features in.

The notation is nicely compact and if there’s an integration approach for 
additional features I’ll look deeper.

Thanks,
Chris


> On Jul 17, 2020, at 5:37 AM, Andy Seaborne  wrote:
> 
> Be interested to hear experiences using SHACL Compact Syntax.
> 
> It's Lang.SHACLC
> 
> SHACL-CS doesn't cover all SHACL but what it does cover is easier to read and 
> write.
> 
> 1/ What is missing from SHACL-CS from your perspective?
> 2/ Is it, in fact, actually helpful for managing SHACL at scale or not?
> 
>Andy
> 
> https://w3c.github.io/shacl/shacl-compact-syntax/
> 
> 
> On 16/07/2020 22:31, Chris Tomlinson wrote:
>> Andy,
>> That's great news! Updating to 3.16.0 is on the ToDo list. I'm moving it to 
>> the top.
>> Thanks very much,
>> Chris
>>> On Jul 16, 2020, at 16:20, Andy Seaborne  wrote:
>>> 
>>> Fixed in 3.16.0:
>>> 
>>> "shacl parse" gives:
>>> 
>>> No sh:path on a property shape: 
>>> node=<http://example/bdsContentLocationShape> sh:property 
>>> <http://example/bdsContentLocationShape-contentLocationStatement>
>>> 
>>> when there exists at least one triple with
>>> bds:ContentLocationShape-contentLocationStatement as subject
>>> 
>>> and
>>> 
>>> Missing property shape: node=<http://example/bdsContentLocationShape> 
>>> sh:property 
>>> <http://example/bdsContentLocationShape-contentLocationStatement>
>>> 
>>> if there are none:
>>> 
>>> 
>>> (and no stacktraces)
>>> 
>>> but what you show is 3.15.0.
>>> 
>>> Test RDF:
>>> 
>>> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
>>> PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
>>> PREFIX sh:  <http://www.w3.org/ns/shacl#>
>>> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
>>> 
>>> PREFIX bds: <http://example/bds>
>>> PREFIX bdo: <http://example/bdo>
>>> 
>>> bds:ContentLocationShape
>>>  a sh:NodeShape ;
>>>  sh:property bds:ContentLocationShape-contentLocationStatement ;
>>>  sh:targetClass bdo:ContentLocation .
>>> 
>>> #bds:ContentLocationShape-contentLocationStatement rdf:type 
>>> sh:PropertyShape .
>>> 
>>> 
>>>> On 16/07/2020 21:44, Chris Tomlinson wrote:
>>>> Hi,
>>>> I’ve gotten a parse exception:
>>>> org.apache.jena.shacl.parser.ShaclParseException: No sh:path on a property 
>>>> shape: <http://purl.bdrc.io/ontology/shapes/core/ContentLocationShape>
>>>>at 
>>>> org.apache.jena.shacl.parser.ShapesParser.findPropertyShapes(ShapesParser.java:285)
>>>>at 
>>>> org.apache.jena.shacl.parser.ShapesParser.parseShape$(ShapesParser.java:214)
>>>>at 
>>>> org.apache.jena.shacl.parser.ShapesParser.parseShapeStep(ShapesParser.java:196)
>>>>at 
>>>> org.apache.jena.shacl.parser.ShapesParser.parseRootShape(ShapesParser.java:140)
>>>>at 
>>>> org.apache.jena.shacl.parser.ShapesParser.parseShapes(ShapesParser.java:84)
>>>>at org.apache.jena.shacl.Shapes.parse(Shapes.java:55)
>>>> performing:
>>>> Shapes shapes = Shapes.parse(testGraph);
>>>> on the graph:
>>>> bds:ContentLocationShape
>>>>   a sh:NodeShape ;
>>>>   sh:property bds:ContentLocationShape-contentLocationStatement ;
>>>>   sh:targetClass bdo:ContentLocation .
>>>> In the above graph there are no triples with
>>>> bds:ContentLocationShape-contentLocationStatement
>>>> as subject so the Shapes.parse raises an exception which seems reasonable; 
>>>> however, the message should refer to the missing definition of a putative 
>>>> PropertyShape reference rather than to the NodeShape that contains the 
>>>> reference.
>>>> In the simple case above it's trivial to see by casual inspection what the 
>>>> problem is, but when there are a large number of PropertyShape refs and 
>>>> all that the message says is that the NodeShape doesn't have an sh:path, 
>>>> it's pretty opaque as to what the problem is.
>>>> Maybe there’s a way to improve the exception message?
>>>> Thanks,
>>>> Chris



Re: misleading parse exception message in Shacl.

2020-07-16 Thread Chris Tomlinson
Andy,

That's great news! Updating to 3.16.0 is on the ToDo list. I'm moving it to the 
top.

Thanks very much,
Chris

> On Jul 16, 2020, at 16:20, Andy Seaborne  wrote:
> 
> Fixed in 3.16.0:
> 
> "shacl parse" gives:
> 
> No sh:path on a property shape: node=<http://example/bdsContentLocationShape> 
> sh:property <http://example/bdsContentLocationShape-contentLocationStatement>
> 
> when there exists at least one triple with
> bds:ContentLocationShape-contentLocationStatement as subject
> 
> and
> 
> Missing property shape: node=<http://example/bdsContentLocationShape> 
> sh:property <http://example/bdsContentLocationShape-contentLocationStatement>
> 
> if there are none:
> 
> 
> (and no stacktraces)
> 
> but what you show is 3.15.0.
> 
> Test RDF:
> 
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
> PREFIX sh:  <http://www.w3.org/ns/shacl#>
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> 
> PREFIX bds: <http://example/bds>
> PREFIX bdo: <http://example/bdo>
> 
> bds:ContentLocationShape
>  a sh:NodeShape ;
>  sh:property bds:ContentLocationShape-contentLocationStatement ;
>  sh:targetClass bdo:ContentLocation .
> 
> #bds:ContentLocationShape-contentLocationStatement rdf:type sh:PropertyShape .
> 
> 
>> On 16/07/2020 21:44, Chris Tomlinson wrote:
>> Hi,
>> I’ve gotten a parse exception:
>> org.apache.jena.shacl.parser.ShaclParseException: No sh:path on a property 
>> shape: <http://purl.bdrc.io/ontology/shapes/core/ContentLocationShape>
>>at 
>> org.apache.jena.shacl.parser.ShapesParser.findPropertyShapes(ShapesParser.java:285)
>>at 
>> org.apache.jena.shacl.parser.ShapesParser.parseShape$(ShapesParser.java:214)
>>at 
>> org.apache.jena.shacl.parser.ShapesParser.parseShapeStep(ShapesParser.java:196)
>>at 
>> org.apache.jena.shacl.parser.ShapesParser.parseRootShape(ShapesParser.java:140)
>>at 
>> org.apache.jena.shacl.parser.ShapesParser.parseShapes(ShapesParser.java:84)
>>at org.apache.jena.shacl.Shapes.parse(Shapes.java:55)
>> performing:
>> Shapes shapes = Shapes.parse(testGraph);
>> on the graph:
>> bds:ContentLocationShape
>>   a sh:NodeShape ;
>>   sh:property bds:ContentLocationShape-contentLocationStatement ;
>>   sh:targetClass bdo:ContentLocation .
>> In the above graph there are no triples with
>> bds:ContentLocationShape-contentLocationStatement
>> as subject so the Shapes.parse raises an exception which seems reasonable; 
>> however, the message should refer to the missing definition of a putative 
>> PropertyShape reference rather than to the NodeShape that contains the 
>> reference.
>> In the simple case above it's trivial to see by casual inspection what the 
>> problem is, but when there are a large number of PropertyShape refs and all 
>> that the message says is that the NodeShape doesn't have an sh:path, it's 
>> pretty opaque as to what the problem is.
>> Maybe there’s a way to improve the exception message?
>> Thanks,
>> Chris


misleading parse exception message in Shacl.

2020-07-16 Thread Chris Tomlinson
Hi,

I’ve gotten a parse exception:

org.apache.jena.shacl.parser.ShaclParseException: No sh:path on a property 
shape: 
at 
org.apache.jena.shacl.parser.ShapesParser.findPropertyShapes(ShapesParser.java:285)
at 
org.apache.jena.shacl.parser.ShapesParser.parseShape$(ShapesParser.java:214)
at 
org.apache.jena.shacl.parser.ShapesParser.parseShapeStep(ShapesParser.java:196)
at 
org.apache.jena.shacl.parser.ShapesParser.parseRootShape(ShapesParser.java:140)
at 
org.apache.jena.shacl.parser.ShapesParser.parseShapes(ShapesParser.java:84)
at org.apache.jena.shacl.Shapes.parse(Shapes.java:55)
performing:

Shapes shapes = Shapes.parse(testGraph);

on the graph:

bds:ContentLocationShape
  a sh:NodeShape ;
  sh:property bds:ContentLocationShape-contentLocationStatement ;
  sh:targetClass bdo:ContentLocation .
In the above graph there are no triples with  

bds:ContentLocationShape-contentLocationStatement 

as subject so the Shapes.parse raises an exception which seems reasonable; 
however, the message should refer to the missing definition of a putative 
PropertyShape reference rather than to the NodeShape that contains the 
reference.

In the simple case above it's trivial to see by casual inspection what the problem 
is, but when there are a large number of PropertyShape refs and all that the 
message says is that the NodeShape doesn't have an sh:path, it's pretty opaque 
as to what the problem is.

Maybe there’s a way to improve the exception message?
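
(For reproduction, the whole test is just the following minimal sketch; the 
shapes file name is mine, holding the graph shown above.)

    import org.apache.jena.graph.Graph;
    import org.apache.jena.riot.RDFDataMgr;
    import org.apache.jena.shacl.Shapes;

    public class ParseShapesTest {
        public static void main(String[] args) {
            // "shapes.ttl" is a placeholder for a file containing the graph above.
            Graph testGraph = RDFDataMgr.loadGraph("shapes.ttl");
            // Throws ShaclParseException with the message discussed here.
            Shapes shapes = Shapes.parse(testGraph);
            System.out.println("Parsed: " + shapes);
        }
    }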

Thanks,
Chris



Re: repeated ThriftConvert WARN visit: Unrecognized:

2020-07-03 Thread Chris Tomlinson
Hi Andy,

> On Jul 3, 2020, at 8:41 AM, Andy Seaborne  wrote:
> 
> 
> 
> On 02/07/2020 21:55, Chris Tomlinson wrote:
> > grep -v "ThriftConvert WARN visit: Unrecognized: " 
> > catalina.out
> 
> 
> Is there any signature as to when they occur?  Two PUTs overlapping, certain 
> usage by your clients (which probably isn't visible in the logs)?  earlier 
> connections broken? high load on the server? Time of day? Anything else that 
> looks like a characteristic?

Our issue <https://github.com/buda-base/buda-base/issues/30> has a bit of a 
discussion of the first occurrence on 30 April, which might shed some light. The 
one thing the two occurrences have in common is that in both cases the graph 
being PUT was the ontologySchema, but there were changes to that graph between 
30 April and 30 June, so the content was not the same. Also, there were no 
overlapping PUTs in the first case.

No unusual load; in fact lightly loaded is the typical situation, except during 
the daily migration of legacy data, which sometimes has a lot of changed graphs 
to PUT. We haven't seen the WARNings happen at that time.

Since we’ve only had 2 occurrences of the issue it’s a bit hard to establish a 
pattern.

Thanks,
Chris


>Andy
> 
> 
> On 03/07/2020 00:13, Chris Tomlinson wrote:
>>> On Jul 2, 2020, at 17:44, Andy Seaborne  wrote:
>>> 
>>> 
>>> 
>>> On 02/07/2020 21:55, Chris Tomlinson wrote:
>>>>> From what I can see, it (WARN) isn't database related.
>>>> No it seems to me to be related to getting the payload off the wire.
>>> 
>>> I think you said the same payload had been sent before.
>>> ??
>> Yes a copy/clone of the same payload, i.e., the serialization of the given 
>> graph, has been sent many times w/o issue.
>>> 
>>> ...
>>> 
>>>>> Even the concurrency looks OK because it locally writes a buffer so the 
>>>>> HTTP length is available.
>>> 
>>> (in case of corruption, not repeat, is happening)
>>> 
>>>> So it seems to me that there may be an opportunity for some sort of 
>>>> robustification in RDFConnection. There is evidently a loop somewhere that 
>>>> doesn't terminate, retrying the parsing repeatedly or something like that. 
>>>> The payload is finite so there would appear to be a test that repeatedly 
>>>> fails but doesn't make progress in consuming the payload.
>>> 
>>> RDFConnection (client-side) is sending, not parsing.
>> I'm referring to the Fuseki receiving end of the connection, where the 
>> WARNing is being logged.
>>> The WARN says that an empty  was seen.
>>> 
>>> There is no information about the stalled transactions although not 
>>> finishing the write would explain this:
>>> 
>>> 30-Jun-2020 16:21:30.778
>>> 
>>> java.io.BufferedInputStream.read(BufferedInputStream.java:345)
>>> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
>>> 
>>> so it's waiting for input. What's the proxy/reverse-proxy setup?
>> None. For the client on the same ec2 instance, obviously none; and for the 
>> client on a second ec2 instance, we have nothing between our two internal 
>> ec2's.
>> In the current situation, the two precipitating PUTs are from a client on 
>> the same ec2 instance.
>>> The code writes the payload to a ByteArrayOutputStream and sends those 
>>> bytes. That's how it gets the length for the HTTP header.
>>> 
>>> https://github.com/apache/jena/blob/master/jena-rdfconnection/src/main/java/org/apache/jena/rdfconnection/RDFConnectionRemote.java#L615
>>> 
>>> (run Fuseki with "verbose" to see the headers ... but it is quite verbose)
>>> 
>>> It sent something so the RDF->Thrift->bytes had finished and then it sent 
>>> bytes.
>> As I tried to clarify, my remarks were w.r.t. the Fuseki/receiving end where 
>> the issue is getting logged. Not the sending/client end.
>> Chris
>>> Anyway - you have the source code ... :-)
>>> 
>>>Andy



Re: repeated ThriftConvert WARN visit: Unrecognized:

2020-07-02 Thread Chris Tomlinson



> On Jul 2, 2020, at 17:44, Andy Seaborne  wrote:
> 
> 
> 
> On 02/07/2020 21:55, Chris Tomlinson wrote:
>>> From what I can see, it (WARN) isn't database related.
>> No it seems to me to be related to getting the payload off the wire.
> 
> I think you said the same payload had been sent before.
> ??

Yes a copy/clone of the same payload, i.e., the serialization of the given 
graph, has been sent many times w/o issue.



> 
> ...
> 
>>> Even the concurrency looks OK because it locally writes a buffer so the 
>>> HTTP length is available.
> 
> (in case of corruption, not repeat, is happening)
> 
>> So it seems to me that there may be an opportunity for some sort of 
>> robustification in RDFConnection. There is evidently a loop somewhere that 
>> doesn't terminate, retrying the parsing repeatedly or something like that. 
>> The payload is finite so there would appear to be a test that repeatedly 
>> fails but doesn't make progress in consuming the payload.
> 
> RDFConnection (client-side) is sending, not parsing.

I'm referring to the Fuseki receiving end of the connection, where the WARNing 
is being logged.



> The WARN says that an empty  was seen.
> 
> There is no information about the stalled transactions although not finishing 
> the write would explain this:
> 
> 30-Jun-2020 16:21:30.778
> 
> java.io.BufferedInputStream.read(BufferedInputStream.java:345)
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> 
> so it's waiting for input. What's the proxy/reverse-proxy setup?

None. For the client on the same ec2 instance, obviously none; and for the 
client on a second ec2 instance, we have nothing between our two internal ec2's.

In the current situation, the two precipitating PUTs are from a client on the 
same ec2 instance.



> The code writes the payload to a ByteArrayOutputStream and sends those bytes. 
> That's how it gets the length for the HTTP header.
> 
> https://github.com/apache/jena/blob/master/jena-rdfconnection/src/main/java/org/apache/jena/rdfconnection/RDFConnectionRemote.java#L615
> 
> (run Fuseki with "verbose" to see the headers ... but it is quite verbose)
> 
> It sent something so the RDF->Thrift->bytes had finished and then it sent 
> bytes.

As I tried to clarify, my remarks were w.r.t. the Fuseki/receiving end where 
the issue is getting logged. Not the sending/client end.

Chris

> Anyway - you have the source code ... :-)
> 
>Andy


Re: repeated ThriftConvert WARN visit: Unrecognized:

2020-07-02 Thread Chris Tomlinson


> On Jul 2, 2020, at 15:33, Andy Seaborne  wrote:
> 
> There may be two things going on ... or one
> 
> The WARN
> 
> This is broken input data but from looking at the Jena code, I'm not seeing 
> how an encoding error on its own can send an unset RDF_StreamRow (which is 
> what prints as ). The Jena code is quite simple and by 
> inspection I can't see a path that sends that. And in Thrift you can't send 
> unset data anyway - the write path throws a client-side exception.
> 
> So I'm wondering about the path from client to server, especially with the 
> broken pipes earlier, which might impact a reverse proxy or tomcat itself. 
> Maybe there is junk on the wire.
> 
> 
> The W-transactions
> This is TDB1 presumably.
> 
> The lack of response suggests they are backing up, and that can happen in TDB1 
> if the engine is blocked from writing to the storage. That said, it does log 
> if that is happening unless I/O is blocked at the OS level.
> 
>>> On 02/07/2020 18:37, Chris Tomlinson wrote:
>> Hi Andy,
>>>> On Jul 2, 2020, at 10:22 AM, Andy Seaborne  wrote:
>>> The log does not contain "Unrecognized".
>>> Is the "ThriftConvert WARN visit:" message from a different process?
>> Sorry. I should have been much clearer:
>>>> I have attached the catalina.out with the 199M or so "ThriftConvert WARN”s 
>>>> removed.
> 
> I wasn't clear if "removed" meant stderr, another source, or not.

grep -v "ThriftConvert WARN visit: Unrecognized: " catalina.out


> 
>> The event starts at
>>line 11305 "[2020-06-30 14:12:51] Fuseki INFO  [120611] PUT 
>> http://buda1. . . .”
>> with the first WARNing at 14:12:53 after the "[120611] 200 OK (2.106 s)”;
>> and ends with the stacktraces resulting from “systemctl restart 
>> fuseki.service” at :
>>line 12940 "30-Jun-2020 16:21:28.445"
>> and the stacktraces end at startup:
>>line 12939 "[2020-06-30 16:21:26] Fuseki INFO  [121275] PUT 
>> http://buda1. . . . "
>> I have a 55.1MB gzip’d file with all of the ~199M "ThriftConvert WARN”s. It 
>> is 15GB+ uncompressed. The rate of the WARNing messages is around 20K - 25K 
>> per second.
>> If you need the entire file, where should I upload it?
> 
> From what I can see, it (WARN) isn't database related.

No it seems to me to be related to getting the payload off the wire.

> 
>>> There are some broken pipe errors much earlier but they happen if clients 
>>> go away untidily.  They are on query responses not PUTs.
>>> The fact 120612 doesn't even print "Body:”
>> I think we conclude from this that the RDFConnection became unstable 
>> processing the incoming payload of 7536 triples in the ontologySchema graph. 
>> We have significantly larger graphs that are PUT using the same pattern as I 
>> displayed earlier.
>> The same source bits/bytes loaded w/o issue before and after this issue so 
>> it seems likely that there was an undetected transmission error that 
>> RDFConnection did not get notified of and was not able to recover from. 
>> Maybe that's peculiar to the Thrift encoding?
> 
> Caveat a bug in thrift, no.  Either it works or it doesn't and your system 
> works most of the time.

Yes, this is a very intermittent problem.


> 
>>> You could run with RDFConnectionRemote - what RDFConnectionFuseki does is 
>>> switch to thrift encodings; RDFConnectionRemote uses the standard ones.
>> We’ll take this under consideration. The problem occurred once at the end of 
>> April and now at the end of June. Not very frequent given the amount of PUTs 
>> that we do on a daily basis. It’s not what I would call effectively 
>> repeatable.
>> Is there a reason that we should distrust Thrift in general? Is it somehow 
>> more susceptible to undetected transmission errors than the standard 
>> encodings?
> 
> Not that I know of. I'm suggesting it to change the situation to get more 
> information.

Ok. I'm not sure we'll make a change just now as we're in the middle of a bit 
of a crunch.

> 
>> Do we need to consider avoiding Thrift everywhere?
> 
> If it works for months, and it is a simple code path, that suggests to me it 
> is working.
> 
> Even the concurrency looks OK because it locally writes a buffer so the HTTP 
> length is available.

So it seems to me that there may be an opportunity for some sort of 
robustification in RDFConnection. There is evidently a loop somewhere that 
doesn't terminate, retrying the parsing repeatedly or something like that. The 
payload is finite so there would appear to be a test 

Re: repeated ThriftConvert WARN visit: Unrecognized:

2020-07-02 Thread Chris Tomlinson


> On Jul 2, 2020, at 15:36, Andy Seaborne  wrote:
> 
> 
>> On 01/07/2020 17:34, Chris Tomlinson wrote:
>> Hi Andy,
>> 
>>>> On Jul 1, 2020, at 7:59 AM, Andy Seaborne >>> <mailto:a...@apache.org>> wrote:
>>> 
>>> Presumably the client is in java using RDFConnectionFactory.connectFuseki?
>> 
>> Yes the clients are in java 1.8 also on Debian 4.9. Fuseki is running on one 
>> ec2 instance and some of the clients are on the same instance and others are 
>> on another ec2 instance in the same region. The clients use the following 
>> patterns:
> 
> And the client code is jena 3.14.0 as well?

Yes

> 
>> 
>>> fuConnBuilder = RDFConnectionFuseki.create()
>>> .destination(baseUrl)
>>> .queryEndpoint(baseUrl+"/query")
>>> .gspEndpoint(baseUrl+"/data")
>>> .updateEndpoint(baseUrl+"/update”);
>>> 
>>> fuConn = fuConnBuilder.build();
>>> if (!fuConn.isInTransaction()) {
>>> fuConn.begin(ReadWrite.WRITE);
>>> }
>>> fuConn.put(graphName, model);
>>> fuConn.commit();
> 
> fuConn isn't shared across threads is it?

No

Chris

> 
>Andy
> 


Re: repeated ThriftConvert WARN visit: Unrecognized:

2020-07-02 Thread Chris Tomlinson
Hi Andy,

> On Jul 2, 2020, at 10:22 AM, Andy Seaborne  wrote:
> 
> The log does not contain "Unrecognized".
> Is the "ThriftConvert WARN visit:" message from a different process?

Sorry. I should have been much clearer:

>> I have attached the catalina.out with the 199M or so "ThriftConvert WARN”s 
>> removed.

The event starts at 

line 11305 "[2020-06-30 14:12:51] Fuseki INFO  [120611] PUT 
http://buda1. . . .”

with the first WARNing at 14:12:53 after the "[120611] 200 OK (2.106 s)”;

and ends with the stacktraces resulting from “systemctl restart fuseki.service” 
at :

line 12940 "30-Jun-2020 16:21:28.445"
 
and the stacktraces end at startup:

line 12939 "[2020-06-30 16:21:26] Fuseki INFO  [121275] PUT 
http://buda1. . . . "

I have a 55.1MB gzip’d file with all of the ~199M "ThriftConvert WARN”s. It is 
15GB+ uncompressed. The rate of the WARNing messages is around 20K - 25K per 
second.

If you need the entire file, where should I upload it?



> There are some broken pipe errors much earlier but they happen if clients go 
> away untidily.  They are on query responses not PUTs.
> 
> The fact 120612 doesn't even print "Body:”

I think we conclude from this that the RDFConnection became unstable processing 
the incoming payload of 7536 triples in the ontologySchema graph. We have 
significantly larger graphs that are PUT using the same pattern as I displayed 
earlier. 

The same source bits/bytes loaded w/o issue before and after this issue so it 
seems likely that there was an undetected transmission error that RDFConnection 
did not get notified of and was not able to recover from. Maybe that's 
peculiar to the Thrift encoding?



> You could run with RDFConnectionRemote - what RDFConnectionFuseki does is 
> switch to thrift encodings; RDFConnectionRemote uses the standard ones.

We'll take this under consideration. The problem occurred once at the end of 
April and now at the end of June. Not very frequent given the number of PUTs 
that we do on a daily basis. It's not what I would call effectively repeatable.
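
If we do try it, I'd expect the change to be just the builder; a sketch 
mirroring our RDFConnectionFuseki pattern (baseUrl, graphName, and model stand 
in for the values in our code):

    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdfconnection.RDFConnection;
    import org.apache.jena.rdfconnection.RDFConnectionRemote;
    import org.apache.jena.rdfconnection.RDFConnectionRemoteBuilder;

    public class PutWithRemote {
        static void put(String baseUrl, String graphName, Model model) {
            // Same endpoints as before; only the builder class differs, so the
            // standard content types are used instead of RDF Thrift.
            RDFConnectionRemoteBuilder builder = RDFConnectionRemote.create()
                    .destination(baseUrl)
                    .queryEndpoint(baseUrl + "/query")
                    .gspEndpoint(baseUrl + "/data")
                    .updateEndpoint(baseUrl + "/update");
            try (RDFConnection conn = builder.build()) {
                conn.begin(ReadWrite.WRITE);
                conn.put(graphName, model);
                conn.commit();
            }
        }
    }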

Is there a reason that we should distrust Thrift in general? Is it somehow more 
susceptible to undetected transmission errors than the standard encodings?

Do we need to consider avoiding Thrift everywhere?

Thanks,
Chris


> Andy
> 
> 
> On 01/07/2020 17:34, Chris Tomlinson wrote:
>> Hi Andy,
>>> On Jul 1, 2020, at 7:59 AM, Andy Seaborne >> <mailto:a...@apache.org>> wrote:
>>> 
>>> Presumably the client is in java using RDFConnectionFactory.connectFuseki?
>> Yes the clients are in java 1.8 also on Debian 4.9. Fuseki is running on one 
>> ec2 instance and some of the clients are on the same instance and others are 
>> on another ec2 instance in the same region. The clients use the following 
>> patterns:
>>> fuConnBuilder = RDFConnectionFuseki.create()
>>> .destination(baseUrl)
>>> .queryEndpoint(baseUrl+"/query")
>>> .gspEndpoint(baseUrl+"/data")
>>> .updateEndpoint(baseUrl+"/update”);
>>> 
>>> fuConn = fuConnBuilder.build();
>>> if (!fuConn.isInTransaction()) {
>>> fuConn.begin(ReadWrite.WRITE);
>>> }
>>> fuConn.put(graphName, model);
>>> fuConn.commit();
>>> Do you have the data from 120611?
>> The data for the PUT of 120611, can be retrieved via 
>> http://purl.bdrc.io/graph/PersonUIShapes.ttl. The source is 
>> person.ui.shapes.ttl 
>> <https://github.com/buda-base/editor-templates/blob/master/templates/core/person.ui.shapes.ttl>
>>  which is loaded via OntDocumentManager w/ setProcessImports(true) from GH.
>> From the log, 120611 appears to have completed successfully with 839 triples 
>> (the expected number); however, since then there have been several reloads 
>> of that graph during restarts and so on - we’re in development - so the 
>> exact bits in the 120611 PUT are not specifically available. This particular 
>> data has not changed in the past couple of weeks.
>> As for 120612 it never completed and the source data has not had any changes 
>> since 2020-06-29 prior to the “failure” and after and there have been 
>> numerous reloads w/o issue before and since. Only during the period 14:12 - 
>> 16:21 do we see the PUTs hang.
>>> Could the request have got truncated?
>> Referring to 120611, I don’t see any evidence of truncation. The expected 
>> number of triples, 839, is reported in the log.
> 
> 
> 
>>> The fact later PUTs stall suggest (does not prove) that the earlie

repeated ThriftConvert WARN visit: Unrecognized:

2020-06-30 Thread Chris Tomlinson
Hello,

Running jena/fuseki 3.14.0, compiled and running under openjdk version 
“1.8.0_252”.

Just some hours ago, fuseki, which has been running on an AWS ec2 for almost a 
year, logged around 199M occurrences of the following (times are UTC):

> [2020-06-30 14:12:50] Fuseki INFO [120610] 200 OK (8 ms)
> [2020-06-30 14:12:51] Fuseki INFO [120611] PUT 
> http://buda1.bdrc.io:13180/fuseki/corerw/data?graph=http://purl.bdrc.io/graph/PersonUIShapes
> [2020-06-30 14:12:52] Fuseki INFO [120612] PUT 
> http://buda1.bdrc.io:13180/fuseki/corerw/data?graph=http://purl.bdrc.io/graph/ontologySchema
> [2020-06-30 14:12:53] Fuseki INFO [120611] Body: Content-Length=73984, 
> Content-Type=application/rdf+thrift, Charset=null => RDF-THRIFT : Count=839 
> Triples=839 Quads=0
> [2020-06-30 14:12:53] Fuseki INFO [120611] 200 OK (2.106 s)
> [2020-06-30 14:12:57] ThriftConvert WARN visit: Unrecognized: 
> [2020-06-30 14:12:57] ThriftConvert WARN visit: Unrecognized: 
> [2020-06-30 14:12:57] ThriftConvert WARN visit: Unrecognized: 
> [2020-06-30 14:12:57] ThriftConvert WARN visit: Unrecognized: 
> [2020-06-30 14:12:57] ThriftConvert WARN visit: Unrecognized: 

. . .

> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized:  >
> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized:  >
> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized:  >
> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized:  >
> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized:  >
> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized:  >
> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized:  >
> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized:  >
> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized:  >
> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized:  >
> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized:  >
> [2020-06-30 16:21:33] ThriftConvert WARN  visit: Unrecognized:  >
> 30-Jun-2020 16:21:34.866 INFO [main] 
> org.apache.catalina.startup.VersionLoggerListener.log Server version:
> Apache Tomcat/8.0.53
> 30-Jun-2020 16:21:34.868 INFO [main] 
> org.apache.catalina.startup.VersionLoggerListener.log Server built:  
> Jun 29 2018 14:42:45 UTC
> 30-Jun-2020 16:21:34.868 INFO [main] 
> org.apache.catalina.startup.VersionLoggerListener.log Server number: 
> 8.0.53.0
> 30-Jun-2020 16:21:34.868 INFO [main] 
> org.apache.catalina.startup.VersionLoggerListener.log OS Name:   
> Linux
> 30-Jun-2020 16:21:34.868 INFO [main] 
> org.apache.catalina.startup.VersionLoggerListener.log OS Version:
> 4.9.0-8-amd64
> 30-Jun-2020 16:21:34.868 INFO [main] 
> org.apache.catalina.startup.VersionLoggerListener.log Architecture:  
> amd64
> 30-Jun-2020 16:21:34.869 INFO [main] 
> org.apache.catalina.startup.VersionLoggerListener.log Java Home: 
> /usr/lib/jvm/java-8-openjdk-amd64/jre
> 30-Jun-2020 16:21:34.869 INFO [main] 
> org.apache.catalina.startup.VersionLoggerListener.log JVM Version:   
> 1.8.0_252-8u252-b09-1~deb9u1-b09
> 30-Jun-2020 16:21:34.869 INFO [main] 
> org.apache.catalina.startup.VersionLoggerListener.log JVM Vendor:
> Oracle Corporation
> 30-Jun-2020 16:21:34.869 INFO [main] 
> org.apache.catalina.startup.VersionLoggerListener.log CATALINA_BASE: 
> /usr/local/fuseki/tomcat

Until I saw the issue and restarted fuseki.

An interesting thing I see in the trace is the overlapping PUTs just as fuseki 
wigged out. There were no changes in the code or in the graphs that were being 
PUT as part of a restart of an app server.

Fuseki didn't go completely unresponsive except that later PUTs were 
indefinitely delayed - which was how I noticed the issue.

We saw this once before, on 29 April, and didn't delve into it at the time.

Has anyone seen this sort of thing before? 

Thanks,
Chris





Re: OntDocumentManager and LocatorClassLoader

2020-06-25 Thread Chris Tomlinson
Hi,

Today I’ve tried 3.16.0-SNAPSHOT and it is working the same as 3.15.0 and 
3.14.0.

I think you’re right about the class loader case being a bit of a reach. The 
jar use case is not currently critical to our usage.

The test app 
<https://github.com/buda-base/shapes-testing/blob/master/src/main/java/OntTestLoading5_ONTS_RES_CT.java>
 is itself simple but the pom.xml may be a bit gnarly.

Thanks,
Chris

> On Jun 25, 2020, at 8:55 AM, Andy Seaborne  wrote:
> 
> 
> 
> On 25/06/2020 12:55, Chris Tomlinson wrote:
>> Hi Andy,
>> Sorry, I put the version at the bottom.
> 
> :-) given the other email, I didn't read that far. Sorry.
> 
>> It’s 3.15, also 3.14.
> 
> and not 3.16.0-SNAPSHOT
> 
> Do you have complete, minimal example?
> 
> I think what is happening is that reading imports from an indirected location 
> isn't going to know it was indirected.
> 
> Up to 3.15.0 this code had not changed in a long time IIRC, so it is a 
> different issue to the thread about location-mapping.
> 
>Andy
> 
>> Thanks,
>> Chris
>>> On Jun 25, 2020, at 2:31 AM, Andy Seaborne  wrote:
>>> 
>>> Jena version?
>>> 
>>> On 25/06/2020 02:59, Chris Tomlinson wrote:
>>>> Hi,
>>>> I've got a problem with OntDocumentManager when fetching resources from an 
>>>> element of the classpath via relative urls like   >>> rdf:resource="adm/admin.ttl"/> .
>>>> When I use:
>>>> OntDocumentManager odm = new OntDocumentManager("A/B/C/ont-policy.rdf")
>>>> or
>>>> OntDocumentManager odm = new 
>>>> OntDocumentManager("https://x/a/b/c/ont-policy.rdf;)
>>>> all works as expected. The relative urls are retrieved using the path to 
>>>> the ont-policy.rdf either file or url. Part of making this work is
>>>> xmlns:base =""
>>>> in the policy file.
>>>> However, trying
>>>> OntDocumentManager odm = new OntDocumentManager("plop/ont-policy.rdf")
>>>> finds the policy resource via the LocatorClassLoader - seen with TRACE 
>>>> enabled - as expected; but then all the relative urls are prepended with 
>>>> the path to the app.jar:
>>>> java -jar /path/to/where/to/find/app.jar
>>>> I figure I'm missing some config or a uri scheme or something like that. 
>>>> I'm currently using 3.15.
>>>> Thank you for your help,
>>>> Chris



Re: OntDocumentManager and LocatorClassLoader

2020-06-25 Thread Chris Tomlinson
Hi Andy,

Sorry, I put the version at the bottom. It’s 3.15, also 3.14.

Thanks,
Chris



> On Jun 25, 2020, at 2:31 AM, Andy Seaborne  wrote:
> 
> Jena version?
> 
> On 25/06/2020 02:59, Chris Tomlinson wrote:
>> Hi,
>> I've got a problem with OntDocumentManager when fetching resources from an 
>> element of the classpath via relative urls like   > rdf:resource="adm/admin.ttl"/> .
>> When I use:
>> OntDocumentManager odm = new OntDocumentManager("A/B/C/ont-policy.rdf")
>> or
>> OntDocumentManager odm = new 
>> OntDocumentManager("https://x/a/b/c/ont-policy.rdf;)
>> all works as expected. The relative urls are retrieved using the path to the 
>> ont-policy.rdf either file or url. Part of making this work is
>> xmlns:base =""
>> in the policy file.
>> However, trying
>> OntDocumentManager odm = new OntDocumentManager("plop/ont-policy.rdf")
>> finds the policy resource via the LocatorClassLoader - seen with TRACE 
>> enabled - as expected; but then all the relative urls are prepended with the 
>> path to the app.jar:
>> java -jar /path/to/where/to/find/app.jar
>> I figure I'm missing some config or a uri scheme or something like that. I'm 
>> currently using 3.15.
>> Thank you for your help,
>> Chris



OntDocumentManager and LocatorClassLoader

2020-06-24 Thread Chris Tomlinson
Hi,

I've got a problem with OntDocumentManager when fetching resources from an 
element of the classpath via relative urls like rdf:resource="adm/admin.ttl".

When I use:

OntDocumentManager odm = new OntDocumentManager("A/B/C/ont-policy.rdf")

or

OntDocumentManager odm = new 
OntDocumentManager("https://x/a/b/c/ont-policy.rdf;)

all works as expected. The relative urls are retrieved using the path to the 
ont-policy.rdf either file or url. Part of making this work is

xmlns:base =""

in the policy file.

However, trying

OntDocumentManager odm = new OntDocumentManager("plop/ont-policy.rdf")

finds the policy resource via the LocatorClassLoader - seen with TRACE enabled 
- as expected; but then all the relative urls are prepended with the path to 
the app.jar:

java -jar /path/to/where/to/find/app.jar

I figure I'm missing some config or a uri scheme or something like that. I'm 
currently using 3.15.
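
For context, the odm is wired into model creation in the usual way; a minimal 
sketch (assuming the standard OntModelSpec pattern):

    import org.apache.jena.ontology.OntDocumentManager;
    import org.apache.jena.ontology.OntModel;
    import org.apache.jena.ontology.OntModelSpec;
    import org.apache.jena.rdf.model.ModelFactory;

    public class PolicyLoad {
        public static void main(String[] args) {
            // The policy file is found on the classpath via LocatorClassLoader.
            OntDocumentManager odm = new OntDocumentManager("plop/ont-policy.rdf");
            odm.setProcessImports(true);
            OntModelSpec spec = new OntModelSpec(OntModelSpec.OWL_MEM);
            spec.setDocumentManager(odm);
            // Imports referenced by relative urls are resolved while building the model.
            OntModel model = ModelFactory.createOntologyModel(spec);
        }
    }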

Thank you for your help,
Chris

Re: one jena-shacl question - was Re: two jena-shacl questions

2020-06-03 Thread Chris Tomlinson
Hi Andy,

Thank you so much for your patience and help. I think I’ve got a handle on 
things now and will forge ahead. 

I appreciate you raising JENA-1905 
<https://issues.apache.org/jira/browse/JENA-1905>, JENA-1906 
<https://issues.apache.org/jira/browse/JENA-1906>, and JENA-1907 
<https://issues.apache.org/jira/browse/JENA-1907>. I’ll comment on the issues 
as appropriate.

Thank you again,
Chris


> On Jun 1, 2020, at 4:44 PM, Andy Seaborne  wrote:
> 
> 
> 
> On 01/06/2020 21:08, Chris Tomlinson wrote:
>> Hi Andy,
>> Not trying to be pedantic below but I’m trying to understand how to think in 
>> shacl and establish some expectations of the validation process.
> 
> If it help, the general pattern is
> 
> Target ->
>  (Node shape -> property shape->)*
>  Constraint*
> 
>>> On May 31, 2020, at 9:40 AM, Andy Seaborne  wrote:
>>> 
>>> Do we agree that this is a test case?
>>> (one file, data and shapes combined)
>>> Only command line tools needed.
>> I agree that the combined data and shapes file exhibits differences in 
>> report results, when interchanging bds:PersonShape and bds:PersonLocalShape.
>>> 
>>> @prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>> @prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
>>> @prefix sh:<http://www.w3.org/ns/shacl#> .
>>> @prefix bdo:   <http://purl.bdrc.io/ontology/core/> .
>>> @prefix bdr:   <http://purl.bdrc.io/resource/> .
>>> @prefix bds:   <http://purl.bdrc.io/ontology/shapes/core/> .
>>> 
>>> ## Data:
>>> 
>>> bdr:NM0895CB6787E8AC6E
>>> a   bdo:PersonName ;
>>> .
>>> 
>>> bdr:P707  a  bdo:Person ;
>>>bdo:personName   bdr:NM0895CB6787E8AC6E ;
>>> .
>>> 
>>> ## Shapes:
>>> 
>>> #bds:PersonShape   # 2
>>> bds:PersonLocalShape  # 1
>>>sh:property bds:PersonShape-personName ;
>>>sh:targetClass  bdo:Person ;
>>> .
>>> 
>>> bds:PersonShape-personName
>>>sh:message  "PersonName is not well-formed, wrong Class or missing 
>>> rdfs:label"@en ;
>>>sh:node bds:PersonNameShape ;
>>>sh:path bdo:personName ;
>>> .
>>> 
>>> bds:PersonNameShape  a  sh:NodeShape ;
>>>sh:property bds:PersonNameShape-personNameLabel ;
>>>sh:targetClass  bdo:PersonName ;
>>> .
>>> 
>>> bds:PersonNameShape-personNameLabel
>>>sh:message  ":PersonName must have exactly one rdfs:label"@en ;
>>>sh:minCount 1 ;
>>>sh:path rdfs:label ;
>>> .
>>> 
>>> 
>>> The difference seems to be that the hash order is different and it affects 
>>> finding targets, combined with the fact that targets are nested:
>> I see JENA-1907 <https://issues.apache.org/jira/browse/JENA-1907> raises the 
>> issue; I understand:
>>> If A is processed first as a target then the parser shapes now includes B 
>>> so processing B is skipped.
>>> Note - the effect is only in the number of times constraints are executed, 
>>> once or twice, not whether they are omitted.
>> to say that, in the current test case w/ the hash order issue, when nesting 
>> occurs owing to sh:node, then when a violation is found by (A) 
>> bds:PersonShape-personName, then the validation does not "go deeper" to 
>> consider (B) bds:PersonNameShape, by itself. W/o sh:node, in 
>> bds:PersonShape-personName, then both  bds:PersonShape-personName and 
>> bds:PersonNameShape are parsed as independent targets and  executed 
>> independently.
>>> bds:PersonLocalShape (target)
>>> -> bds:PersonLocalShape
>>>   -> bds:PersonNameShape (target)
>>> -> bds:PersonNameShape-personNameLabel
>> I think the second line above is supposed to be
>> -> bds:PersonShape-personName
>>> Both targets match bdr:P707, one by class, one by property.
>> I understand the NodeShape, bds:PersonLocalShape, matching bdr:P707, 
>> meaning, to me, that the constraints expressed in that shape need to be 
>> evaluated w/ P707 being the subject (== focus node). I take this to be “by 
>> class”.
>> I do not understand how NodeShape, bds:PersonNameShape, matches bdr:P707. I 
>> think bds:PersonNameShape matches bdr:NM0895CB6787E8AC6E because of 
>> sh:targetClass

Re: one jena-shacl question - was Re: two jena-shacl questions

2020-06-01 Thread Chris Tomlinson
personName to identify the particular NodeShape that is 
intended to validate objects of the "sh:path bdo:personName” in this situation.

Perhaps I see what is "supposed to execute twice”. 

With the "sh:node bds:PersonNameShape” in bds:PersonShape-personName, then 
bds:PersonNameShape validation must be executed (if it hasn’t already been 
executed); and

since bdr:NM0895CB6787E8AC6E will match bds:PersonNameShape separately by 
considering “sh:targetClass bdo:PersonName”, then unless there is some check in 
the validator to see if a (node, shape) pair has already been executed, there 
will be 2 executions instead of just 1.


> You can see the differences with "shacl print”.

I do see differences w/ “shacl parse” w/ and w/o "sh:node bds:PersonNameShape”. 
I’ll learn to use the tool.

My takeaway is that I shouldn't be using sh:node as I have, or perhaps I could 
remove the sh:targetClass from bds:PersonNameShape and use sh:node to steer the 
validation. But I guess the latter would lead to the generic "PersonName is not 
well-formed …” message instead of the more specific "PersonName must have 
exactly one rdfs:label”.

There seem to be many nuances to shacl.

Anyway thanks very much for the valuable information regarding using shacl,
Chris




> 
>Andy
> 
> 
> On 29/05/2020 20:39, Chris Tomlinson wrote:
>> Hi Andy,
>> Thank you for the reply. Focussing on just the first question. I have 
>> prepared small self-contained tests of jena-shacl from 3.14.0 (JS) and 
>> TopQuadrant Shacl 1.3.2 (TQ).
>> The apps differ only according to differences imposed by the JS and TQ APIs:
>> ShaclName_validateGraphJS.java <https://pastebin.com/5382xZeL>
>> ShaclName_validateGraphTQ.java <https://pastebin.com/3BxmyhqA>
>> The DATA_P707.ttl <https://pastebin.com/ugCZfABj> contains the three needed 
>> triples from the ontology and the bare minimum from the example P707 with 
>> two different errors in two of the PersonName instances.
>> The ShapeName_01.ttl <https://pastebin.com/jDqzvPTe> contains the shape 
>> definitions and all tests are performed only by changing the name on line 9.
>> The ShaclName_validateGraphJS-results-PersonShape.txt 
>> <https://pastebin.com/seEfWKNa> shows the results when the JS app is run 
>> with the name bds:PersonShape and gives the expected results.
>> The ShaclName_validateGraphJS-results-PersonLocalShape… 
>> <https://pastebin.com/q1SWMC4H> shows the results when the JS app is run 
>> with the name bds:PersonLocalShape and gives unexpected results. Namely, the 
>> expected violation regarding the PersonName which uses skos:prefLabel 
>> instead of rdfs:label is erroneously reported as conforming.
>> The ShaclName_validateGraphJS-results-varying.txt 
>> <https://pastebin.com/CNwnE5kg> shows results for names ranging from “P”, 
>> “Pe”, “Per” thru “PersonLocal”, “PersonShape” upto “PersonLocalShape”, 
>> “PersonLocalShaper”, and finally “PersonLocalShapers” for the JS app. In the 
>> table a “0” means the unexpected result and a “1” means the expected result 
>> - 7 names produce unexpected results and 20 names produce expected results.
>> The ShaclName_validateGraphTQ-results.txt <https://pastebin.com/BQnStjVq> 
>> shows the results when the TQ app is run for any spelling of the name on 
>> line 9 of ShapeName_01.ttl <https://pastebin.com/jDqzvPTe>. The results are 
>> the expected results as with some spellings of the name in the JS case. TQ 
>> shows no variation owing to the name on line 9 as is expected.
>> (Note: The TQ engine needed to be re-initialized for each use otherwise it 
>> accumulated results. This is why there is an init of the 
>> ShaclSimpleValidator at each use in the JS app even though it is not needed. 
>> I just wanted to produce as much as possible an apples-to-apples comparison 
>> of JS and TQ.)
>> (Note: The TQ report does not include sh:conforms true ; in the results, 
>> just: [ a   sh:ValidationReport ] . I don’t know if this conforms to the 
>> SHACL spec but that’s another matter.)
>> The results from the command line tests show the same as the above.
>> Running  with line 9 of  ShapeName_01.ttl <https://pastebin.com/jDqzvPTe> 
>> set to bds:PersonLocalShape:
>> shacl v -s ShapeName_01.ttl -d DATA_P707.ttl > 
>> PersonLocalShape_JS_Results.ttl <https://pastebin.com/M9s859Kc>
>> produces the unexpected results, namely there is no detail regarding the 
>> missing rdfs:label on bdr:NM0895CB6787E8AC6E.
>> However, running with line 9 of  ShapeName_01.ttl 
>> <https://pastebin.com/jDqzvPTe> set to bds:PersonShape:
>> 

one jena-shacl question - was Re: two jena-shacl questions

2020-05-29 Thread Chris Tomlinson
Hi Andy,

Thank you for the reply. Focussing on just the first question. I have prepared 
small self-contained tests of jena-shacl from 3.14.0 (JS) and TopQuadrant Shacl 
1.3.2 (TQ).

The apps differ only according to differences imposed by the JS and TQ APIs:

ShaclName_validateGraphJS.java 

ShaclName_validateGraphTQ.java 


The DATA_P707.ttl  contains the three needed 
triples from the ontology and the bare minimum from the example P707 with two 
different errors in two of the PersonName instances.

The ShapeName_01.ttl  contains the shape 
definitions and all tests are performed only by changing the name on line 9.

The ShaclName_validateGraphJS-results-PersonShape.txt 
 shows the results when the JS app is run with 
the name bds:PersonShape and gives the expected results.

The ShaclName_validateGraphJS-results-PersonLocalShape… 
 shows the results when the JS app is run with 
the name bds:PersonLocalShape and gives unexpected results. Namely, the 
expected violation regarding the PersonName which uses skos:prefLabel instead 
of rdfs:label is erroneously reported as conforming.

The ShaclName_validateGraphJS-results-varying.txt 
 shows results for names ranging from “P”, “Pe”, 
“Per” thru “PersonLocal”, “PersonShape” upto “PersonLocalShape”, 
“PersonLocalShaper”, and finally “PersonLocalShapers” for the JS app. In the 
table a “0” means the unexpected result and a “1” means the expected result - 7 
names produce unexpected results and 20 names produce expected results.

The ShaclName_validateGraphTQ-results.txt  shows 
the results when the TQ app is run for any spelling of the name on line 9 of 
ShapeName_01.ttl . The results are the expected 
results as with some spellings of the name in the JS case. TQ shows no 
variation owing to the name on line 9 as is expected. 

(Note: The TQ engine needed to be re-initialized for each use otherwise it 
accumulated results. This is why there is an init of the ShaclSimpleValidator 
at each use in the JS app even though it is not needed. I just wanted to 
produce as much as possible an apples-to-apples comparison of JS and TQ.)

(Note: The TQ report does not include sh:conforms true ; in the results, just: 
[ a   sh:ValidationReport ] . I don’t know if this conforms to the SHACL 
spec but that’s another matter.)

The results from the command line tests show the same as the above.

Running with line 9 of ShapeName_01.ttl set 
to bds:PersonLocalShape:

shacl v -s ShapeName_01.ttl -d DATA_P707.ttl > 
PersonLocalShape_JS_Results.ttl 

produces the unexpected results, namely there is no detail regarding the 
missing rdfs:label on bdr:NM0895CB6787E8AC6E.

However, running with line 9 of  ShapeName_01.ttl 
 set to bds:PersonShape:

shacl v -s ShapeName_01.ttl -d DATA_P707.ttl > PersonShape_JS_Results.ttl 


produces the expected results, in that the detail regarding the missing 
rdfs:label on bdr:NM0895CB6787E8AC6E is present among the results.

I did not set up the TQ command line but I think the above TQ results make this 
testing unnecessary.

I think these tests show that there is an unexpected dependence on a shape name 
in the JS library and not in the TQ library. I think this is an error and I can 
open a JIRA issue if appropriate. 

A consideration I have is that we want to be able to use the fuseki shacl 
endpoint for some processing and hence need to understand the expected behavior 
of the JS library which is integrated.
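
(For reference, the core of the JS test app is essentially the following 
sketch, with the file names above; the real app is in the pastebin links.)

    import org.apache.jena.graph.Graph;
    import org.apache.jena.riot.RDFDataMgr;
    import org.apache.jena.shacl.ShaclValidator;
    import org.apache.jena.shacl.Shapes;
    import org.apache.jena.shacl.ValidationReport;
    import org.apache.jena.shacl.lib.ShLib;

    public class ValidateJS {
        public static void main(String[] args) {
            Graph shapesGraph = RDFDataMgr.loadGraph("ShapeName_01.ttl");
            Graph dataGraph = RDFDataMgr.loadGraph("DATA_P707.ttl");
            Shapes shapes = Shapes.parse(shapesGraph);
            ValidationReport report = ShaclValidator.get().validate(shapes, dataGraph);
            ShLib.printReport(report);  // conforms / violation details
        }
    }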

Thank you again for your help
Chris





> On May 29, 2020, at 6:26 AM, Andy Seaborne  wrote:
> 
>> Question 1: regarding the name  bds:PersonShape at line 9 of 
>> ShapeName_01.ttl . With that name the results 
>> of running ShaclName_validateGraph.java  are 
>> as expected, see ShapeName-results-PersonShape.txt 
>> .
>> There are two errors in P707_nameErrs02.ttl  
>> regarding bdr:NMC2A097019ABA499F and bdr:NM0895CB6787E8AC6E which are 
>> reported in the ShapeName-results-PersonShape.txt 
>>  file.
>> However, if the name at line 9 of ShapeName_01.ttl 
>>  is changed to: bds:PersonLocalShape or 
>> bds:Frogs; then detail for bdr:NM0895CB6787E8AC6E reports, (see 
>> ShapeName-results-PersonLocalShape.txt ):
>> [ a sh:ValidationReport ;
>>   sh:conforms true ] .
>> instead of:
>> [ a sh:ValidationReport ;
>>   sh:conforms  false ;
>>   sh:result [ a 

two jena-shacl questions

2020-05-28 Thread Chris Tomlinson
Hi,

I have a two questions regarding behavior I’m seeing w/ jena-shacl in 3.14.0.

The data file is P707_nameErrs02.ttl , the shape 
graph is at ShapeName_01.ttl , and the test code 
is ShaclName_validateGraph.java .


Question 1: regarding the name  bds:PersonShape at line 9 of ShapeName_01.ttl 
. With that name the results of running 
ShaclName_validateGraph.java  are as expected, 
see ShapeName-results-PersonShape.txt . 

There are two errors in P707_nameErrs02.ttl  
regarding bdr:NMC2A097019ABA499F and bdr:NM0895CB6787E8AC6E which are reported 
in the ShapeName-results-PersonShape.txt  file.

However, if the name at line 9 of ShapeName_01.ttl 
 is changed to: bds:PersonLocalShape or 
bds:Frogs; then detail for bdr:NM0895CB6787E8AC6E reports, (see 
ShapeName-results-PersonLocalShape.txt ):

[ a sh:ValidationReport ; 
  sh:conforms true ] .

instead of:

[ a sh:ValidationReport ;
  sh:conforms  false ;
  sh:result    [ a sh:ValidationResult ;
 sh:focusNode  bdr:NM0895CB6787E8AC6E ;
 sh:resultMessage  ":PersonName must have exactly 
one rdfs:label"@en ;
 sh:resultPath rdfs:label ;
 sh:resultSeverity sh:Violation ;
 sh:sourceConstraintComponent  sh:MinCountConstraintComponent ;
 sh:sourceShape
bds:PersonNameShape-personNameLabel
   ]
] .

which is the result with bds:PersonShape at line 9 of ShapeName_01.ttl 
. In fact changing the name to bds:FrogTarts 
also produces the expected results.

Summary: If the shape name at line 9 of ShapeName_01.ttl 
 is either bds:PersonShape or bds:FrogTarts then 
the results are as expected; while if the shape name is either 
bds:PersonLocalShape or bds:Frogs then one of the detail results disappears and 
is replaced by  sh:conforms true.

Why this dependence on the shape name? The shape name isn’t referred to 
elsewhere in ShapeName_01.ttl .



Question 2: With the same files as illustration, I’m wanting to understand how 
deep the:

sv.validate(shapes, dataGraph, rez.asNode());

goes? What I mean is that simply calling:

Model topReport = process(shapes, dataGraph, rez);

at line 74 of ShaclName_validateGraph.java  
produces just the result:

[ a sh:ValidationReport ;
  sh:conforms  false ;
  sh:result    [ a sh:ValidationResult ;
 sh:focusNode  bdr:P707 ;
 sh:resultMessage  "PersonName is not well-formed, 
wrong Class or missing rdfs:label"@en ;
 sh:resultPath bdo:personName ;
 sh:resultSeverity sh:Violation ;
 sh:sourceConstraintComponent  sh:NodeConstraintComponent ;
 sh:sourceShapebds:PersonShape-personName ;
 sh:value  bdr:NM0895CB6787E8AC6E
   ] ;
  sh:result    [ a sh:ValidationResult ;
 sh:focusNode  bdr:P707 ;
 sh:resultMessage  "PersonName is not well-formed, 
wrong Class or missing rdfs:label"@en ;
 sh:resultPath bdo:personName ;
 sh:resultSeverity sh:Violation ;
 sh:sourceConstraintComponent  sh:ClassConstraintComponent ;
 sh:sourceShapebds:PersonShape-personName ;
 sh:value  bdr:NMC2A097019ABA499F
   ]
] .

without going deeper to produce the more detailed results for each of the 
PersonNames in error.

Is this a result of validate node semantics?

As can be seen, the detailed results are produced by using validate-node on the 
sh:value objects in the above two results. Is this the appropriate way of 
extracting more useful detail than the generic violations reported by the 
“top-level” call to validate node?
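
(What I'm doing is essentially the following sketch: collect the sh:value 
objects from a report and validate each one as a new focus node. It is naive, 
with no cycle detection, and the SH_VALUE property is built by hand.)

    import org.apache.jena.graph.Graph;
    import org.apache.jena.rdf.model.*;
    import org.apache.jena.shacl.ShaclValidator;
    import org.apache.jena.shacl.Shapes;
    import org.apache.jena.shacl.ValidationReport;
    import org.apache.jena.shacl.lib.ShLib;

    public class DrillDown {
        static final Property SH_VALUE =
                ResourceFactory.createProperty("http://www.w3.org/ns/shacl#value");

        // Validate a focus node, then re-validate every sh:value node it reports.
        static void drillDown(Shapes shapes, Graph dataGraph, Resource focus) {
            ValidationReport report =
                    ShaclValidator.get().validate(shapes, dataGraph, focus.asNode());
            ShLib.printReport(report);
            NodeIterator it = report.getModel().listObjectsOfProperty(SH_VALUE);
            while (it.hasNext()) {
                RDFNode v = it.next();
                if (v.isResource()) {
                    drillDown(shapes, dataGraph, v.asResource());
                }
            }
        }
    }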


I hope the questions and files are reasonably clear.

Thanks for help with these two questions,
Chris



Re: SHACL validation of updates with Fuseki

2020-05-20 Thread Chris Tomlinson
Hi Andy,

Our jena-shacl use case in fuseki involves a limited set of graph types each 
with its own set of shapes, most of which do not apply to other graph types, 
although some apply across all graphs like requiring at least one 
skos:prefLabel from among a list of languages.

Each changed/new graph needs to be validated in the context of a comparatively 
small subset of all of the named graphs: the ontology graph, and graphs 
containing resources referenced from the subject graph. Pretty much your case:

>> * the validation needs local changes (e.g. minCount) to the entity (subject 
>> and all triples with that subject) - that can be used to reduce the number 
>> of validations done. If an entity isn't touched, no validation necessary.
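
In code terms, the per-graph validation described above amounts to something 
like this sketch (the graph and shapes variables are placeholders for our own):

    import org.apache.jena.graph.Graph;
    import org.apache.jena.graph.compose.Union;
    import org.apache.jena.shacl.ShaclValidator;
    import org.apache.jena.shacl.Shapes;
    import org.apache.jena.shacl.ValidationReport;

    public class PerGraphValidation {
        // Validate a changed graph in the scope of the ontology graph (and, by
        // extension, any graphs holding referenced resources).
        static ValidationReport validateChanged(Shapes shapesForGraphType,
                                                Graph changedGraph, Graph ontologyGraph) {
            Graph scope = new Union(changedGraph, ontologyGraph);
            return ShaclValidator.get().validate(shapesForGraphType, scope);
        }
    }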


We’re currently using ont-policy.rdf and imports to group shapes ontologies 
into graphs and provide an include facility. We have code that interrogates a 
custom element in the OntologySpec that is used to PUT a named shape graph to 
fuseki. So I'm not sure about the proposed features at 
https://afs.github.io/shacl-datasets.html.

I haven’t looked at shacl-c.

Thank you,
Chris



> On May 20, 2020, at 2:50 PM, Andy Seaborne  wrote:
> 
> Hi Ben,
> 
> Not currently I'm afaird but certainly something to provide.
> 
> Question to everyone:
> 
> Would it work if the same SHACL rules applied to all graphs? Or are there 
> cases where different graph have different sets of shapes?
> 
> It is possible to prune the validation work significantly because many 
> constraints don't need the whole dataset so having many shapes, most of which 
> don't apply to a graph should not have too much impact.
> 
> What is more the validation is focused on changes:
> 
> * the validation is only on the triple added (e.g. sh:datatype) - and does 
> not need access to the database so it can be done in parallel outside the 
> transaction.
> 
> * the validation needs local changes (e.g. minCount) to the entity (subject 
> and all triples with that subject) - that can be used to reduce the number of 
> validations done. If an entity isn't touched, no validation necessary.
> 
> * global - needs access to the whole database. Not much can be done except 
> execute inline at the end of the transaction.
> 
> (from
> https://lists.apache.org/thread.html/rc4df58fba718a0cbfe9305cee9ab24c6c25bc162c468f9336f059b85%40%3Cusers.jena.apache.org%3E
> )
> 
> or does it need something more complicated, maybe even targeting graphs?
> 
>   https://afs.github.io/shacl-datasets.html
> 
> Another thing to add is SHACL-C (Compact syntax), at least for reading, for 
> manageability in the case of many relatively simple constraints.
> 
>Andy
> 
> 
> 
> On 20/05/2020 13:08, Benjamin Geer wrote:
>> Hello,
>> Is it possible to configure Fuseki to use 
>> org.apache.jena.shacl.GraphValidation, so that each update is accepted only 
>> if, after the changes, the union of all named graphs in the dataset would be 
>> valid according to SHACL shapes that are configured on the server? In other 
>> words, to do what the Shacl02_validateTransaction example does, but for 
>> updates submitted to Fuseki?
>> Ben
>> ---
>> Data and Service Center for the Humanities (DaSCH)
>> University of Basel, Switzerland
>> https://dasch.swiss 



Re: SHACL Endpoint questions

2020-05-19 Thread Chris Tomlinson
Hi Andy,

Thanks for the very helpful feedback.

1) I did not understand the proper use of sh:inversePath. I thought it was to 
verify that the target of the target/value of the sh:path property had a 
property equal to the value of sh:inversePath. I see that is just not correct.

2) I’ve found an effective solution to the problem of limiting validation to 
just the triples that should be in the graph of a resource such as bdr:P707 by 
creating a new shapes module that uses [ ] sh:deactivated true on any 
propertyShapes that leave the graph in question.

I’m getting closer to being able to formulate a plausible extension to the 
shacl endpoint.

Thank you again for your help in the midst of all the 3.15.0 work,
Chris


> On May 16, 2020, at 5:45 AM, Andy Seaborne  wrote:
> 
> 
> 
> On 15/05/2020 00:57, Chris Tomlinson wrote:
>> Hello Andy,
>> I have standalone code using validator.validate(Shapes, Graph, Node) where 
>> the graph is a merge of the target graph, e.g., P707, and the ontology 
>> graph. This works fine to validate examples like P707 generating sh:results 
>> just for references to P705 which is not otherwise included in the merged 
>> graph, which is what I expect.
>> If the code was running in Fuseki and the graph is the dataset graph (equiv 
>> union graph I think) then I would like to know how far out from the node the 
>> validation process will reach.
> 
> That depends on the target (or, here, implicit class target) and the shape 
> itself.
> 
> digression
> 
> ... something that I've experimented with - analysing the shapes to determine 
> execution strategy. There are some useful cases:
> 
> * the validation is only on the triple added (e.g. sh:datatype) - and does 
> not need access to the database so it can be done in parallel outside the 
> transaction.
> * the validation needs local changes (e.g. minCount) to the entity (subject 
> and all triples with that subject) - that can be used to reduce the number of 
> validations done. If an entity isn't touched, no validation necessary.
> * global - needs access to the whole database. Not much can be done except 
> execute inline at the end of the transaction. Often these are SPARQL 
> constraints where you can e.g. count the triples.
> 
> /digression
> 
>> For example, given that the shapes include the shape:
>> bds:PersonShape-hasParent
>> a   sh:PropertyShape ;
> sh:class        bdo:Person ;
>> sh:description  "this Person may have at most two parents."@en ;
>> sh:inversePath  bdo:hasChild ;
> 
> ??
> 
>> sh:maxCount 2 ;
>> sh:path bdo:hasParent ;
>> .
>> Then I thought that the validation process would check just that:
>> P705 rdf:type bdo:Person .
>> as well as validating the count constraint; and in the case of the shape:
> 
> 
> Yes - there are two constraints: sh:class and sh:maxCount
> 
>> bds:PersonShape-hasFather
>> a   sh:PropertyShape ;
>> sh:description  "this Person may have a father."@en ;
>> sh:inversePath  bdo:hasChild ;
> 
> Is that supposed to be:
> 
> sh:path [ sh:inversePath  bdo:hasChild  ]
> 
> ?
> 
> A property shape has a sh:path and that sh:path can be a inverse path.
> 
> sh:inversePath isn't used on the property shape itself.
> 
>> sh:maxCount 1 ;
>> sh:node bds:MaleShape ;
>> sh:path bdo:hasFather ;
> 
> and now we have two sh:paths?
> 
> (If that is your shape, the sh:inversePath is going to be ignored as it is out 
> of place.)
> 
>> .
>> will in addition check that:
>> P705 bdo:gender bdr:GenderMale .
>> and not check any other constraints on P705, such as its students or kinship 
>> relations.
> 
> If P705 is reached with "sh:path bdo:hasFather"
> 
>> The purpose being that when a user has “edited” an existing resource or 
>> “created” a new resource then we just want to validate the changed or new 
>> resource without having the validation process traverse all resources 
>> reachable from P707 via arbitrary length paths, which is unnecessary.
>> Assuming the validator.validate(Shapes, Graph, Node) works along the lines 
>> I’ve sketched, then since the shacl endpoint doesn’t use this method it 
>> would take an extension to the endpoint or a new endpoint to accomplish what 
>> I’ve described.
> 
> See the code.
> 
>validator.validate(Shapes, Graph, Node)
> 
> executes the shapes (any that apply) to the single focus node.  It does check 
> the shapes to see which apply so the target clause (

Re: SHACL Endpoint questions

2020-05-14 Thread Chris Tomlinson
Hello Andy,

I have standalone code using validator.validate(Shapes, Graph, Node) where the 
graph is a merge of the target graph, e.g., P707, and the ontology graph. This 
works fine to validate examples like P707 generating sh:results just for 
references to P705 which is not otherwise included in the merged graph, which 
is what I expect.
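
For reference, the standalone flow is essentially the following sketch (the
file names are placeholders for this list, not our real sources):

import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.graph.compose.Union;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.shacl.ShaclValidator;
import org.apache.jena.shacl.Shapes;
import org.apache.jena.shacl.ValidationReport;
import org.apache.jena.shacl.lib.ShLib;

public class ValidateP707 {
    public static void main(String[] args) {
        Graph data     = RDFDataMgr.loadGraph("P707.ttl");       // target graph
        Graph ontology = RDFDataMgr.loadGraph("ontology.ttl");   // ontology graph
        Graph merged   = new Union(data, ontology);              // merged data graph
        Shapes shapes  = Shapes.parse(RDFDataMgr.loadGraph("person.shapes.ttl"));
        Node focus     = NodeFactory.createURI("http://purl.bdrc.io/resource/P707");
        ValidationReport report = ShaclValidator.get().validate(shapes, merged, focus);
        ShLib.printReport(report);                               // sh:results, if any
    }
}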

If the code was running in Fuseki and the graph is the dataset graph (equiv 
union graph I think) then I would like to know how far out from the node the 
validation process will reach.

For example, given that the shapes include the shape:

bds:PersonShape-hasParent
a   sh:PropertyShape ;
sh:class        bdo:Person ;
sh:description  "this Person may have at most two parents."@en ;
sh:inversePath  bdo:hasChild ;
sh:maxCount 2 ;
sh:path bdo:hasParent ;
.

Then I thought that the validation process would check just that:

P705 rdf:type bdo:Person .

as well as validating the count constraint; and in the case of the shape:

bds:PersonShape-hasFather
a   sh:PropertyShape ;
sh:description  "this Person may have a father."@en ;
sh:inversePath  bdo:hasChild ;
sh:maxCount 1 ;
sh:node bds:MaleShape ;
sh:path bdo:hasFather ;
.

will in addition check that:

P705 bdo:gender bdr:GenderMale .

and not check any other constraints on P705, such as its students or kinship 
relations.

The purpose being that when a user has “edited” an existing resource or 
“created” a new resource then we just want to validate the changed or new 
resource without having the validation process traverse all resources reachable 
from P707 via arbitrary length paths, which is unnecessary.

Assuming the validator.validate(Shapes, Graph, Node) works along the lines I’ve 
sketched, then since the shacl endpoint doesn’t use this method it would take 
an extension to the endpoint or a new endpoint to accomplish what I’ve 
described.

I’m happy to raise an issue and create a PR if that makes sense.

Thank you again very much,
Chris



> On May 14, 2020, at 4:16 PM, Andy Seaborne  wrote:
> 
> On 14/05/2020 19:06, Chris Tomlinson wrote:
>> Hi Andy,
>> I want to validate a named graph in the context of the union graph. I don’t 
>> want to validate the union graph. The union graph has information in it such 
>> as the ontology which defines subClass and subProperty relations needed to 
>> successfully validate a target graph such as http://purl.bdrc.io/graph/P707.
> 
> I don't understand "in the context of the union graph."
> 
> Isn't "Context" in RDF is "merge the graphs"?
> 
> Validation is a process that operates on a shapes graph (which is parsed so 
> really its just shapes - anything else in it is ignored) and a data graph.
> 
> There's no structure to the data graph - it is everything being validated.
> 
> I did suggest some SHACL extensions
> 
>   https://afs.github.io/shacl-datasets.html
> 
> but they are hypothetical extensions.
> 
> 
> In code, you could make a temporary union of two or more graphs to make a 
> single data graph.
> 
> "a named graph in the context of the union graph."
> 
> So the NG is in addition to the dataset graphs? Or is it in the dataset 
> already?
> 
> 
> In the SHACL service ?graph= is the data target and is taken from the dataset.
> 
>> Also P707 refers to a parent and teacher P705 which needs to be verified 
>> that it meets minimum criteria for a Person.
>> I thought that validate(shapes, graph, node)
> 
> /** Produce a full validation report for this node in the data. */
> 
> i.e. use node as the focus node (like sh:targetNode) and execute the shapes 
> only with that node.
> 
> But does P707 have one focus node or many?
> 
>> should accomplish this if graph = the dataset graph which contains all these 
>> additional bits of information.
>> That’s why the endpoint is interesting since it provides in principle access 
>> to using shacl inside of Fuseki, where the entire dataset is available, 
>> without having to write an independent bit of code that we add to our fuseki 
>> deployments.
> 
> There is nothing special about Fuseki endpoint - any Dataset has a union 
> graph.
> 
> It's a way to call
> 
> ValidationReport report =
>ShaclValidator.get().validate(shapesGraph, data);
> 
> on a remote data graph.
> 
>> I hope this clarifies what I’m wanting to accomplish. I probably don’t 
>> understand what validate(shapes, graph, node) is supposed to do.
>> Thanks for your patience,
>> Chris
>>> On May 14, 2020, at 12:34 PM, Andy Seaborne  wr

Re: SHACL Endpoint questions

2020-05-14 Thread Chris Tomlinson
Hi Andy,

I want to validate a named graph in the context of the union graph. I don’t 
want to validate the union graph. The union graph has information in it such as 
the ontology which defines subClass and subProperty relations needed to 
successfully validate a target graph such as http://purl.bdrc.io/graph/P707.

Also P707 refers to a parent and teacher P705 which needs to be verified that 
it meets minimum criteria for a Person.

I thought that validate(shapes, graph, node) should accomplish this if graph = 
the dataset graph which contains all these additional bits of information.
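
In code terms, the behavior I’m after would be something like this sketch (the
TDB location and shapes file are hypothetical):

import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.NodeFactory;
import org.apache.jena.query.Dataset;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.shacl.ShaclValidator;
import org.apache.jena.shacl.Shapes;
import org.apache.jena.shacl.ValidationReport;
import org.apache.jena.sparql.core.Quad;
import org.apache.jena.tdb.TDBFactory;

public class ValidateAgainstUnion {
    public static void main(String[] args) {
        Dataset ds = TDBFactory.createDataset("/path/to/DB");
        // the union of all named graphs, as exposed by the dataset
        Graph union = ds.asDatasetGraph().getGraph(Quad.unionGraph);
        Shapes shapes = Shapes.parse(RDFDataMgr.loadGraph("person.shapes.ttl"));
        Node focus = NodeFactory.createURI("http://purl.bdrc.io/resource/P707");
        ValidationReport report = ShaclValidator.get().validate(shapes, union, focus);
        System.out.println("conforms: " + report.conforms());
    }
}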

That’s why the endpoint is interesting since it provides in principle access to 
using shacl inside of Fuseki, where the entire dataset is available, without 
having to write an independent bit of code that we add to our fuseki 
deployments.

I hope this clarifies what I’m wanting to accomplish. I probably don’t 
understand what validate(shapes, graph, node) is supposed to do.

Thanks for your patience,
Chris


> On May 14, 2020, at 12:34 PM, Andy Seaborne  wrote:
> 
> ?graph names the graph to be validated.
> 
> ?graph can be a URI of a named graph in the dataset
> 
> or ?graph=default for the default graph (note: this is the storage default 
> graph, not the union default graph)
> 
> or ?graph=union for the union of all named graphs which is what I think 
> you're asking for.
> 
> (This is the org.apache.jena.fuseki.servlets.SHACL_Validation servlet.)
> 
> 
> On 14/05/2020 15:40, Chris Tomlinson wrote:
>> Hi Andy,
>> Thanks very much for the shacl guidance. The use of sh:targetSubjectsOf is 
>> quite helpful. I replaced the bdo:personName w/ bdo:isRoot which must be 
>> present on any Entity resource so that if a Work or Place or other entity is 
>> checked it will fail if it isn’t a bdo:Person.
>> This still fails in the event that there is no bdo:isRoot, so that negative 
>> also needs to be caught in some way to weed out really malformed graphs.
>> I still have a question about the shacl endpoint:
>> Is the ?graph parameter validated in the context of the entire dataset 
>> specified in the endpoint URL or just the named graph itself?
>> It appears to be just the named graph itself so is the same as running the 
>> shacl command outside of Fuseki.
> 
> Yes - as above, it can be the union.
> 
>> We are wanting a validation of the named graph against the entire (union) 
>> dataset graph
> 
> Not sure what "against" means here. There is a shapes graph in the validate 
> request and data graph, which can be the union graph of the dataset.
> 
> To direct the validation to a certain node, use sh:targetNode.
> 
>> which will have sufficient information about subClassOf* and external 
>> resources like P705 without entailing a validation of all nodes reachable 
>> from triples in the ?graph named graph. This might be similar to:
>> validator.validate(shapes, dsg, node)
>> where node would be the root resource URI like, 
>> <http://purl.bdrc.io/resource/P707>.
>> Is this something that needs an issue raised and a bit of extension of the 
>> endpoint or is there another way to get this kind of behavior through the 
>> endpoint?
>> Thank you very much for your help,
>> Chris
>>> On May 13, 2020, at 12:16 PM, Andy Seaborne  wrote:
>>> 
>>> 
>>> 
>>> On 13/05/2020 16:03, Chris Tomlinson wrote:
>>>> Hi Andy,
>>>> Thank you for the reply. I can get your example to work as you indicate, 
>>>> but have some questions:
>>>> 1) I went through the latest SHACL draft 
>>>> <https://w3c.github.io/data-shapes/shacl/> and I cannot find how to know 
>>>> that sh:targetNode always executes. It’s also not clear to me what it 
>>>> means to execute. I thought that sh:targetNode X was a way to restrict a 
>>>> shape to X in the data graph, whatever X might be.
>>> 
>>> It sets the target node to X and that becomes $this.
>>> 
>>> It does not say the target has to be in the graph.
>>> 
>>> The tests use this idiom quite a lot.
>>> 
>>> This matters because in some places the spec is not complete and without 
>>> some light reverse engineering from the tests, I'd not have been able to 
>>> implement some of the SPARQL functionality (particularly SPARQL components, 
> not the SPARQL constraints we're talking about here).
>>> 
>>> Also, RDF graphs do not have a formally defined set of nodes - they are a 
>>> set of edges and any nodes you want can b

Re: SHACL Endpoint questions

2020-05-14 Thread Chris Tomlinson
Hi Andy,

Thanks very much for the shacl guidance. The use of sh:targetSubjectsOf is 
quite helpful. I replaced the bdo:personName w/ bdo:isRoot which must be 
present on any Entity resource so that if a Work or Place or other entity is 
checked it will fail if it isn’t a bdo:Person.

This still fails in the event that there is no bdo:isRoot, so that negative 
also needs to be caught in some way to weed out really malformed graphs.

I still have a question about the shacl endpoint:

Is the ?graph parameter validated in the context of the entire dataset 
specified in the endpoint URL or just the named graph itself?

It appears to be just the named graph itself so is the same as running the 
shacl command outside of Fuseki.

We are wanting a validation of the named graph against the entire (union) 
dataset graph which will have sufficient information about subClassOf* and 
external resources like P705 without entailing a validation of all nodes 
reachable from triples in the ?graph named graph. This might be similar to:

validator.validate(shapes, dsg, node)

where node would be the root resource URI like, 
<http://purl.bdrc.io/resource/P707>.

Is this something that needs an issue raised and a bit of extension of the 
endpoint or is there another way to get this kind of behavior through the 
endpoint?

Thank you very much for your help,
Chris


> On May 13, 2020, at 12:16 PM, Andy Seaborne  wrote:
> 
> 
> 
> On 13/05/2020 16:03, Chris Tomlinson wrote:
>> Hi Andy,
>> Thank you for the reply. I can get your example to work as you indicate, but 
>> have some questions:
>> 1) I went through the latest SHACL draft 
>> <https://w3c.github.io/data-shapes/shacl/> and I cannot find how to know 
>> that sh:targetNode always executes. It’s also not clear to me what it means 
>> to execute. I thought that sh:targetNode X was a way to restrict a shape to 
>> X in the data graph, whatever X might be.
> 
> It sets the target node to X and that becomes $this.
> 
> It does not say the target has to be in the graph.
> 
> The tests use this idiom quite a lot.
> 
> This matters because in some places the spec is not complete and without some 
> light reverse engineering from the tests, I'd not have been able to implement 
> some of the SPARQL functionality (particularly SPARQL components, not the 
> SPARQL constraints we're talking about here).
> 
> Also, RDF graphs do not have a formally defined set of nodes - they are a set 
> of edges and any nodes you want can be used in triples.
> 
>> 2) What I’m trying to do is validate that a resource like 
>> http://purl.bdrc.io/resource/P707 is a Person, which at a minimum means that:
>> <http://purl.bdrc.io/resource/P707>  a  <http://purl.bdrc.io/ontology/core/Person> .
>> is present in the http://purl.bdrc.io/graph/P707. The PersonShape 
>> <https://github.com/buda-base/editor-templates/blob/master/templates/core/person.shapes.ttl>
>>  has:
>> sh:targetClass bdo:Person
>> but that only serves to say that PersonShape only applies to resources of 
>> class bdo:Person and if there are none, then there are no violations which 
>> means I can try to validate a bibliographic element such as 
>> http://purl.bdrc.io/resource/W1FPL1 
>> which is of class bdo:ImageInstance but of course that still sh:conforms 
>> true since bds:PersonShape doesn’t apply and hence there aren’t any 
>> violations. (to see the resources, use 
>> http://ldspdi-dev.bdrc.io/resource/W1FPL1.ttl, for example).
>> The use case is: a client submits a graph of a resource and claims it to be 
>> a bdo:Person or a subClassOf* it; and we want to validate the graph as a 
>> bdo:Person and so want to get the result “false" for bdr:W1FPL1 instead of 
>> “true".
>> It’s our intent to use a tool like shacl for this top-level task as well as 
>> validating the details like having at least one name, a gender, and so on.
>> I tried using something like your example:
>> bds:CheckPersonClassShape  a  sh:NodeShape ;
>> rdfs:label  "Check Person Class Shape"@en ;
>> sh:targetNode "Check Class" ;
>> sh:sparql [
>>   a sh:SPARQLConstraint ;
>>   sh:prefixes [
>> sh:declare [
>>   sh:prefix "rdf" ;
>>   sh:namespace "http://www.w3.org/1999/02/22-rdf-sy

Re: SHACL Endpoint questions

2020-05-13 Thread Chris Tomlinson
Hi Andy,

Thank you for the reply. I can get your example to work as you indicate, but 
have some questions:

1) I went through the latest SHACL draft 
<https://w3c.github.io/data-shapes/shacl/> and I cannot find how to know that 
sh:targetNode always executes. It’s also not clear to me what it means to 
execute. I thought that sh:targetNode X was a way to restrict a shape to X in 
the data graph, whatever X might be.

2) What I’m trying to do is validate that a resource like 
http://purl.bdrc.io/resource/P707 is a Person, which at a minimum means that:

<http://purl.bdrc.io/resource/P707>  a  <http://purl.bdrc.io/ontology/core/Person> .

is present in the http://purl.bdrc.io/graph/P707. The PersonShape 
<https://github.com/buda-base/editor-templates/blob/master/templates/core/person.shapes.ttl>
 has:

sh:targetClass bdo:Person

but that only serves to say that PersonShape only applies to resources of class 
bdo:Person and if there are none, then there are no violations which means I 
can try to validate a bibliographic element such as 
http://purl.bdrc.io/resource/W1FPL1 which 
is of class bdo:ImageInstance but of course that still sh:conforms true since 
bds:PersonShape doesn’t apply and hence there aren’t any violations. (to see 
the resources, use http://ldspdi-dev.bdrc.io/resource/W1FPL1.ttl, for example).

The use case is: a client submits a graph of a resource and claims it to be a 
bdo:Person or a subClassOf* it; and we want to validate the graph as a 
bdo:Person and so want to get the result “false" for bdr:W1FPL1 instead of 
“true".

It’s our intent to use a tool like shacl for this top-level task as well as 
validating the details like having at least one name, a gender, and so on.

I tried using something like your example:

bds:CheckPersonClassShape  a  sh:NodeShape ;
rdfs:label  "Check Person Class Shape"@en ;
sh:targetNode "Check Class" ;
sh:sparql [
  a sh:SPARQLConstraint ;
  sh:prefixes [
sh:declare [
  sh:prefix "rdf" ;
  sh:namespace "http://www.w3.org/1999/02/22-rdf-syntax-ns#; ;
] , [
  sh:prefix "bdo" ;
  sh:namespace "http://purl.bdrc.io/ontology/core/; ;
]
  ] ;
  sh:select """
select $this (rdf:type as ?path) (bdo:Person as ?value)
where {
   filter not exists { $this ?path ?value }
}
  """ ;
] ;
.

But this just always reports a violation that the literal, “Check Class”, 
doesn’t conform, which is true since it isn’t in the data graph.

3) The original reason for wanting to use the shacl endpoint was so that we 
could PUT the submitted graph in the Fuseki dataset and then use the endpoint 
to validate the resource bdr:P707 (or bdr:W1FPL1) as a Person (or not) with the 
rest of the dataset graph available to handle things like subClassOf*  and 
subPropertyOf* for various items as well as validating the minimum of resources 
referenced by P707 such as that P705 is a male person and hence can be a father 
of P707.

The graph for P707 that is submitted would only have references to P705, with 
no properties on P705, since that resource is in its own graph.

I thought this is pretty much how validate(Shapes, Graph, Node) would work, 
where Graph would be the union dataset graph.

I’m evidently missing some understanding.

I appreciate your patience,
Chris



> On May 12, 2020, at 3:52 AM, Andy Seaborne  wrote:
> 
> Chris,
> 
> Here's a shape that always executes and tests for an empty data graph.
> 
> # No violation
> shacl validate -v -shapes ex-shapes.ttl -data not-empty.ttl
> 
> # Violation
> shacl validate -v -shapes ex-shapes.ttl -data empty.nt
> 
> "sh:targetNode" always executes.
> 
> With this pattern, the SPARQL query can do arbitrary checks.
> 
>Andy
> 
> ## ex-shapes.ttl
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
> 
> PREFIX sh:  <http://www.w3.org/ns/shacl#>
> PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
> 
> PREFIX ex:<http://example/>
> 
> ex:NotEmptyGraphShape
>  rdf:type sh:NodeShape ;
>  sh:targetNode "Empty Graph" ;
>  sh:sparql [
>a sh:SPARQLConstraint ;
>sh:select """
>   SELECT $this ?value
>   WHERE {
>FILTER NOT EXISTS { ?s ?p ?o }
>   }
>   """ ;
>   ] .
> 
> On 11/05/2020 17:14, Chris To

Re: SHACL Endpoint questions

2020-05-11 Thread Chris Tomlinson
Hi Andy,

> On May 11, 2020, at 10:38 AM, Andy Seaborne  wrote:
> 
> 
> 
> On 11/05/2020 16:27, Chris Tomlinson wrote:
>> Darn it!!
>> When I use the correct parameter name it “works":
>> curl -s GET http://ldspdi-dev.bdrc.io/shapes/core/PersonShapes | curl -XPOST 
>> --data-binary @-  --header 'Content-type: text/turtle' 
>> 'http://host:port/fuseki/newcorerw/shacl?graph=http://purl.bdrc.io/graph/P707'
>> I just am not getting over seeing sh:conforms true when it seems like a 
>> third result should be present.
> 
> sh:conforms  true ;
> 
> will only appear if there are no violations.

I appreciate that it works that way but until and unless I can understand your 
point about

 [] sh:targetNode ex:myNode

then I don’t know how to distinguish: 1) no violations because a Person graph 
conforms to the PersonShapes - like there’s no Work indicated as a parent of 
the person or a rdfs:label is used where a skos:prefLabel is expected; versus 
2) no violations because the question is vacuous like asking if a Work looks 
like a person or an empty non-existent graph looks like a person. 

I understand that shapes graphs in general are not exhaustive and that there 
certainly can be properties on a resource in the target graph that aren’t 
mentioned in the shapes graph; however, when developing shapes in particular, 
it would help to know whether I’m getting conforms true simply because the 
shapes graph says nothing about the content of the data graph, versus because 
it says something true about (part of) the target data graph.


>> I think I don’t get how to properly think of shacl validation.
> 
> A small illustration would help ...

When I request the endpoint to validate P707 as above, I get the same results 
as with the small standalone tests:

[ a sh:ValidationReport ;
  sh:conforms  false ;
  sh:result[ a sh:ValidationResult ;
 sh:focusNode  bdr:P707 ;
 sh:resultMessage  
"Node[NodeShape[http://purl.bdrc.io/ontology/shapes/core/MaleShape]] at 
focusNode <http://purl.bdrc.io/resource/P705>" ;
 sh:resultPath bdo:hasFather ;
 sh:resultSeverity sh:Violation ;
 sh:sourceConstraintComponent  sh:NodeConstraintComponent ;
 sh:sourceShapebds:PersonShape-hasFather ;
 sh:value  bdr:P705
   ] ;
  sh:result[ a sh:ValidationResult ;
 sh:focusNode  bdr:P707 ;
 sh:resultMessage  
"ClassConstraint[<http://purl.bdrc.io/ontology/core/Person>]: Expected class 
:<http://purl.bdrc.io/ontology/core/Person> for 
<http://purl.bdrc.io/resource/P705>" ;
 sh:resultPath bdo:hasParent ;
 sh:resultSeverity sh:Violation ;
 sh:sourceConstraintComponent  sh:ClassConstraintComponent ;
 sh:sourceShapebds:PersonShape-hasParent ;
 sh:value  bdr:P705
   ] ;
  sh:result[ a sh:ValidationResult ;
 sh:focusNode  bdr:P707 ;
 sh:resultMessage  
"ClassConstraint[<http://purl.bdrc.io/ontology/core/Person>]: Expected class 
:<http://purl.bdrc.io/ontology/core/Person> for 
<http://purl.bdrc.io/resource/P705>" ;
 sh:resultPath bdo:kinWith ;
 sh:resultSeverity sh:Violation ;
 sh:sourceConstraintComponent  sh:ClassConstraintComponent ;
 sh:sourceShapebds:PersonShape-kinWith ;
 sh:value  bdr:P705
   ] ;
  sh:result[ a sh:ValidationResult ;
 sh:focusNode  bdr:P707 ;
 sh:resultMessage  
"ClassConstraint[<http://purl.bdrc.io/ontology/core/Gender>]: Expected class 
:<http://purl.bdrc.io/ontology/core/Gender> for 
<http://purl.bdrc.io/resource/GenderMale>" ;
 sh:resultPath bdo:personGender ;
 sh:resultSeverity sh:Violation ;
 sh:sourceConstraintComponent  sh:ClassConstraintComponent ;
 sh:sourceShapebds:PersonShape-gender ;
 sh:value  bdr:GenderMale
   ] ;
  sh:result[ a sh:ValidationResult ;
 sh:focusNode  bdr:P707 ;
 sh:resultMessage  
"ClassConstraint[<http://purl.bdrc.io/

Re: SHACL Endpoint questions

2020-05-11 Thread Chris Tomlinson
Darn it!!

When I use the correct parameter name it “works":

curl -s GET http://ldspdi-dev.bdrc.io/shapes/core/PersonShapes | curl -XPOST 
--data-binary @-  --header 'Content-type: text/turtle' 
'http://host:port/fuseki/newcorerw/shacl?graph=http://purl.bdrc.io/graph/P707'

I just am not getting over seeing sh:conforms true when it seems like a third 
result should be present.

I think I don’t get how to properly think of shacl validation.

Anyway, sorry to bother,
Chris


> On May 11, 2020, at 10:19 AM, Chris Tomlinson  
> wrote:
> 
> Hi Andy,
> 
>> On May 10, 2020, at 2:53 PM, Andy Seaborne wrote:
>> 
>> On 08/05/2020 21:34, Chris Tomlinson wrote:
>> 
>>> 2) In any event, when I call the endpoint like:
>>> curl -s GET http://ldspdi-dev.bdrc.io/shapes/core/PersonShapes | curl -XPOST 
>>> --data-binary @-  --header 'Content-type: text/turtle' 
>>> 'http://ahost:aport/fuseki/newcorerw/shacl?http://purl.bdrc.io/graph/P707'
>> 
>> Maybe it's email corruption but that isn't the invocation syntax.
>> 
>> That should have ?graph=
>> 
>> Otherwise it defaults to "?graph=default" which seems consistent with the 
>> report.
>> 
>> "?http://purl.bdrc.io/graph/P707 <http://purl.bdrc.io/graph/P707>" is going 
>> to be ignored and for you that's the union default graph.
> 
> I’ve tried adding query= parameter and it doesn’t make any difference:
> 
> curl -s GET http://ldspdi-dev.bdrc.io/shapes/core/PersonShapes | curl -XPOST 
> --data-binary @-  --header 'Content-type: text/turtle' 
> 'http://host:port/fuseki/newcorerw/shacl?query=http://purl.bdrc.io/graph/P707'
> 
> I get the same results as I reported. When using PersonShapes I see:
> 
> [ a sh:ValidationReport ;
>   sh:conforms  false ;
>   sh:result[ a sh:ValidationResult ;
>  sh:focusNode  bdr:UNKNOWN_Person ;
>  sh:resultMessage  "minCount[1]: Invalid 
> cardinality: expected min 1: Got count = 0" ;
>  sh:resultPath bdo:personName ;
>  sh:resultSeverity sh:Violation ;
>  sh:sourceConstraintComponent  sh:MinCountConstraintComponent 
> ;
>  sh:sourceShapebds:PersonShape-personName
>]
> ] .
> 
> Regardless of whether a Person graph (P707), Work graph (W1FPL1), or no graph 
> (NO_GRAPH) is used.
> 
> When using WorkShapes everything reports sh:conforms true ;.
> 
> I have written small test cases using the jena-shacl libs that fetch the 
> above shapes and target graphs and they produce expected validation results. 
> 
> To fully conform, some elements of the dataset union graph are needed, which is 
> why I’m investigating the shacl endpoint as a way of performing validation of 
> a single graph in the context of the entire dataset.
> 
> 
>>> Is there any way to tell whether the shapes graph in some sense doesn’t 
>>> apply to the data graph? This seems like an important distinction.
>> 
>> You can add a constraint that is always triggered.
>> 
>> [] sh:targetNode ex:myNode
>> 
>> is always triggered; it does not require ex:myNode to be in the data.
>> 
>> From there, a SPARQL constraint could do any validations for "right graph", 
>> "empty graph" etc.
> 
> Thanks for the pointer. I’ll explore this idea once I get more understanding.
> 
> Also, what is the relationship of jena-shacl to TopBraid SHACL API 
> <https://github.com/TopQuadrant/shacl>?
> 
> Thanks,
> Chris
> 
> 



Re: SHACL Endpoint questions

2020-05-11 Thread Chris Tomlinson
Hi Andy,

> On May 10, 2020, at 2:53 PM, Andy Seaborne  wrote:
> 
> On 08/05/2020 21:34, Chris Tomlinson wrote:
> 
>> 2) In any event, when I call the endpoint like:
>> curl -s GET http://ldspdi-dev.bdrc.io/shapes/core/PersonShapes | curl -XPOST 
>> --data-binary @-  --header 'Content-type: text/turtle' 
>> 'http://ahost:aport/fuseki/newcorerw/shacl?http://purl.bdrc.io/graph/P707'
> 
> Maybe it's email corruption but that isn't the invocation syntax.
> 
> That should have ?graph=
> 
> Otherwise it defaults to "?graph=default" which seems consistent with the 
> report.
> 
> "?http://purl.bdrc.io/graph/P707; is going to be ignored and for you that's 
> the union default graph.

I’ve tried adding query= parameter and it doesn’t make any difference:

curl -s GET http://ldspdi-dev.bdrc.io/shapes/core/PersonShapes | curl -XPOST 
--data-binary @-  --header 'Content-type: text/turtle' 
'http://host:port/fuseki/newcorerw/shacl?query=http://purl.bdrc.io/graph/P707'

I get the same results as I reported. When using PersonShapes I see:

[ a sh:ValidationReport ;
  sh:conforms  false ;
  sh:result[ a sh:ValidationResult ;
 sh:focusNode  bdr:UNKNOWN_Person ;
 sh:resultMessage  "minCount[1]: Invalid 
cardinality: expected min 1: Got count = 0" ;
 sh:resultPath bdo:personName ;
 sh:resultSeverity sh:Violation ;
 sh:sourceConstraintComponent  sh:MinCountConstraintComponent ;
 sh:sourceShapebds:PersonShape-personName
   ]
] .

Regardless of whether a Person graph (P707), Work graph (W1FPL1), or no graph 
(NO_GRAPH) is used.

When using WorkShapes everything reports sh:conforms true ;.

I have written small test cases using the jena-shacl libs that fetch the above 
shapes and target graphs and they produce expected validation results. 

To fully conform, some elements of the dataset union graph are needed, which is 
why I’m investigating the shacl endpoint as a way of performing validation of 
a single graph in the context of the entire dataset.


>> Is there any way to tell whether the shapes graph in some sense doesn’t 
>> apply to the data graph? This seems like an important distinction.
> 
> You can add a constraint that is always triggered.
> 
> [] sh:targetNode ex:myNode
> 
> is always triggered; it does not require ex:myNode to be in the data.
> 
> From there, a SPARQL constraint could do any validations for "right graph", 
> "empty graph" etc.

Thanks for the pointer. I’ll explore this idea once I get more understanding.

Also, what is the relationship of jena-shacl to TopBraid SHACL API 
<https://github.com/TopQuadrant/shacl>?

Thanks,
Chris




Re: SHACL Endpoint questions

2020-05-10 Thread Chris Tomlinson
Hi Andy,

Thanks for the reply. I'll try again tomorrow with the clarification regarding 
?graph= which is in the doc but not the examples and I didn't read the doc 
above the examples closely.

Chris

> On May 10, 2020, at 14:53, Andy Seaborne  wrote:
> 
> 
> 
>> On 08/05/2020 21:34, Chris Tomlinson wrote:
>> Hello,
>> I’ve enabled a shacl endpoint as described in Apache Jena Shacl 
>> <https://jena.apache.org/documentation/shacl/> and have some questions.
>> Our objective in using the endpoint is to be able to validate a given 
>> resource graph in the tdb:unionDefaultGraph against a given shapes graph.
>> 1) I added:
>> fuseki:endpoint  [ fuseki:operation fuseki:shacl ; fuseki:name "shacl" ] 
>> ;
>> to the Fuseki assembler file for a dataset and restarted, and it is 
>> reachable.
>> One observation is that using the above “new style” endpoint declaration 
>> apparently can be replaced by:
>> fuseki:serviceShacl  "shacl" ;
>> I’m not sure about this but the comment in the doc:
>>> This is not installed into a dataset setup by default; a configuration file 
>>> using fuseki:serviceShacl is necessary
>> seems to suggest it, sort of.
> 
> That wasn't intended - I'll fix the documentation.
> 
> The limitation is that fuseki:serviceQuery etc require fixed names known to 
> the configuration engine which is in fuseki-core.
> 
> Extension operations (new operations can be added to Fuseki) use the new 
> style where the predicate is fuseki:operation. In fact, for SHACL, it can be 
> added quite simply because it comes from Jena itself but the code seems to 
> say that the old-style isn't enabled.
> 
> 
>> 2) In any event, when I call the endpoint like:
>> curl -s GET http://ldspdi-dev.bdrc.io/shapes/core/PersonShapes | curl -XPOST 
>> --data-binary @-  --header 'Content-type: text/turtle' 
>> 'http://ahost:aport/fuseki/newcorerw/shacl?http://purl.bdrc.io/graph/P707'
> 
> Maybe it's email corruption but that isn't the invocation syntax.
> 
> That should have ?graph=
> 
> Otherwise it defaults to "?graph=default" which seems consistent with the 
> report.
> 
> "?http://purl.bdrc.io/graph/P707; is going to be ignored and for you that's 
> the union default graph.
> 
> 
>> I see a result like:
>> [ a sh:ValidationReport ;
>>   sh:conforms  false ;
>>   sh:result[ a sh:ValidationResult ;
>>  sh:focusNode  bdr:UNKNOWN_Person ;
>>  sh:resultMessage  "minCount[1]: Invalid 
>> cardinality: expected min 1: Got count = 0" ;
>>  sh:resultPath bdo:personName ;
>>  sh:resultSeverity sh:Violation ;
>>  sh:sourceConstraintComponent  
>> sh:MinCountConstraintComponent ;
>>  sh:sourceShapebds:PersonShape-personName
>>]
>> ] .
>> which is surprising given that there’s no reference in the graph 
>> http://purl.bdrc.io/graph/P707 that refers 
>> eventually to bdr:UNKNOWN_Person. The P707 graph is a Person graph. I’ve 
>> also tried using an encoded url, http%3a%2f%2fpurl.bdrc.io%2fgraph%2fP707, 
>> but that makes no difference.
>> In fact, using a non-Person graph like http://purl.bdrc.io/graph/W12827 
>> produces the same result. As does submitting a non-existent graph URL like 
>> http://no.such.org/flip/flop.
> 
> See above.
> 
>> This all leads me to conclude that SHACL-Validation#L61 
>> <https://github.com/apache/jena/blob/ab7882a73445c7a75e811eb58d06211c410891b0/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/servlets/SHACL_Validation.java#L61>
>>  isn’t behaving in a way that I understand.
>> I haven’t found any logging that gives any helpful entries and so I’m asking 
>> questions rather than diving deeper for now.
>> If I use another shape graph like 
>> http://ldspdi-dev.bdrc.io/shapes/core/WorkShapes, then I get a result like:
>> [ a sh:ValidationReport ;
>>   sh:conforms  true
>> ] .
>> again regardless of the graph argument.
>> I don’t understand the results.
>> 3) This result leads to a related question. It seems that a result of:
>> sh:conforms  true
>> means that the data graph conforms to the shapes graph in all respects where 
>> the sha

SHACL Endpoint questions

2020-05-08 Thread Chris Tomlinson
Hello,

I’ve enabled a shacl endpoint as described in Apache Jena Shacl and have some questions.

Our objective in using the endpoint is to be able to validate a given resource 
graph in the tdb:unionDefaultGraph against a given shapes graph.

1) I added:

fuseki:endpoint  [ fuseki:operation fuseki:shacl ; fuseki:name "shacl" ] ;

to the Fuseki assembler file for a dataset and restarted, and it is reachable. 

One observation is that using the above “new style” endpoint declaration 
apparently can be replaced by:

fuseki:serviceShacl  "shacl" ;

I’m not sure about this but the comment in the doc:

> This is not installed into a dataset setup by default; a configuration file 
> using fuseki:serviceShacl is necessary

seems to suggest it, sort of.

2) In any event, when I call the endpoint like:

curl -s GET http://ldspdi-dev.bdrc.io/shapes/core/PersonShapes | curl -XPOST 
--data-binary @-  --header 'Content-type: text/turtle' 
'http://ahost:aport/fuseki/newcorerw/shacl?http://purl.bdrc.io/graph/P707'

I see a result like:

[ a sh:ValidationReport ;
  sh:conforms  false ;
  sh:result[ a sh:ValidationResult ;
 sh:focusNode  bdr:UNKNOWN_Person ;
 sh:resultMessage  "minCount[1]: Invalid 
cardinality: expected min 1: Got count = 0" ;
 sh:resultPath bdo:personName ;
 sh:resultSeverity sh:Violation ;
 sh:sourceConstraintComponent  sh:MinCountConstraintComponent ;
 sh:sourceShapebds:PersonShape-personName
   ]
] .

which is surprising given that there’s no reference in the graph 
http://purl.bdrc.io/graph/P707 that refers 
eventually to bdr:UNKNOWN_Person. The P707 graph is a Person graph. I’ve also 
tried using an encoded url, http%3a%2f%2fpurl.bdrc.io%2fgraph%2fP707, but that 
makes no difference.

In fact, using a non-Person graph like http://purl.bdrc.io/graph/W12827 
produces the same result. As does submitting a non-existent graph URL like 
http://no.such.org/flip/flop.

This all leads me to conclude that SHACL-Validation#L61 
<https://github.com/apache/jena/blob/ab7882a73445c7a75e811eb58d06211c410891b0/jena-fuseki2/jena-fuseki-core/src/main/java/org/apache/jena/fuseki/servlets/SHACL_Validation.java#L61>
isn’t behaving in a way that I understand.

I haven’t found any logging that gives any helpful entries and so I’m asking 
questions rather than diving deeper for now.

If I use another shape graph like 
http://ldspdi-dev.bdrc.io/shapes/core/WorkShapes, then I get a result like:

[ a sh:ValidationReport ;
  sh:conforms  true
] .

again regardless of the graph argument.

I don’t understand the results.

3) This result leads to a related question. It seems that a result of:

sh:conforms  true

means that the data graph conforms to the shapes graph in all respects where 
the shapes graph picks out features in the data graph, even if the shapes graph 
picks out nothing at all in the data graph.

Is there any way to tell whether the shapes graph in some sense doesn’t apply 
to the data graph? This seems like an important distinction.

I appreciate any help on these questions,
Chris






Re: OntDocumentManager and related questions

2020-04-23 Thread Chris Tomlinson
I think I now understand a little better about OntDocumentManager and 
OntModels. I stared at Jena code, documentation 
<https://jena.apache.org/documentation/ontology/> and more test code. Maybe the 
following will be helpful to someone else.

> On Apr 22, 2020, at 6:21 PM, Chris Tomlinson  
> wrote:
> 
> ...
> 
> There are several things that don’t seem to work as expected from the docs:
> 
> 1) The code:
> 
>> OntModel om = odm.getOntology(ontUri, oms); 
> 
> returns a model w/ just the contents of the file located via baseURI:
> 
>> http://purl.bdrc.io/shapes/core/PersonShapes/
> from the ont-policy.rdf 
> <https://raw.githubusercontent.com/buda-base/editor-templates/master/ont-policy.rdf>,
>  regardless of the setting:
> 
>>  odm.setProcessImports(true or false);
> 
> The javadocs indicate that if true the imports should be processed by 
> getOntology, but this isn’t happening at this point?

Key point here was I overlooked:

> write() on an ontology model will only write the statements from the base 
> model.

so even though the imports were processed, om.write(...) was misleading me in 
that it only writes om.getBaseModel() so I wasn’t seeing the imported content.

The call om.writeAll(…) produces the desired result as the documentation 
<https://jena.apache.org/documentation/ontology/> indicates. Another case of 
failing to rtfm on my part.

The OntModel contains a baseModel resulting from loading the top-level document 
and a collection of subModels: one for the baseModel and an OntModel for each of 
the imported documents.
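
A minimal illustration of the difference, using the om from above:

om.write(System.out, "TURTLE");           // serializes the base model only
om.writeAll(System.out, "TURTLE", null);  // base model plus imported sub-models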



> 2) However, when
> 
>>  String graphName = BDG+graphLocalName;
>> fuConn.put(graphName, om);
> 
> 
> Then the graph is stored on Fuseki with the given graphName and if
> 
>>  odm.setProcessImports(true);
> 
> 
> then the imported triples appear in the graph on Fuseki even though they 
> don’t appear to be present in local OntModel, om, in the app.
> 
> It’s almost like the fuConn.put() causes the imports to be loaded on-the-fly.
> 
> Why is this?

The crux is that fuConn.put(…) performs om.getGraph() which returns a graph 
that contains all the content of the baseModel and the subModels. So what’s 
transferred to fuseki has all the imported content which does look mysterious 
when compared to the document resulting from om.write(…) versus om.writeAll(…).



> 3) Finally, contrary to the javadocs:
> 
>> om.loadImports();
> 
> does not lead to loading the imports when:
> 
>>  odm.setProcessImports(true or false);
> 
> Why?

Same error as number 1).


Hopefully this is of some value to others,
Chris


> I’m sure I’ve mucked up something but I don’t see it.
> 
> Thanks for help on this,
> Chris
> 
> 
> 



OntDocumentManager and related questions

2020-04-22 Thread Chris Tomlinson
Hello,

We’re having some difficulties using the OntDocumentManager properly. I’ve put 
together a test app that exercises several usages.

I am running it in eclipse with references to the jena libs from 3.15-SNAPSHOT 
 which I built locally via

> mvn clean install -Pdev

other than this, the test app is standalone. The

>   private static final String OUT = <folder for output files>;
>   private static final String DS = <fuseki dataset for models/graphs>;


need to be set properly as indicated.

There are several things that don’t seem to work as expected from the docs:

1) The code:

> OntModel om = odm.getOntology(ontUri, oms); 

returns a model w/ just the contents of the file located via baseURI:

> http://purl.bdrc.io/shapes/core/PersonShapes/

from the ont-policy.rdf 
<https://raw.githubusercontent.com/buda-base/editor-templates/master/ont-policy.rdf>, 
regardless of the setting:

>  odm.setProcessImports(true or false);

The javadocs indicate that if true the imports should be processed by 
getOntology, but this isn’t happening at this point?


2) However, when

>   String graphName = BDG+graphLocalName;
> fuConn.put(graphName, om);


Then the graph is stored on Fuseki with the given graphName and if

>  odm.setProcessImports(true);


then the imported triples appear in the graph on Fuseki even though they don’t 
appear to be present in local OntModel, om, in the app.

It’s almost like the fuConn.put() causes the imports to be loaded on-the-fly.

Why is this?


3) Finally, contrary to the javadocs:

> om.loadImports();

does not lead to loading the imports when:

>  odm.setProcessImports(true or false);

Why?


I’m sure I’ve mucked up something but I don’t see it.

Thanks for help on this,
Chris





Re: how to disable DEBUG and TRACE in 3.15.0-SNAPSHOT

2020-04-22 Thread Chris Tomlinson
Never mind. The messages disappeared when I created a copy of the test w/ a 
different name?!

Sorry for the interruption,
Chris


> On Apr 22, 2020, at 12:18 PM, Chris Tomlinson  
> wrote:
> 
> Hi,
> 
> I’m trying to test some code against 3.15.0-SNAPSHOT referring to jena-arq, 
> jena-base, jena-core, and jena-rdfconnection in an eclipse environment and 
> when I run the test app (which, btw, works fine) I get many many lines of 
> output like:
> 
>> DEBUG StatusLogger Using ShutdownCallbackRegistry class 
>> org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry
>> DEBUG StatusLogger Not in a ServletContext environment, thus not loading 
>> WebLookup plugin.
>> DEBUG StatusLogger AsyncLogger.ThreadNameStrategy=CACHED (user specified 
>> null, default is CACHED)
>> TRACE StatusLogger Using default SystemClock for timestamps.
>> DEBUG StatusLogger org.apache.logging.log4j.core.util.SystemClock does not 
>> support precise timestamps.
>> DEBUG StatusLogger Not in a ServletContext environment, thus not loading 
>> WebLookup plugin.
>> DEBUG StatusLogger Took 0.099440 seconds to load 215 plugins from 
>> sun.misc.Launcher$AppClassLoader@6d06d69c
>> DEBUG StatusLogger PluginManager 'Converter' found 44 plugins
>> DEBUG StatusLogger Starting OutputStreamManager SYSTEM_OUT.false.false-1
>> DEBUG StatusLogger Starting LoggerContext[name=6d06d69c, 
>> org.apache.logging.log4j.core.LoggerContext@64729b1e]...
>> DEBUG StatusLogger Reconfiguration started for context[name=6d06d69c] at URI 
>> null (org.apache.logging.log4j.core.LoggerContext@64729b1e) with optional 
>> ClassLoader: null
>> DEBUG StatusLogger Not in a ServletContext environment, thus not loading 
>> WebLookup plugin.
>> DEBUG StatusLogger PluginManager 'ConfigurationFactory' found 4 plugins
> 
> at the beginning and end of the run, with the output of the test code in 
> between.
> 
> I looked through the log4j2.properties files and don’t see any DEBUG or TRACE 
> enabled.
> 
> What is the way to disable this logging?
> 
> Thanks,
> Chris
> 
> 



how to disable DEBUG and TRACE in 3.15.0-SNAPSHOT

2020-04-22 Thread Chris Tomlinson
Hi,

I’m trying to test some code against 3.15.0-SNAPSHOT referring to jena-arq, 
jena-base, jena-core, and jena-rdfconnection in an eclipse environment and when 
I run the test app (which, btw, works fine) I get many many lines of output 
like:

> DEBUG StatusLogger Using ShutdownCallbackRegistry class 
> org.apache.logging.log4j.core.util.DefaultShutdownCallbackRegistry
> DEBUG StatusLogger Not in a ServletContext environment, thus not loading 
> WebLookup plugin.
> DEBUG StatusLogger AsyncLogger.ThreadNameStrategy=CACHED (user specified 
> null, default is CACHED)
> TRACE StatusLogger Using default SystemClock for timestamps.
> DEBUG StatusLogger org.apache.logging.log4j.core.util.SystemClock does not 
> support precise timestamps.
> DEBUG StatusLogger Not in a ServletContext environment, thus not loading 
> WebLookup plugin.
> DEBUG StatusLogger Took 0.099440 seconds to load 215 plugins from 
> sun.misc.Launcher$AppClassLoader@6d06d69c
> DEBUG StatusLogger PluginManager 'Converter' found 44 plugins
> DEBUG StatusLogger Starting OutputStreamManager SYSTEM_OUT.false.false-1
> DEBUG StatusLogger Starting LoggerContext[name=6d06d69c, 
> org.apache.logging.log4j.core.LoggerContext@64729b1e]...
> DEBUG StatusLogger Reconfiguration started for context[name=6d06d69c] at URI 
> null (org.apache.logging.log4j.core.LoggerContext@64729b1e) with optional 
> ClassLoader: null
> DEBUG StatusLogger Not in a ServletContext environment, thus not loading 
> WebLookup plugin.
> DEBUG StatusLogger PluginManager 'ConfigurationFactory' found 4 plugins

at the beginning and end of the run, with the output of the test code in 
between.

I looked through the log4j2.properties files and don’t see any DEBUG or TRACE 
enabled.

What is the way to disable this logging?

Thanks,
Chris




Re: Apache Jena Fuseki with text indexing

2020-03-26 Thread Chris Tomlinson
Zhenya,

Do you see any content in the directory:

> text:directory  ;

like the following partial listing:

> fuseki@foo :~/base/lucene-test$ ls -l
> total 3608108
> -rw-rw 1 fuseki fuseki   7772 Jan 29 21:15 _19a_5x.liv
> -rw-r- 1 fuseki fuseki    299 Jan 21 15:53 _19a.cfe
> -rw-r- 1 fuseki fuseki   36547721 Jan 21 15:53 _19a.cfs
> -rw-r- 1 fuseki fuseki    443 Jan 21 15:53 _19a.si
> -rw-r- 1 fuseki fuseki  23621 Jan 21 15:53 _24_17n.liv
> -rw-r- 1 fuseki fuseki   22718569 Jan 21 15:53 _24.fdt
> -rw-r- 1 fuseki fuseki   9184 Jan 21 15:53 _24.fdx
> -rw-r- 1 fuseki fuseki  12975 Jan 21 15:53 _24.fnm
> -rw-r- 1 fuseki fuseki7009762 Jan 21 15:53 _24_Lucene50_0.doc
> -rw-r- 1 fuseki fuseki3804794 Jan 21 15:53 _24_Lucene50_0.pos
> -rw-r- 1 fuseki fuseki   16186474 Jan 21 15:53 _24_Lucene50_0.tim
> -rw-r- 1 fuseki fuseki 103945 Jan 21 15:53 _24_Lucene50_0.tip
> -rw-r- 1 fuseki fuseki 667296 Jan 21 15:53 _24.nvd
> -rw-r- 1 fuseki fuseki   4027 Jan 21 15:53 _24.nvm
> -rw-r- 1 fuseki fuseki    540 Jan 21 15:53 _24.si

Also if you don’t have storeValues true then queries like:

(?s ?score ?lit) text:query “ribosome”

won’t bind anything to ?lit. The storeValues is set like:

> # Text index description
> :test_lucene_index a text:TextIndexLucene ;
> text:directory  ;
> text:storeValues true ;
> text:entityMap :test_entmap ;
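
With text:storeValues true, a query along those lines should then bind ?lit.
A sketch in Java, assuming a dataset assembled from a configuration like the
above:

import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;

public class TextQueryDemo {
    public static void search(Dataset dataset) {
        String q = String.join("\n",
            "PREFIX text: <http://jena.apache.org/text#>",
            "SELECT ?s ?score ?lit",
            "WHERE { (?s ?score ?lit) text:query \"ribosome\" }");
        try (QueryExecution qe = QueryExecutionFactory.create(q, dataset)) {
            ResultSetFormatter.out(qe.execSelect());  // ?lit binds only when values are stored
        }
    }
}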


Also you need to reload the data if you change the configuration so that the 
indexing will be done according to the configuration.

ciao,
Chris


> On Mar 26, 2020, at 10:33 AM, Zhenya Antić  wrote:
> 
> @prefix :  .
> @prefix tdb2:  .
> @prefix rdf:  .
> @prefix ja:  .
> @prefix rdfs:  .
> @prefix fuseki:  .
> @prefix text:  .
> 
> 
> rdfs:subClassOf ja:RDFDataset .
> 
> ja:DatasetTxnMem rdfs:subClassOf ja:RDFDataset .
> 
> tdb2:DatasetTDB2 rdfs:subClassOf ja:RDFDataset .
> 
> tdb2:GraphTDB2 rdfs:subClassOf ja:Model .
> 
> 
> rdfs:subClassOf ja:Model .
> 
> ja:MemoryDataset rdfs:subClassOf ja:RDFDataset .
> 
> ja:RDFDatasetZero rdfs:subClassOf ja:RDFDataset .
> 
> 
> rdfs:subClassOf ja:RDFDataset .
> 
> :service_tdb_all a fuseki:Service ;
> rdfs:label "TDB biology" ;
> fuseki:dataset :tdb_dataset_readwrite ;
> fuseki:name "biology" ;
> fuseki:serviceQuery "query" , "" , "sparql" ;
> fuseki:serviceReadGraphStore "get" ;
> fuseki:serviceReadQuads "" ;
> fuseki:serviceReadWriteGraphStore
> "data" ;
> fuseki:serviceReadWriteQuads "" ;
> fuseki:serviceUpdate "" , "update" ;
> fuseki:serviceUpload "upload" .
> 
> :tdb_dataset_readwrite
> a tdb2:DatasetTDB2 ;
> tdb2:location "db" .
> 
> 
> rdfs:subClassOf ja:Model .
> 
> ja:RDFDatasetOne rdfs:subClassOf ja:RDFDataset .
> 
> ja:RDFDatasetSink rdfs:subClassOf ja:RDFDataset .
> 
> 
> rdfs:subClassOf ja:RDFDataset .
> 
> <#dataset> rdf:type tdb2:DatasetTDB2 ;
> tdb2:location "db" ; #path to TDB;
> .
> 
> # Text index description
> :text_dataset rdf:type text:TextDataset ;
> text:dataset <#dataset> ; # <-- replace `:my_dataset` with the desired URI
> text:index <#indexLucene> ;
> .
> 
> <#indexLucene> a text:TextIndexLucene ;
> text:directory  ;
> text:entityMap <#entMap> ;
> .
> 
> <#entMap> a text:EntityMap ;
> text:defaultField "text" ;
> text:entityField "uri" ;
> text:map (
> #RDF label abstracts
> [ text:field "text" ;
> text:predicate  ;
> text:analyzer [
> a text:StandardAnalyzer
> ] 
> ]
> [ text:field "text" ;
> text:predicate  ;
> text:analyzer [
> a text:StandardAnalyzer
> ] 
> ]
> ) .
> 
> 
> 
> <#service_text_tdb> rdf:type fuseki:Service ;
> fuseki:name "ds" ;
> fuseki:serviceQuery "query" ;
> fuseki:serviceQuery "sparql" ;
> fuseki:serviceUpdate "update" ;
> fuseki:serviceUpload "upload" ;
> fuseki:serviceReadGraphStore "get" ;
> fuseki:serviceReadWriteGraphStore "data" ;
> fuseki:dataset :text_dataset ;
> .
> 
> 
> 
> On Thu, Mar 26, 2020, at 11:31 AM, Zhenya Antić wrote:
>> Hi Andy,
>> 
>> Thanks. So I think I have all the lines you listed in the .ttl file 
>> (attached). I also checked, the data file contains the relevant data. But I 
>> have 0 properties indexed.
>> 
>> Thanks,
>> Zhenya
>> 
>> 
>> 
>> On Wed, Mar 25, 2020, at 4:41 AM, Andy Seaborne wrote:
>>> 
>>> 
>>> On 24/03/2020 15:11, Zhenya Antić wrote:
 Hi Andy,
 
> Did you load the data before attaching the text index?
 
 How do I do it (or not do it, wasn't sure from your post)?
>>> 
>>> 

Re: Text search and similar

2020-01-12 Thread Chris Tomlinson
Hi Mikael,

> On Jan 10, 2020, at 4:26 AM, Mikael Pesonen  
> wrote:
> 
> 
> Hi Chris,
> 
> On 09/01/2020 17.50, Chris Tomlinson wrote:
>> Hello Br,
>> 
>>> On Jan 9, 2020, at 3:34 AM, Mikael Pesonen  
>>> wrote:
>>> 
>>> 
>>> Hi,
>>> 
>>> I asked about these few years ago so maybe there is some new ideas.
>>> 
>>> 1) Is it possible to config text index so that it would add, for example, 
>>> all textual values (xsd:string etc) to index automatically? Now every 
>>> property has to be configured manually.
>> No, it is not currently possible. Perhaps give more detail on how you would 
>> see using such a feature, how you would handle various literal datatypes 
>> (convert all to xsd:string?), and then how you would search. Currently 
>> searches are focussed on one or more properties - a recent update allows 
>> providing a list of properties that can be searched in a single Lucene search. 
>> More detail is available at 
>> https://jena.apache.org/documentation/query/text-query.html.
> In the ideal case all values that are of type string literal would be indexed. 
> Queries would work as now, you would define the properties you are querying, 
> for example
> 
> (?concept ?score ?prefLabel) text:query (skos:prefLabel "tech*" "lang:en")
> 
> Of course I don't know how hard this would be to implement.

So, you're wanting objects of type xsd:string and rdf:langString to be indexed 
with the property/predicate appearing in the triple. This in turn would mean 
that a field name would need to be created based on the resource localName of 
the property, and for rdf:langString a default lang field name would need to be 
defined in the assembler file along with whatever multi-language analyzer 
structure is needed. This is tantamount to creating the entmap for the Lucene 
index configuration on-the-fly.


>> 
>>> 2) Is there planned support for searching similar resources, based on the 
>>> Lucene index?
>> I’m not aware of any such plans. More detail would be needed to evaluate 
>> feasibility, in particular how to recognize resources as similar.
>> 
>> Please note that the Jena+Lucene model is to index individual triples as 
>> Lucene documents, not entire graphs or models, which in turn leads to indexing 
>> and searching focussed on properties.
> This would be fine. At least for our needs it would enough to find similar 
> values only, not entire resources.

I’m sorry I still don’t know what constitutes “similar values”. I’m guessing 
you’re referring to using Lucene fuzzy matches, proximity matches and the like. 
These are already supported to an extent (see Jena Full Text Search 
<https://jena.apache.org/documentation/query/text-query.html>).

This sort of thing would not be released until Jena 3.15 at the earliest. I 
haven’t given any implementation thought to this other than what’s written here.

Regards,
Chris


>> 
>> Chris
>> 
>>> Br
>>> 
>>> -- 
>>> 
>> 
> 
> -- 
> Lingsoft - 30 years of Leading Language Management
> 
> www.lingsoft.fi
> 
> Speech Applications - Language Management - Translation - Reader's and 
> Writer's Tools - Text Tools - E-books and M-books
> 
> Mikael Pesonen
> System Engineer
> 
> e-mail: mikael.peso...@lingsoft.fi
> Tel. +358 2 279 3300
> 
> Time zone: GMT+2
> 
> Helsinki Office
> Eteläranta 10
> FI-00130 Helsinki
> FINLAND
> 
> Turku Office
> Kauppiaskatu 5 A
> FI-20100 Turku
> FINLAND
> 



Re: Text search and similar

2020-01-09 Thread Chris Tomlinson
Hello Br,

> On Jan 9, 2020, at 3:34 AM, Mikael Pesonen  wrote:
> 
> 
> Hi,
> 
> I asked about these few years ago so maybe there is some new ideas.
> 
> 1) Is it possible to config text index so that it would add, for example, all 
> textual values (xsd:string etc) to index automatically? Now every property 
> has to be configured manually.

No, it is not currently possible. Perhaps give more detail on how you would see 
using such a feature, how you would handle various literal datatypes (convert all 
to xsd:string?), and then how you would search. Currently searches are focussed 
on one or more properties - a recent update allows providing a list of 
properties that can be searched in a single Lucene search. More detail is 
available at https://jena.apache.org/documentation/query/text-query.html.

> 
> 2) Is there planned support for searching similar resources, based on the 
> Lucene index?

I’m not aware of any such plans. More detail would be needed to evaluate 
feasibility, in particular how to recognize resources as similar.

Please note that the Jena+Lucene model is to index individual triples as Lucene 
documents, not entire graphs or models, which in turn leads to indexing and 
searching focussed on properties.

Chris

> 
> Br
> 
> -- 
> 



Re: No such type:

2019-10-06 Thread Chris Tomlinson
Hello Laura,

Now that another issue is being examined, some data and queries will be needed.

Thanks,
Chris

> On Oct 6, 2019, at 16:26, Laura Morales  wrote:
> 
> $ cat run/config.ttl
> @prefix :<#> .
> @prefix fuseki:  <http://jena.apache.org/fuseki#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#> .
> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix ja:  <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix text:<http://jena.apache.org/text#> .
> 
> [] rdf:type fuseki:Server ;
>ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "3" ] .
> 
> 
> --
> 
> 
> $ cat run/configuration/demo.ttl
> PREFIX :<#>
> PREFIX fuseki:  <http://jena.apache.org/fuseki#>
> PREFIX ja:  <http://jena.hpl.hp.com/2005/11/Assembler#>
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
> PREFIX tdb: <http://jena.hpl.hp.com/2008/tdb#>
> PREFIX text:<http://jena.apache.org/text#>
> 
> :service a fuseki:Service ;
>rdfs:label "demo" ;
>fuseki:name "demo" ;
>fuseki:serviceQuery "query" ;
>fuseki:dataset :text_dataset .
> 
> :text_dataset a text:TextDataset ;
>text:dataset :dataset ;
>text:index   :dataset_index .
> 
> :dataset a tdb:DatasetTDB ;
>tdb:location "..." .
> 
> :dataset_index a text:TextIndexLucene ;
>text:directory  ;
>text:entityMap :index_map .
> 
> :index_map a text:EntityMap ;
>text:entityField  "uri" ;
>text:defaultField "field" ;
>text:map ([
>text:field "field" ;
>text:predicate rdfs:label ]) .
> 
> 
> --
> 
> 
> $ ./fuseki-server --version
> Jena:   VERSION: 3.12.0
> Jena:   BUILD_DATE: 2019-05-27T16:07:27+
> TDB:VERSION: 3.12.0
> TDB:BUILD_DATE: 2019-05-27T16:07:27+
> Fuseki: VERSION: 3.12.0
> Fuseki: BUILD_DATE: 2019-05-27T16:07:27+
> 
> 
> --
> 
> With the above configuration I do not get a "No such type" error, but I get a 
> sort of mixed behavior. Sometimes it seems to work, while most of the time 
> it returns zero results. And I cannot reproduce it either... it gives me zero 
> results except a few times it magically starts working (returning results).
> I get the "No such type" error when using a RDFDataset instead of DatasetTDB, 
> but at this point I'd love to understand what's going on here before trying 
> to understand the RDFDataset error.
> 
> 
> 
>> Sent: Sunday, October 06, 2019 at 8:10 PM
>> From: "Chris Tomlinson" 
>> To: users@jena.apache.org
>> Subject: Re: No such type: <http://jena.apache.org/text#TextDataset>
>> 
>> Hi Laura,
>> 
>> It would be helpful to see the assembler file. Then we may get closer to 
>> whether there's a bug.
>> 
>> Regards,
>> Chris


Re: No such type:

2019-10-06 Thread Chris Tomlinson
Hi Laura,

It would be helpful to see the assembler file. Then we may get closer to 
whether there's a bug.

Regards,
Chris

> On Oct 6, 2019, at 12:27, Laura Morales  wrote:
> 
> I'm trying to enable full text search on a Fuseki v3.12 instance but I get 
> the error shown below. The assembler is pretty much a copycat of the 
> documentation, with a Lucene text index. The assembler contains the prefix 
> "text: ".
> Is this a bug?
> 
> $ java -cp fuseki-server.jar jena.textindexer --desc=run/config.ttl
> org.apache.jena.sparql.ARQException: No such type: 
> 
>at 
> org.apache.jena.sparql.core.assembler.AssemblerUtils.build(AssemblerUtils.java:134)
>at 
> org.apache.jena.query.text.TextDatasetFactory.create(TextDatasetFactory.java:38)
>at jena.textindexer.processModulesAndArgs(textindexer.java:90)
>at jena.cmd.CmdArgModule.process(CmdArgModule.java:52)
>at jena.cmd.CmdMain.mainMethod(CmdMain.java:92)
>at jena.cmd.CmdMain.mainRun(CmdMain.java:58)
>at jena.cmd.CmdMain.mainRun(CmdMain.java:45)
>at jena.textindexer.main(textindexer.java:52)
> 


Re: JenaText: support for explicit field names in text queries

2019-09-01 Thread Chris Tomlinson
Hi again Brian,

I looked a bit more and it’s not clear how to “fix” the issue after all. The 
change I suggested to TextIndexLucene uncovers a basic issue.

When using a query such as:

(?s ?score ?lit) text:query ( "some query string" 300 ) .

The code currently inserts the primaryField, e.g., rdfs:label or what have you, 
and then TextQueryPF binds the hit value from Lucene to the ?lit by looking up 
the matching field value in the result doc returned by Lucene; however, the 
change I suggested no longer defaults to the primaryField and so there’s an 
error during the result binding handling in TextQueryPF.

The basic problem is that there’s an ambiguity with:

… text:query ( "some query string" 300 ) .

The current code doesn’t know whether there are fields mentioned in the query 
string or not. 

If there are fields in the query string then the use of the

(?s ?score ?lit) text:query …

form must be disallowed, since there’s no way to know what field value to 
retrieve from the Lucene query result documents without further analysis of the 
query string. Apparently in your application there will generally be two or 
more matching fields in each result document, and it would be further 
complicated to figure out which matching field value to use - or to invent 
another syntax for grabbing more than a single ?lit per result doc.

If there are no fields mentioned in the query string then the primaryField 
should be used explicitly, and then ?lit can be bound to an appropriate match 
value as is currently done.

Perhaps you can raise a Jena issue and we can discuss and see what can be done.

Regards,
Chris



> On Sep 1, 2019, at 2:25 PM, Chris Tomlinson  
> wrote:
> 
> Hi Brian,
> 
>> On Sep 1, 2019, at 7:17 AM, Brian McBride > <mailto:brian.mcbr...@epimorphics.com>> wrote:
>> 
>> It used to be the case that JenaText supported querying of a Lucene text 
>> index where the index was created independently of Jena and then made 
>> available to JenaText via the dataset configuration.  Is this still the case?
> 
> That should still be the case, with the proviso that currently the field 
> names must be handled via RDF properties outside the query string.
> 
> As you noted, it has been documented since 3.6.0 
> <https://jena.apache.org/documentation/query/text-query.html> that:
> 
>> No explicit use of Fields within the query string is supported.
> 
> This is based on the assumption that the indexes contain only a single 
> property field in the documents as they are indexed and hence only a single 
> field corresponding to an RDF property in a query. Evidently a poor 
> assumption not caught until now.
> 
> 
>> Up until Jena 3.9.0 definitely, and I suspect 3.12.0 - I have not confirmed 
>> this yet, it was possible to express text queries with field names and they 
>> worked.
> 
> You’re correct, the change was introduced 
> <https://github.com/apache/jena/blob/519c129ab2dfcb5eb43f1a337c618a8e69f88acd/jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java#L744>
>  in the 3.13.0 code that breaks the previous behavior. I’m not able to 
> explore fixing this for the next three weeks but may take a look at “fixing” 
> this then. The basic change would be to replace the referenced line by:
> 
> qstring = qs;
> 
> and that should be it. The results handling ( in simpleResults 
> <https://github.com/apache/jena/blob/519c129ab2dfcb5eb43f1a337c618a8e69f88acd/jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java#L562>
>  and highlightResults 
> <https://github.com/apache/jena/blob/519c129ab2dfcb5eb43f1a337c618a8e69f88acd/jena-text/src/main/java/org/apache/jena/query/text/TextIndexLucene.java#L668>)
>   should need no changes since Lucene:
> 
> doc.get(null) 
> 
> just returns null, which is already handled. Evidently your application 
> doesn’t use the
> 
>  (?s ?score ?lit) text:query … 
> 
> form; since there’s no information about which fields have been used in the 
> queryString, no bindings for ?lit can be made.
> 
>> We needed an index where multiple properties of the same resource were 
>> indexed as a single document.  I would be happy to discuss this further - 
>> why the solution indicated in the JenaText documentation didn't work for us 
>> and whether there is way to construct a general purpose JenaText solution 
>> that would. 
> 
> 
> More explanation would be interesting.
> 
> Sorry for the inconvenience,
> Chris
> 



Re: JenaText: support for explicit field names in text queries

2019-09-01 Thread Chris Tomlinson
Hi Brian,

> On Sep 1, 2019, at 7:17 AM, Brian McBride  
> wrote:
> 
> It used to be the case that JenaText supported querying of a Lucene text 
> index where the index was created independently of Jena and then made 
> available to JenaText via the dataset configuration.  Is this still the case?

That should still be the case, with the proviso that currently the field names 
must be handled via RDF properties outside the query string.

As you noted, it has been documented since 3.6.0 that:

> No explicit use of Fields within the query string is supported.

This is based on the assumption that the indexes contain only a single property 
field in the documents as they are indexed and hence only a single field 
corresponding to an RDF property in a query. Evidently a poor assumption not 
caught until now.


> Up until Jena 3.9.0 definitely, and I suspect 3.12.0 (I have not confirmed 
> this yet), it was possible to express text queries with field names and they 
> worked.

You’re correct, the change was introduced in the 3.13.0 code that breaks the 
previous behavior. I’m not able to explore fixing this for the next three weeks 
but may take a look at “fixing” this then. 
The basic change would be to replace the referenced line by:

qstring = qs;

and that should be it. The results handling (in simpleResults and 
highlightResults) should need no changes since Lucene:

doc.get(null) 

just returns null, which is already handled. Evidently your application doesn’t 
use the

 (?s ?score ?lit) text:query … 

form; since there’s no information about which fields have been used in the 
queryString, no bindings for ?lit can be made.

> We needed an index where multiple properties of the same resource were 
> indexed as a single document.  I would be happy to discuss this further - why 
> the solution indicated in the JenaText documentation didn't work for us and 
> whether there is a way to construct a general purpose JenaText solution that 
> would. 


More explanation would be interesting.

Sorry for the inconvenience,
Chris



Re: JENA-1620 and Query timeout overrides

2019-08-31 Thread Chris Tomlinson
Hi Brian,

The Lucene version changed in 3.10.0 to 7.4 from 6.4 in 3.9.0 (and earlier). I 
don’t think this has anything to do with the problem though.

I’m surprised that the query you indicate works in 3.9.0. It looks like what’s 
intended is a phrase query but it needs to be surrounded by double quotes:

?s text:query ("\"street: the\"" 300)

the Lucene query is created by taking the default field name, “text” in this 
case, and dropping the query string in, giving:

text:street: the

as the query, which looks like two Lucene field names concatenated.

Without the inner double quotes Lucene will treat the query string as an OR of 
terms:

street: OR the

The “:” could be escaped like

?s text:query ("street\\: the" 300)

which would be an OR. 

I don’t think any of this has changed since 3.3.0.

Regards,
Chris



> On Aug 31, 2019, at 3:08 PM, Brian McBride  
> wrote:

> 2) I ran our integration tests with the 3.13.0-SNAPSHOT installed and got a 
> JENA text problem.  A simplified version of the query is:
> 
> [[
> 
> PREFIX  xsd:  
> PREFIX  text: 
> PREFIX  ppd: 
> PREFIX  lrcommon: 
> SELECT *  {
>   ?ppd_propertyAddress
>   text:query( "street:  the" 300 ) .
> } LIMIT 1
> 
> ]]
> 
> The fuseki log shows
> 
> [[
> 
> Cannot parse 'text:street: the  ': Encountered " ":" ": "" at line 1, column 
> 11.
> 
> ]]
> 
> This works in 3.9.0.
> 
> Our application indexes multiple properties in different fields.



Re: question about RDFPatch headers

2019-05-18 Thread Chris Tomlinson
Hi Andy,

We appreciate the ideas. If we go with a linking approach we’ll have to add 
some more machinery, which is fine.

Thanks again,
Chris


> On May 17, 2019, at 10:40 AM, Andy Seaborne  wrote:
> 
> Hi Chris,
> 
> If the "meta" part becomes complicated, it might be better to put a link in 
> the header that goes to another file.  There is balance to be struck between 
> arbitrary structures and simple processing.
> 
> It does make some sense to have a multi-valued patch header.
> 
> (all one line with or without separating comma)
> 
> H graph <http://purl.bdrc.io/graph/0686c69d-8f89-4496-acb5-744f0157a8db> 
> <http://purl.bdrc.io/graph/3ee0eca0-6d5f-4b4d-85db-f69ab1167eb1> .
> 
> Having the header as a map makes mixing header entries and reprocessing them 
> work better.  No assumed order creeps in and no confusion about duplicates 
> for things that must be unique (like id). c.f. HTTP headers.
> 
> If you think the meta data is going to get large, then a link to elsewhere 
> may be better for other reasons like using the metadata without needing to 
> access the patch in the log.
> 
>Andy
> 
> On 16/05/2019 18:35, Chris Tomlinson wrote:
>> Hi,
>> We’re building an editing service for our RDF Linked Data Service and are 
>> thinking to use at least some of the features of RDFPatch/RDFDelta.
>> We use named graphs for the various Entities that we model: Works, Persons, 
>> Places, Lineages and so on. We are wanting to include in the patch some 
>> headers indicating the graphs that are being updated in the patch and the 
>> graphs that are created in the patch. We want this information to help the 
>> editing service have easy access to this information w/o analyzing the patch 
>> and doing other work to discover what’s being created and so on.
>> At first we thought of using a couple of keywords like, graph and create:
>> H graph <http://purl.bdrc.io/graph/0686c69d-8f89-4496-acb5-744f0157a8db> .
>> H graph <http://purl.bdrc.io/graph/3ee0eca0-6d5f-4b4d-85db-f69ab1167eb1> .
>> H create <http://purl.bdrc.io/graph/b1167eb1-85db-4b4d-6d5f-3ee0eca0f69a> .
>> H create <http://purl.bdrc.io/graph/0157a8db-acb5-4496-8f89-0686c69d744f> .
>> H id … .
>> but org.seaborne.patch.PatchHeader uses a Map so we can have only one H graph … 
>> and one H create … in the patch. Two alternatives we’ve considered are to 
>> use a String of comma separated graphIds:
>> H graph "0686c69d-8f89-4496-acb5-744f0157a8db , 
>> 3ee0eca0-6d5f-4b4d-85db-f69ab1167eb1” .
>> H create "b1167eb1-85db-4b4d-6d5f-3ee0eca0f69a , 
>> 0157a8db-acb5-4496-8f89-0686c69d744f” .
>> which is plausible but in some cases the list of graphIds could become quite 
>> long and so this could be an issue down the line with very large strings.
>> A second idea was to add the notion of a preamble to the patch using PS, for 
>> preamble start, and PE, for preamble end, which would separate our 
>> extensions from the defined RDFPatch structure:
>> PS .
>> H graph <http://purl.bdrc.io/graph/0686c69d-8f89-4496-acb5-744f0157a8db> .
>> H graph <http://purl.bdrc.io/graph/3ee0eca0-6d5f-4b4d-85db-f69ab1167eb1> .
>> H create <http://purl.bdrc.io/graph/b1167eb1-85db-4b4d-6d5f-3ee0eca0f69a> .
>> H create <http://purl.bdrc.io/graph/0157a8db-acb5-4496-8f89-0686c69d744f> .
>> PE .
>> H id … .
>> TX .
>> …
>> We would then pre-parse the patch payload up to the PE and submit the 
>> remainder to RDFPatch, and so on.
>> A 3rd possibility is to consider some extension to RDFPatch to use a 
>> different signature for the Map in PatchHeader. This seems rather involved.
>> So we’re asking what approaches others might have taken for this sort of 
>> use-case or how best to accommodate this in RDFPatch as is.
>> Thanks very much,
>> Chris



Re: concatenating patches

2019-05-18 Thread Chris Tomlinson
Hi Andy,

Thanks for the reply. We’ll look more deeply into the rdf-delta repo. I think 
there’s a lot there to at least help inform our design.

Regards,
Chris


> On May 17, 2019, at 8:06 AM, Andy Seaborne  wrote:
> 
> 
> 
> On 16/05/2019 18:48, Chris Tomlinson wrote:
>> Hello,
>> As part of our editing service development, using at least aspects of 
>> RDFPatch, we have a use-case that goes like this:
>> 1. User U_A does some work on a number of resources resulting in patch P_01, 
>> then stashes the patch on the editing service and requests user U_B to 
>> finish the work item represented by P_01.
>> 2. User U_B retrieves P_01 and begins a new patch P_02 that continues on 
>> from P_01.
>> 3. User U_B then requests the editing service to apply P_01 followed by P_02.
>> This seems like a case where RDF Patch Log 
>> <https://afs.github.io/rdf-delta/rdf-patch-logs.html> would be appropriate, 
>> but it isn’t clear from the docs or code exactly how one manages a Log of 
>> patches to be applied in sequence and then requests the sequence to be 
>> applied. I looked about for a PatchLog class but didn’t see such.
>> How would this kind of scenario be handled?
> 
> The id/prev headers provide a way to define a log (a linear sequence of 
> patches with a head and tail).
> 
> There are many ways to use patches
> 
> The patch module on its own does not provide this - that is what module 
> rdf-delta-client does for the case of an HA dataset.
> 
> DeltaConnection with a SyncPolicy of NONE (not automatic) and call
> 
> DeltaConnection internally has a DataState - it tracks the state/version of the 
> dataset so sync() gets and applies the right patches in order.
> 
> Your case seems a little different so it may not apply out-of-the-box - there 
> is a workflow between the two users that is managing the patch flow.
> 
> The DeltaClient class can also help - it is more focused on managing and 
> interrogating the state of the patch log.
> 
> So one way may be to run a patch log server and add your workflow specifics 
> to pull and apply patches.
> 
>Andy
> 
>> Thanks very much,
>> Chris



concatenating patches

2019-05-16 Thread Chris Tomlinson
Hello,

As part of our editing service development, using at least aspects of RDFPatch, 
we have a use-case that goes like this:

1. User U_A does some work on a number of resources resulting in patch P_01, 
then stashes the patch on the editing service and requests user U_B to finish 
the work item represented by P_01.
2. User U_B retrieves P_01 and begins a new patch P_02 that continues on from 
P_01.
3. User U_B then requests the editing service to apply P_01 followed by P_02.
This seems like a case where RDF Patch Log 
 would be appropriate, but 
it isn’t clear from the docs or code exactly how one manages a Log of patches 
to be applied in sequence and then requests the sequence to be applied. I 
looked about for a PatchLog class but didn’t see such.

How would this kind of scenario be handled?

Thanks very much,
Chris



question about RDFPatch headers

2019-05-16 Thread Chris Tomlinson
Hi,

We’re building an editing service for our RDF Linked Data Service and are 
thinking to use at least some of the features of RDFPatch/RDFDelta.

We use named graphs for the various Entities that we model: Works, Persons, 
Places, Lineages and so on. We are wanting to include in the patch some headers 
indicating the graphs that are being updated in the patch and the graphs that 
are created in the patch. We want these headers to give the editing service 
easy access to this information w/o analyzing the patch and doing other 
work to discover what’s being created and so on.

At first we thought of using a couple of keywords like, graph and create:

H graph  .
H graph  .
H create  .
H create  .
H id … .

but org.seaborne.patch.PatchHeader uses a Map so we can have only one H graph … 
and one H create … in the patch. Two alternatives we’ve considered are to use a 
String of comma separated graphIds:

H graph "0686c69d-8f89-4496-acb5-744f0157a8db , 
3ee0eca0-6d5f-4b4d-85db-f69ab1167eb1” .
H create "b1167eb1-85db-4b4d-6d5f-3ee0eca0f69a , 
0157a8db-acb5-4496-8f89-0686c69d744f” .

which is plausible but in some cases the list of graphIds could become quite 
long and so this could be an issue down the line with very large strings.

A second idea was to add the notion of a preamble to the patch using PS, for 
preamble start, and PE, for preamble end, which would separate our extensions 
from the defined RDFPatch structure:

PS .
H graph  .
H graph  .
H create  .
H create  .
PE .
H id … .
TX .
…

We would then pre-parse the patch payload up to the PE and submit the remainder 
to RDFPatch, and so on.

A 3rd possibility is to consider some extension to RDFPatch to use a different 
signature for the Map in PatchHeader. This seems rather involved.

So we’re asking what approaches others might have taken for this sort of 
use-case or how best to accommodate this in RDFPatch as is.

Thanks very much,
Chris



Re: Trouble with querying with language in Jena text

2019-05-02 Thread Chris Tomlinson
Hi Mikael,

try removing:

>  text:queryAnalyzer [ a text:KeywordAnalyzer ] ;
>  text:queryParser text:AnalyzingQueryParser ;

Also the following should work as well as using “lang:en”:

  (?s ?score ?content) text:query (lsrm:content "text"@en)

but I doubt that will make a difference.
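
With those two lines removed, the index section of the configuration quoted 
below would reduce to something like this sketch (directory and entity map 
unchanged from your config):

<#indexLucene> a text:TextIndexLucene ;
  text:directory <...> ;                  # as in your current config
  text:entityMap <#entMap> ;
  text:storeValues true ;
  text:analyzer [ a text:StandardAnalyzer ] ;
  text:multilingualSupport true .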

I’m still on 3.10 but there’ve been no changes in jena-text for 3.11 that 
should be in play for your issue.

Chris


> On May 2, 2019, at 5:56 AM, Mikael Pesonen  wrote:
> 
> 
> I'm using Jena 3.11, full server as jar, and have following text index config:
> 
> <#indexLucene> a text:TextIndexLucene ;
>  text:directory   ;
>  text:entityMap <#entMap> ;
>  text:storeValues true ;
>  text:analyzer [ a text:StandardAnalyzer ] ;
>  text:queryAnalyzer [ a text:KeywordAnalyzer ] ;
>  text:queryParser text:AnalyzingQueryParser ;
>  text:multilingualSupport true ;
>   .
> 
> <#entMap> a text:EntityMap ;
>  text:defaultField "prefLabel" ;
>  text:entityField  "uri" ;
>  text:uidField "uid" ;
>  text:langField"lang" ;
>  text:graphField   "graph" ;
>  text:map (
>   [ text:field "prefLabel" ; text:predicate skos:prefLabel ]
>   [ text:field "altLabel"  ; text:predicate skos:altLabel ]
>   [ text:field "content"  ; text:predicate lsrm:content ]
>   ) .
> 
> 
> When inserting long text into lsrm:content, search usually works only without 
> language. So, inserted
> 
>  lsrm:content "long ... text ... here"@en
> 
> and querying like this works
> 
> (?s ?score ?content) text:query (lsrm:content "text" ) .
> 
> but this returns empty result
> 
> (?s ?score ?content) text:query (lsrm:content "text" "lang:en") .
> 
> But on some occasions language search does work in lsrm:content; I can't see 
> what the cause is here.
> 
> Any ideas?
> 
> -- 
> Lingsoft - 30 years of Leading Language Management
> 
> www.lingsoft.fi
> 
> Speech Applications - Language Management - Translation - Reader's and 
> Writer's Tools - Text Tools - E-books and M-books
> 
> Mikael Pesonen
> System Engineer
> 
> e-mail: mikael.peso...@lingsoft.fi
> Tel. +358 2 279 3300
> 
> Time zone: GMT+2
> 
> Helsinki Office
> Eteläranta 10
> FI-00130 Helsinki
> FINLAND
> 
> Turku Office
> Kauppiaskatu 5 A
> FI-20100 Turku
> FINLAND
> 



Re: Text Index build with empty fields

2019-03-25 Thread Chris Tomlinson
Hi Sorin,

If the issue is resolved to your satisfaction please go ahead and close it.

Thanks,
Chris


> On Mar 25, 2019, at 11:56 AM, Sorin Gheorghiu 
>  wrote:
> 
> Hi Chris,
> 
> after doing more tests I have good news: the textindexer of Jena 3.10 is 
> working fine. When a large RDF dataset is indexed, the textindexer starts with 
> one field (per record), but later on the other fields are indexed as well. 
> This behaviour had confused me; I expected to see all fields indexed 
> immediately. Hence I learnt I have to wait until textindexer finishes its 
> task and then check the results.
> Thank you for your support so far! Shall I close the ticket?
> Best regards,
> Sorin
> 
> Am 12.03.2019 um 15:39 schrieb Chris Tomlinson:
>> Hi Sorin,
>> 
>> I have focussed on the jena text integration w/ Lucene local to jena/fuseki. 
>> The solr was dropped over a year ago due to lack of support/interest and w’ 
>> your information about ES 7.x it’s likely going to take someone who is a 
>> user of ES to help keep the integration up-to-date. 
>> 
>> Anuj Kumar mailto:akum...@isightpartners.com>> 
>> did the ES integration about a year ago for jena 3.9.0 and as I mentioned I 
>> made obvious changes to the ES integration to update to Lucene 7.4.0 for 
>> jena 3.10.0.
>> 
>> The upgrade to Lucene 7.4.0  
>> <https://issues.apache.org/jira/browse/JENA-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673657#comment-16673657> 
>> was prompted by a user, jeanmarc.va...@gmail.com 
>> <mailto:jeanmarc.va...@gmail.com>, who was interested in Lucene 7.5, but the 
>> released version of ES was built against 7.4 so we upgraded to that version.
>> 
>> I’ve opened JENA-1681 <https://issues.apache.org/jira/browse/JENA-1681> for 
>> the issue you’ve reported. You can report your findings there and hopefully 
>> we can get to the bottom of the problem.
>> 
>> Regards,
>> Chris
>> 
>> 
>> 
>>> On Mar 12, 2019, at 6:40 AM, Sorin Gheorghiu 
>>> mailto:sorin.gheorg...@uni-konstanz.de>> 
>>> wrote:
>>> 
>>> Hi Chris,
>>> 
>>> Thank you for your detailed answer. I will still try to find the root cause 
>>> of this issue.
>>> But I have a question to you, do you know if Jena will support 
>>> Elasticsearch in the further versions?
>>> 
>>> I am asking because in Elasticsearch 7.0 are breaking changes which will 
>>> affect the transport-client [1]: 
>>> The TransportClient is deprecated in favour of the Java High Level REST 
>>> Client and will be removed in Elasticsearch 8.0.
>>> This supposes changes in the client’s initialization code, the Migration 
>>> Guide [2] explains how to do it.
>>> 
>>> 
>>> [1] 
>>> https://www.elastic.co/guide/en/elasticsearch/client/java-api/master/transport-client.html
>>>  
>>> <https://www.elastic.co/guide/en/elasticsearch/client/java-api/master/transport-client.html>
>>> [2] 
>>> https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-level-migration.html
>>>  
>>> <https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-level-migration.html>
>>> 
>>> Best regards,
>>> Sorin
>>> 
>>> Am 11.03.2019 um 18:38 schrieb Chris Tomlinson:
>>>> Hi Sorin,
>>>> 
>>>> I haven’t had the time to try and delve further into your issue. Your pcap 
>>>> seems to clearly indicate that there is no data populating any 
>>>> field/property other than the first one in the entity map.
>>>> 
>>>> I’ve included the configuration file that we use. It has many many fields 
>>>> defined that are all populated. We load jena/fuseki from a collection of 
>>>> git repos via a git-to-dbs tool <https://github.com/buda-base/git-to-dbs> 
>>>> and we don’t see the sort of issue you’re reporting where there is a 
>>>> single field out of all the defined fields that is populated in the 
>>>> dataset and Lucene index - we don’t use ElasticSearch. 
>>>> 
>>>> The point being that whatever is going wrong is apparently not in the 
>>>> parsing of the configuration and setting up of the internal tables that 
>>>> record information about which predicates are indexed via Lucene (or 
>>>> Elasticsearch) into what fields.
>>>> 
>>>> So it appears to me that the issue is something that is 

Re: Text Index build with empty fields

2019-03-12 Thread Chris Tomlinson
Hi Sorin,

I have focussed on the jena-text integration w/ Lucene local to jena/fuseki. 
The Solr integration was dropped over a year ago due to lack of support/interest 
and w/ your information about ES 7.x it’s likely going to take someone who is a 
user of ES to help keep the integration up-to-date. 

Anuj Kumar  did the ES integration about a year ago 
for jena 3.9.0 and as I mentioned I made obvious changes to the ES integration 
to update to Lucene 7.4.0 for jena 3.10.0.

The upgrade to Lucene 7.4.0 
<https://issues.apache.org/jira/browse/JENA-1621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16673657#comment-16673657> 
was prompted by a user, jeanmarc.va...@gmail.com 
<mailto:jeanmarc.va...@gmail.com>, who was interested in Lucene 7.5, but the 
released version of ES was built against 7.4 so we upgraded to that version.

I’ve opened JENA-1681 <https://issues.apache.org/jira/browse/JENA-1681> for the 
issue you’ve reported. You can report your findings there and hopefully we can 
get to the bottom of the problem.

Regards,
Chris



> On Mar 12, 2019, at 6:40 AM, Sorin Gheorghiu 
>  wrote:
> 
> Hi Chris,
> 
> Thank you for your detailed answer. I will still try to find the root cause 
> of this issue.
> But I have a question to you, do you know if Jena will support Elasticsearch 
> in the further versions?
> 
> I am asking because in Elasticsearch 7.0 are breaking changes which will 
> affect the transport-client [1]: 
> The TransportClient is deprecated in favour of the Java High Level REST 
> Client and will be removed in Elasticsearch 8.0.
> This supposes changes in the client’s initialization code, the Migration 
> Guide [2] explains how to do it.
> 
> [1] 
> https://www.elastic.co/guide/en/elasticsearch/client/java-api/master/transport-client.html
>  
> <https://www.elastic.co/guide/en/elasticsearch/client/java-api/master/transport-client.html>
> [2] 
> https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-level-migration.html
>  
> <https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-level-migration.html>
> 
> Best regards,
> Sorin
> 
> Am 11.03.2019 um 18:38 schrieb Chris Tomlinson:
>> Hi Sorin,
>> 
>> I haven’t had the time to try and delve further into your issue. Your pcap 
>> seems to clearly indicate that there is no data populating any 
>> field/property other than the first one in the entity map.
>> 
>> I’ve included the configuration file that we use. It has many many fields 
>> defined that are all populated. We load jena/fuseki from a collection of git 
>> repos via a git-to-dbs tool <https://github.com/buda-base/git-to-dbs> and we 
>> don’t see the sort of issue you’re reporting where there is a single field 
>> out of all the defined fields that is populated in the dataset and Lucene 
>> index - we don’t use ElasticSearch. 
>> 
>> The point being that whatever is going wrong is apparently not in the 
>> parsing of the configuration and setting up of the internal tables that 
>> record information about which predicates are indexed via Lucene (or 
>> Elasticsearch) into what fields.
>> 
>> So it appears to me that the issue is something that is happening in the 
>> connection between the standalone textindexer.java and the Elasticsearch via 
>> the TextIndexES.java. The textindexer.java doesn’t have any post 3.8.0 
>> changes that I can see and the only change in the TextIndexES.java is a 
>> change in the name of 
>> org.elasticsearch.common.transport.InetSocketTransportAddress to 
>> org.elasticsearch.common.transport.TransportAddress as part of the upgrade.
>> 
>> I’m really not able to go further at this time.
>> 
>> I’m sorry,
>> Chris
>> 
>> 
>>> # Fuseki configuration for BDRC, configures two endpoints:
>>> #   - /bdrc is read-only
>>> #   - /bdrcrw is read-write
>>> #
>>> # This was painful to come up with but the web interface basically allows 
>>> no option
>>> # and there is no subclass inference by default so such a configuration 
>>> file is necessary.
>>> #
>>> # The main doc sources are:
>>> #  - 
>>> https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html 
>>> <https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html>
>>> #  - https://jena.apache.org/documentation/assembler/assembler-howto.html 
>>> <https://jena.apache.org/documentation/assembler/assembler-howto.html>
>>> #  - https://jena.apache.org/documentation/assembler/assembler.ttl 
>>> <https://jena.apache.org/documentation/a

Re: Text Index build with empty fields

2019-03-01 Thread Chris Tomlinson
Hi Sorin,

tcpdump -A -r works fine to view the pcap file; however, I don’t have the time 
to delve into the data. I’ll take your word for it that the whole setup worked 
in 3.8.0, and I encourage you to try simplifying the entity map, perhaps by 
having a unique field per property, to see if the problem appears related to the 
prefName and varName fields mapping to multiple properties. 
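
For example, a sketch of such a split (the new field names are illustrative):

# give each predicate its own field instead of sharing "prefName"/"varName"
[ text:field "prefNameSubjectHeading" ;
  text:predicate gndo:preferredNameForTheSubjectHeading ]
[ text:field "varNameSubjectHeading" ;
  text:predicate gndo:variantNameForTheSubjectHeading ]
# ... and likewise for the Place, Work, Person, Family, etc. pairs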

I do notice that the field oldgndid only maps to a single property, but, not 
knowing the data, I have no idea whether there’s any of that data in your tests.

Since you indicate that only the gndtype field has data (per the pcap file): if 
there is oldgndid data (i.e., occurrences of gndo:oldAuthorityNumber), then that 
suggests there is some rather generic issue w/ textindexer; however, if there is 
no oldgndid data, then a problem may have crept in since 3.8.0 with data for 
multiple properties assigned to a single field, which I would guess might be 
related to the com.google.common.collect.Multimap that holds the results of 
parsing the entity map.

I have no idea how to enable debug logging when running the standalone 
textindexer; perhaps someone else can answer that.

Regards,
Chris


> On Mar 1, 2019, at 2:57 AM, Sorin Gheorghiu  
> wrote:
> 
> Hi Chris,
> 
> 1) As I said before, this entity map worked in 3.8.0. 
> The pcap file I sent you is the proof that Jena delivers inconsistent data. 
> You may open it with Wireshark
> 
> 
> 
> or read it with tcpick:
> # tcpick -C -yP -r textindexer_280219.pcap | more
> 
> ES...}..\*...gnd_fts_es_131018_index.cp-dFuCVTg-dUwvfyREG2w..GndSubjectheadings.http://d-nb.info/gnd/102968225.
> ES..\*.transport_client.indices:data/write/update..gnd_fts_es_131018_index.GndSubjectheadings.http://d-nb.info/gnd/102968438..painless..if((ctx._source
>  == null) || (ctx._source.gndtype == null) || (ctx._source.gndtype.empty == 
> true)) {ctx._source.gndtype=[params.fieldValue] } else 
> {ctx._source.gndtype.add(params.fieldValue)}
> ..fieldValue..Person...gnd_fts_es_131018_indexGndSubjectheadings..http://d-nb.info/gnd/102968438{"varName":[],"varName":[],"varName":[],"varName":[],"varName":[],"varName":[],"varName":[],"prefName":[],"prefName":[],"prefName":[],"prefName":[],"prefName":[],"prefName":[],"prefName":[],"oldgndid":[],"gndtype":["Person"]}..
> As a remark, Jena sends whole text index data within one TCP packet for one 
> Elasticsearch document.
> 
> 3) fuseki.log collects logs when Fuseki server is running, but for text 
> indexer we have to run java command line, i.e.
> 
>   java -cp ./fuseki-server.jar: jena.textindexer 
> --desc=run/config.ttl
> The question is how to activate the debug logs while running the text indexer?
> 
> 
> Regards,
> Sorin
> 
> Am 28.02.2019 um 21:41 schrieb Chris Tomlinson:
>> Hi Sorin,
>> 
>> 1) I suggest trying to simplify the entity map. I assume there’s data for 
>> each of the properties other than skos:altLabel in the entity map:
>> 
>>>  [ text:field "gndtype";
>>>text:predicate skos:altLabel
>>>  ]
>>>  [ text:field "oldgndid";
>>>text:predicate gndo:oldAuthorityNumber
>>>  ]
>>>  [ text:field "prefName";
>>>text:predicate gndo:preferredNameForTheSubjectHeading
>>>  ]
>>>  [ text:field "varName";
>>>text:predicate gndo:variantNameForTheSubjectHeading
>>>  ]
>>>  [ text:field "prefName";
>>>text:predicate gndo:preferredNameForThePlaceOrGeographicName
>>>  ]
>>>  [ text:field "varName";
>>>text:predicate gndo:variantNameForThePlaceOrGeographicName
>>>  ]
>>>  [ text:field "prefName";
>>>text:predicate gndo:preferredNameForTheWork
>>>  ]
>>>  [ text:field "varName";
>>>text:predicate gndo:variantNameForTheWork
>>>  ]
>>>  [ text:field "prefName";
>>>text:predicate gndo:preferredNameForTheConferenceOrEvent
>>>  ]
>>>  [ text:field "varName";
>>>text:predicate gndo:variantNameForTheConferenceOrEvent
>>>  ]
>>>  [ text:field "prefName";
>>>

Re: Text Index build with empty fields

2019-02-28 Thread Chris Tomlinson
Hi Sorin,

1) I suggest trying to simplify the entity map. I assume there’s data for each 
of the properties other than skos:altLabel in the entity map:

>  [ text:field "gndtype";
>text:predicate skos:altLabel
>  ]
>  [ text:field "oldgndid";
>text:predicate gndo:oldAuthorityNumber
>  ]
>  [ text:field "prefName";
>text:predicate gndo:preferredNameForTheSubjectHeading
>  ]
>  [ text:field "varName";
>text:predicate gndo:variantNameForTheSubjectHeading
>  ]
>  [ text:field "prefName";
>text:predicate gndo:preferredNameForThePlaceOrGeographicName
>  ]
>  [ text:field "varName";
>text:predicate gndo:variantNameForThePlaceOrGeographicName
>  ]
>  [ text:field "prefName";
>text:predicate gndo:preferredNameForTheWork
>  ]
>  [ text:field "varName";
>text:predicate gndo:variantNameForTheWork
>  ]
>  [ text:field "prefName";
>text:predicate gndo:preferredNameForTheConferenceOrEvent
>  ]
>  [ text:field "varName";
>text:predicate gndo:variantNameForTheConferenceOrEvent
>  ]
>  [ text:field "prefName";
>text:predicate gndo:preferredNameForTheCorporateBody
>  ]
>  [ text:field "varName";
>text:predicate gndo:variantNameForTheCorporateBody
>  ]
>  [ text:field "prefName";
>text:predicate gndo:preferredNameForThePerson
>  ]
>  [ text:field "varName";
>text:predicate gndo:variantNameForThePerson
>  ]
>  [ text:field "prefName";
>text:predicate gndo:preferredNameForTheFamily
>  ]
>  [ text:field "varName";
>text:predicate gndo:variantNameForTheFamily
>  ]


2) You might try a TextIndexLucene instead of the Elasticsearch index (a minimal 
sketch follows after item 3).

3) Adding the line log4j.logger.org.apache.jena.query.text.es=DEBUG should 
work. I see no problem with it.
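
For item 2, a minimal sketch (reusing your existing <#entMap>; the directory 
location is illustrative):

<#indexLucene> a text:TextIndexLucene ;
  text:directory <file:Lucene> ;    # illustrative location
  text:entityMap <#entMap> .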

Sorry to be of little help,
Chris


> On Feb 28, 2019, at 8:53 AM, Sorin Gheorghiu 
>  wrote:
> 
> Hi Chris,
> Thank you for answering, I reply you directly because users@jena doesn't 
> accept messages larger than 1Mb.
> 
> The previous text index successful attempt we did was with 3.8.0, not 3.9.0, 
> sorry for the misinformation.
> Attached is the assembler file for 3.10.0 as requested, as well as the packet 
> capture file to see that only the 'gndtype' field has data.
> I tried to enable the debug logs in log4j.properties with 
> log4j.logger.org.apache.jena.query.text.es=DEBUG but no output in the log 
> file.
> 
> Regards,
> Sorin
> 
> Am 27.02.2019 um 20:01 schrieb Chris Tomlinson:
>> Hi Sorin,
>> 
>> Please provide the assembler file for Elasticsearch that has the problematic 
>> entity map definitions.
>> 
>> There haven’t been any changes in over a year to textindexer since well 
>> before 3.9. I don’t see any relevant changes to the handling of entity maps 
>> either so I can’t begin to pursue the issue further w/o perhaps seeing your 
>> current assembler file. 
>> 
>> I don't have any experience with Elasticsearch or with using jena-text-es 
>> beyond a simple change to TextIndexES.java to change 
>> org.elasticsearch.common.transport.InetSocketTransportAddress to 
>> org.elasticsearch.common.transport.TransportAddress as part of the upgrade 
>> to Lucene 7.4.0 and Elasticsearch 6.4.2.
>> 
>> Regards,
>> Chris
>> 
>> 
>>> On Feb 25, 2019, at 2:37 AM, Sorin Gheorghiu 
>>>  <mailto:sorin.gheorg...@uni-konstanz.de> 
>>> wrote:
>>> 
> Correction: only the *latest field* from the /text:map/ list contains a 
> value.
>>> 
>>> To reformulate:
>>> 
>>> * if there are 3 fields in /text:map/, then during indexing the first
>>>   two are empty (let's name them 'text1' and 'text2') and the latest
>>>   field contains data (let's name it 'text3')
>>> * if on the next attempt the field 'text3' is commented out, then
>>>   'text1' is empty and 'text2' contains data
>>> 
>>> 
>>> Am 22.02.2019 um 15:01 schrieb Sorin Gheorghiu:
>>>> In addition:
>>>> 
>>>>  * if there are 3 fields in /text:map/, then during indexing one
>>>>contains data (let's name it 'text1'), the others are empty (let's
>>>

Re: Text Index build with empty fields

2019-02-27 Thread Chris Tomlinson
Hi Sorin,

Please provide the assembler file for Elasticsearch that has the problematic 
entity map definitions.

There haven’t been any changes in over a year to textindexer since well before 
3.9. I don’t see any relevant changes to the handling of entity maps either so 
I can’t begin to pursue the issue further w/o perhaps seeing your current 
assembler file. 

I don't have any experience with Elasticsearch or with using jena-text-es 
beyond a simple change to TextIndexES.java to change 
org.elasticsearch.common.transport.InetSocketTransportAddress to 
org.elasticsearch.common.transport.TransportAddress as part of the upgrade to 
Lucene 7.4.0 and Elasticsearch 6.4.2.

Regards,
Chris


> On Feb 25, 2019, at 2:37 AM, Sorin Gheorghiu 
>  wrote:
> 
> Correction: only the *latest field* from the /text:map/ list contains a value.
> 
> To reformulate:
> 
> * if there are 3 fields in /text:map/, then during indexing the first
>   two are empty (let's name them 'text1' and 'text2') and the latest
>   field contains data (let's name it 'text3')
> * if on the next attempt the field 'text3' is commented out, then
>   'text1' is empty and 'text2' contains data
> 
> 
> Am 22.02.2019 um 15:01 schrieb Sorin Gheorghiu:
>> 
>> In addition:
>> 
>>  * if there are 3 fields in /text:map/, then during indexing one
>>contains data (let's name it 'text1'), the others are empty (let's
>>name them 'text2' and 'text3'),
>>  * if on the next attempt the field 'text1' is commented out, then
>>'text2' contains data and 'text3' is empty
>> 
>> 
>> 
>>  Weitergeleitete Nachricht 
>> Betreff: Text Index build with empty fields
>> Datum:   Fri, 22 Feb 2019 14:01:18 +0100
>> Von: Sorin Gheorghiu 
>> Antwort an:  users@jena.apache.org
>> An:  users@jena.apache.org
>> 
>> 
>> 
>> Hi,
>> 
>> When building the text index with the /jena.textindexer/ tool in Jena 3.10 
>> for an external full-text search engine (Elasticsearch of course) and having 
>> multiple fields with different names in /text:map/, just *one field is 
>> indexed* (more precisely one field contains data, the others are empty). It 
>> doesn't look to be an issue with Elasticsearch: in the logs generated during 
>> indexing, all the fields but one are already missing their values. The same 
>> setup worked in Jena 3.9. Changing the Java version from 8 to 9 or 11 didn't 
>> change anything.
>> 
>> Could it be that changes in the new release have affected this tool and we 
>> are dealing with a bug?
>> 
> 



Re: missing xml:base

2019-02-23 Thread Chris Tomlinson
Hi Andy,

> On Feb 23, 2019, at 3:14 PM, Andy Seaborne  wrote:
> 
> I thought xml:base has to be enabled in the RDF/XML writer, it's no on by 
> default.
> 
> https://jena.apache.org/documentation/io/rdfxml_howto.html#advanced-rdfxml-output
>  
> <https://jena.apache.org/documentation/io/rdfxml_howto.html#advanced-rdfxml-output>

Ah! Thanks very much for pointing to the docs on 
org.apache.jena.rdf.model.RDFWriter. Indeed setting the “xmlbase” property 
causes an xml:base to be included in the <rdf:RDF> header as desired:

> org.apache.jena.rdf.model.RDFWriter rdfw = m.getWriter("RDF/XML");
> rdfw.setProperty("xmlbase", "http://purl.bdrc.io/ontology/ext/auth/");
> rdfw.write(m, System.out, "http://purl.bdrc.io/ontology/ext/auth/");

Are there any particular issues to be aware of when using 
org.apache.jena.rdf.model.RDFWriter vs org.apache.jena.riot.RDFWriter which 
underlies {Model, OntModel}.write? The latter doesn’t support a property to 
include xml:base in the output. 



>> However, we are wanting to ensure that there is an explicit baseURI present 
>> in the resulting serialization. 
> 
> Out of curiosity, why do you want that?
> 
> Using ":" instead of a "base" seems quite common.

There are third party tools, such as TopBraid Composer FE, SE and ME, that 
require an explicit xml:base when resolving an owl:imports via the web - i.e., 
there is no matching local file - rather than using the URI in the owl:imports 
as the baseURI for the document. As I read the RDF/XML syntax spec the client 
is supposed to use the document URL for the baseURI when no xml:base is 
present. TBC sends an "Accept: application/rdf+xml, text/turtle" header, and 
with such a list RDF/XML is selected (the service can provide both 
serializations). 

A solution, when serializing as RDF/XML, might be to use 
Model.write(OutputStream, "RDF/XML", null) or Model.write(OutputStream, 
"RDF/XML"), which forces absolute paths to be used in the output; however, 
without the (unnecessary) xml:base, TBC fails silently to load the imports. By 
comparison, Protégé has no problem correctly following owl:imports.

We’re just trying to provide proper content negotiation and accommodate the 
dictum that the client makes it right.

Based on the javadoc for Model.write parameter, base:

> base The base uri to use when writing relative URI's. null means use only 
> absolute URI's. This is used for relative URIs that would be resolved against 
> the document retrieval URL. For some values of lang, this value may be 
> included in the output.

It is unclear what use a non-null value is for RDF/XML, since the resulting 
output has no xml:base, the URL of the doc cannot necessarily be relied on, 
and the output is rendered ambiguous since relative URIs can’t be resolved 
unless perhaps the default namespace, denoted by “:”, is used.

The write w/ an explicit baseURI could output an explicit xml:base (as an @base 
is written for Turtle) or, as also suggested by the javadoc, simply use the 
supplied baseURI to construct the proper URIs for values of rdf:about etc. on 
output.



>Andy

Thank you,
Chris










> 
> On 23/02/2019 00:42, Martynas Jusevičius wrote:
>> Sorry, can't run the test case right now.
>> No I'm thinking RDF/XML :) But the prefixes are only used for
>> properties, not subject (@rdf:about) or object (@rdf:resource) values.
>> What would you gain by adding an xml:base? You could shorten
>> prfx:lcl-name to just lcl-name by setting xml:base to prfx: namespace
>> URI, but that would only work for a single namespace.
>> And not sure how much sense it makes to compare RDF/XML with Turtle
>> because the former builds on XML which has its own namespace
>> mechanism. But maybe I'm completely misunderstanding what you are
>> trying to do :)
>> On Sat, Feb 23, 2019 at 12:18 AM Chris Tomlinson
>>  wrote:
>>> 
>>> No.
>>> 
>>> If you run the test case you see that RDF/XML writes out xmlns defns of 
>>> prefixes and uses the prefixes in the serialization. Perhaps you are 
>>> thinking of n-triples.
>>> 
>>> Thanks,
>>> Chris
>>> 
>>>> On Feb 22, 2019, at 16:52, Martynas Jusevičius  
>>>> wrote:
>>>> 
>>>> Isn't it so that RDF/XML writer always writes absolute URIs, so
>>>> xml:base is unnecessary because it would have no effect anyway?
>>>> 
>>>> On Fri, Feb 22, 2019 at 11:20 PM Chris Tomlinson
>>>>  wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> We are trying to serve various ontology files in a variety of 
>>>>> serializations, principally RDF/XML and Turtle.
>>>>&g

Re: missing xml:base

2019-02-22 Thread Chris Tomlinson
No. 

If you run the test case you see that RDF/XML writes out xmlns defns of 
prefixes and uses the prefixes in the serialization. Perhaps you are thinking 
of n-triples.

Thanks,
Chris

> On Feb 22, 2019, at 16:52, Martynas Jusevičius  wrote:
> 
> Isn't it so that RDF/XML writer always writes absolute URIs, so
> xml:base is unnecessary because it would have no effect anyway?
> 
> On Fri, Feb 22, 2019 at 11:20 PM Chris Tomlinson
>  wrote:
>> 
>> Hello,
>> 
>> We are trying to serve various ontology files in a variety of 
>> serializations, principally RDF/XML and Turtle.
>> 
>> The specs indicate that if the baseURI for an ontology is the URL by which 
>> the ontology is retrieved then it is not required that the producer include 
>> an explicit xml:base or @base or similar in the serialization.
>> 
>> However, we are wanting to ensure that there is an explicit baseURI present 
>> in the resulting serialization.
>> 
>> This is because not all tools respect the injunction to use the URL that was 
>> used to retrieve the ontology as the baseURI if there is not an explicit 
>> xml:base or @base and so on in the serialization.
>> 
>> The question is that Model.write(OutputStream, Language, baseURI) includes 
>> an @base when the language is “TURTLE“ in the serialization but when the 
>> language is “RDF/XML” we do not see an xml:base in the result. (same happens 
>> when OntModel is used and when RDFWriter is used.)
>> 
>> The docs indicate that the baseURI param is to be used to specify what URI 
>> should be used to serialize relative URIs and says nothing about including 
>> xml:base or @base; yet, we see the @base for Turtle and no xml:base for 
>> RDF/XML.
>> 
>> This is an issue when a tool requests RDF/XML before Turtle in accept 
>> headers and requires that there be an xml:base in the RDF/XML serialization. 
>> Noting that RDF/XML is the only required serialization.
>> 
>> What procedure should be used to “force” an @base or xml:base uniformly?
>> 
>> Here is a small test case that shows the issue:
>> 
>>
>> https://github.com/buda-base/lds-pdi/blob/master/src/test/java/io/bdrc/ldspdi/test/ModelWriteTest.java
>>  
>> <https://github.com/buda-base/lds-pdi/blob/master/src/test/java/io/bdrc/ldspdi/test/ModelWriteTest.java>
>> 
>> 
>> Thank you,
>> Chris
>> 
>> 


missing xml:base

2019-02-22 Thread Chris Tomlinson
Hello,

We are trying to serve various ontology files in a variety of serializations, 
principally RDF/XML and Turtle. 

The specs indicate that if the baseURI for an ontology is the URL by which the 
ontology is retrieved then it is not required that the producer include an 
explicit xml:base or @base or similar in the serialization.

However, we are wanting to ensure that there is an explicit baseURI present in 
the resulting serialization. 

This is because not all tools respect the injunction to use the URL that was 
used to retrieve the ontology as the baseURI if there is not an explicit 
xml:base or @base and so on in the serialization. 

The issue is that Model.write(OutputStream, Language, baseURI) includes an 
@base in the serialization when the language is “TURTLE”, but when the language 
is “RDF/XML” we do not see an xml:base in the result. (The same happens when 
OntModel is used and when RDFWriter is used.)

The docs indicate that the baseURI param is to be used to specify what URI 
should be used to serialize relative URIs, and say nothing about including 
xml:base or @base; yet we see the @base for Turtle and no xml:base for RDF/XML.

This is an issue when a tool requests RDF/XML before Turtle in accept headers 
and requires that there be an xml:base in the RDF/XML serialization. Noting 
that RDF/XML is the only required serialization.

What procedure should be used to “force” an @base or xml:base uniformly?

Here is a small test case that shows the issue:


https://github.com/buda-base/lds-pdi/blob/master/src/test/java/io/bdrc/ldspdi/test/ModelWriteTest.java
 



Thank you,
Chris




Re: Jena Full Text Search documentation

2019-02-22 Thread Chris Tomlinson
I finally updated the jena text query documentation with the helpful table 
supplied by Sorin Gheorghiu.

Thank you for the contribution,
Chris


> On Feb 5, 2019, at 6:07 AM, Sorin Gheorghiu  
> wrote:
> 
> Hi,
> 
> it is great that Jena continues to support the Full Text Search extension. 
> Therefore I suggest updating the documentation: if [1] refers to the latest 
> released Jena version (which is 3.10.0), then the supported versions are 
> Lucene 7.4.0 and Elasticsearch 6.4.2 (based on JENA-1621).
> Here is the version compatibility matrix (based on my information) between 
> Jena and the full text engines supported by Jena:
> 
> Note: Elasticsearch 6.6.0 is already released; I assume the next Jena version 
> will support Lucene 7.5.
> 
> [1] https://jena.apache.org/documentation/query/text-query.html 
> 
> 
> Thank you!
> 
> 



Re: Text search from sparql

2019-01-18 Thread Chris Tomlinson
Hi Vincent,

I’ll be happy to review the changes also. Best to have a user that has recent 
experience with the deficits in the current docs make changes that clarify 
their issues.

Thanks,
Chris


> On Jan 18, 2019, at 12:09 PM, vincent ventresque 
>  wrote:
> 
> Hi ajs6f
> 
> Thank you for your proposal, I'd be glad to contribute (as soon as I find a 
> moment, maybe next week).
> 
> By the way, I wrote a small tutorial (in French) for our project, and some 
> PHP code to use with Fuseki + a sample dataset (data from the French national 
> library). I am considering translating the tutorial into English, because I 
> think Fuseki is great for learning RDF and LOD: do you think someone could be 
> interested in reviewing it?
> 
> Vincent
> 
> Le 18/01/2019 à 18:30, ajs6f a écrit :
>> Hi, Vincent--
>> 
>> As mentioned in a recent thread, you needn't be a committer or have any 
>> other official role to improve that page. Just use the link in the upper 
>> right to submit your suggested changes, and I promise to review/commit them! 
>> (Probably with some help from Chris or Osma, who know the text-indexing 
>> machinery much much better than do I.)
>> 
>> ajs6f
>> 
>>> On Jan 18, 2019, at 11:32 AM, Vincent Ventresque 
>>>  wrote:
>>> 
>>> Hi Chris,
>>> 
>>> I'd like to add something about the documentation page: it's a bit laconic, 
>>> and I spent a few hours to work out that values have to be provided in the 
>>> config file for the dataset URI.
>>> 
>>> Furthermore, I had to search on Stackoverflow, and saw other users had the 
>>> same problem with the jena-text config.
>>> 
>>> Maybe s.o. could add some explanations on this page?
>>> 
>>> -- give a value for the dataset URI (replace <#dataset> with something like 
>>> :my_dataset)
>>> 
>>> -- give a path for Lucene index files => this will create a directory
>>> 
>>> -- give a path for TDB (tdb:location 
>>> "/home/.../fuseki/run/databases/My_dataset")
>>> 
>>> -- if you don't set environment variables, run fuseki after changing 
>>> directory ("cd My_Fuseki_install") and give relative paths for command 
>>> arguments ("--config=run/my_config.ttl").
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Le 18/01/2019 à 17:13, Chris Tomlinson a écrit :
>>>> Hi,
>>>> 
>>>> 1) If you’re using a default config, it does not have a working jena-text 
>>>> configuration. The config will need to  include skos:prefLabel in the 
>>>> entity map.
>>>> 
>>>> 2) when you change the jena-text in significant ways, such as changing 
>>>> what analyzer is used for a given property and so on, then you’ll need to 
>>>> rebuild the Lucene index via reloading the dataset or using the 
>>>> textIndexer 
>>>> <https://jena.apache.org/documentation/query/text-query.html#building-a-text-index>.
>>>>  I don’t recall this being mentioned as part of your testing
>>>> 
>>>> 3) Please indicate exactly which item you’re using 
>>>> jena-fuseki-war-3.9.0.war or jena-fuseki-webapp-3.9.0.jar etc, and the 
>>>> config file itself. The error you’ve mentioned previously:
>>>> 
>>>>> Jan 17 17:00:28 semantic-dev java[16800]: [2019-01-17 17:00:28] Config
>>>>>  INFO  Load configuration: 
>>>>> file:///home/text/tools/apache-jena-fuseki-3.9.0/run/configuration/text_index.ttl
>>>>>  
>>>>> 
>>>>> Jan 17 17:00:28 semantic-dev java[16800]: [2019-01-17 17:00:28] 
>>>>> WebAppContext WARN  Failed startup of context 
>>>>> o.e.j.w.WebAppContext@4159e81b{Apache Jena Fuseki 
>>>>> Server,/,file:///home/text/tools/apache-jena-fuseki-3.9.0/webapp/,UNAVAILABLE
>>>>>  }
>>>>> Jan 17 17:00:28 semantic-dev java[16800]: at 
>>>>> org.apache.jena.fuseki.build.FusekiConfig.readAssemblerFile(FusekiConfig.java:148)
>>>> suggests to me that something in the config file is confusing the 
>>>> readAssemblerFile. It doesn’t look like it’s failing in the reading the 
>>>> jena-text portion of the config.
>>>> 
>>>> If http://api.finto.fi/download/mesh/mesh-skos.ttl 
>>>> <http://api.finto.fi/download/mesh/mesh-skos.ttl> the dataset, then can 
>>>> you cut it down to just a small test case with some concepts with “medi” 
>>>> and a few wi

Re: Text search from sparql

2019-01-18 Thread Chris Tomlinson
Hi again,

Indeed, if there is no TextIndex configured then the text:query will return 
null to the surrounding context:


Log.warn(TextQueryPF.class, "Failed to find the text index : tried 
context and as a text-enabled dataset") ;
return null ;

So you could perhaps verify that, when you see all concepts returned, there is 
a log message indicating that the TextIndex is missing for the dataset.

Thanks,
Chris


> On Jan 18, 2019, at 10:13 AM, Chris Tomlinson  
> wrote:
> 
> Hi,
> 
> 1) If you’re using a default config, it does not have a working jena-text 
> configuration. The config will need to include skos:prefLabel in the entity 
> map.
> 
> 2) when you change the jena-text in significant ways, such as changing what 
> analyzer is used for a given property and so on, then you’ll need to rebuild 
> the Lucene index via reloading the dataset or using the textIndexer 
> <https://jena.apache.org/documentation/query/text-query.html#building-a-text-index>.
>  I don’t recall this being mentioned as part of your testing
> 
> 3) Please indicate exactly which item you’re using jena-fuseki-war-3.9.0.war 
> or jena-fuseki-webapp-3.9.0.jar etc, and the config file itself. The error 
> you’ve mentioned previously:
> 
>> Jan 17 17:00:28 semantic-dev java[16800]: [2019-01-17 17:00:28] Config 
>> INFO  Load configuration: 
>> file:///home/text/tools/apache-jena-fuseki-3.9.0/run/configuration/text_index.ttl
>>  
>> 
>> Jan 17 17:00:28 semantic-dev java[16800]: [2019-01-17 17:00:28] 
>> WebAppContext WARN  Failed startup of context 
>> o.e.j.w.WebAppContext@4159e81b{Apache Jena Fuseki 
>> Server,/,file:///home/text/tools/apache-jena-fuseki-3.9.0/webapp/,UNAVAILABLE
>>  }
>> Jan 17 17:00:28 semantic-dev java[16800]: at 
>> org.apache.jena.fuseki.build.FusekiConfig.readAssemblerFile(FusekiConfig.java:148)
> 
> suggests to me that something in the config file is confusing the 
> readAssemblerFile. It doesn’t look like it’s failing in the reading the 
> jena-text portion of the config.
> 
> If http://api.finto.fi/download/mesh/mesh-skos.ttl 
> <http://api.finto.fi/download/mesh/mesh-skos.ttl> is the dataset, then can you 
> cut it down to just a small test case with some concepts with “medi” and a 
> few without? That along with the other information should help move this 
> further along.
> 
> 4) Your query: 
> 
>> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
>> PREFIX text: <http://jena.apache.org/text#>
>> SELECT *
>> WHERE
>> {
>>   GRAPH <http://www.yso.fi/onto/mesh/>
>>   {
>> ?concept text:query (skos:prefLabel "medi") .
>> ?concept skos:prefLabel ?prefLabel .
>> 
>> # FILTER (  REGEX(?prefLabel, "\\bmedi", "i"))
>>   }
>> }
>> limit 10
> 
> might effectively just be executing:
> 
>> ?concept skos:prefLabel ?prefLabel .
> 
> if there is actually no jena-text config - I haven’t checked what happens 
> when there is no TextIndex configured and the text:query is invoked, but may 
> be a noop
> 
> Thanks,
> Chris
> 
> 
>> On Jan 18, 2019, at 8:08 AM, Mikael Pesonen > <mailto:mikael.peso...@lingsoft.fi>> wrote:
>> 
>> 
>> 
>> On 18/01/2019 13:40, Andy Seaborne wrote:
>>> 
>>> 
>>> On 17/01/2019 15:45, Mikael Pesonen wrote:
>>>> 
>>>> 
>>>> On 17/01/2019 17:38, Andy Seaborne wrote:
>>>>> 
>>>>> 
>>>>> On 17/01/2019 12:51, Mikael Pesonen wrote:
>>>>>> 
>>>>>> 
>>>>>> On 17/01/2019 13:58, Andy Seaborne wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> On 16/01/2019 12:50, Mikael Pesonen wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I'm trying to get text search work. Sparql REGEX takes few seconds to 
>>>>>>>> finish so hoping this would be faster. Application is term search 
>>>>>>>> using SKOS ontology.
>>>>>>>> 
>>>>>>>>   First tested if it's enabled by default
>>>>>>>> 
>>>>>>>>   ?concept text:query (skos:prefLabel "medi") .
>>>>>>>>?concept skos:prefLabel ?prefLabel
>>>>>>>> 
>>>>>>>> That returns all concepts so I guess it's not enabled.

Re: Text search from sparql

2019-01-18 Thread Chris Tomlinson
Hi,

1) If you’re using a default config, it does not have a working jena-text 
configuration. The config will need to include skos:prefLabel in the entity 
map (see the sketch at the end of this message).

2) when you change the jena-text in significant ways, such as changing what 
analyzer is used for a given property and so on, then you’ll need to rebuild 
the Lucene index via reloading the dataset or using the textIndexer 
<https://jena.apache.org/documentation/query/text-query.html#building-a-text-index>.
 I don’t recall this being mentioned as part of your testing.

3) Please indicate exactly which item you’re using jena-fuseki-war-3.9.0.war or 
jena-fuseki-webapp-3.9.0.jar etc, and the config file itself. The error you’ve 
mentioned previously:

> Jan 17 17:00:28 semantic-dev java[16800]: [2019-01-17 17:00:28] Config 
> INFO  Load configuration: 
> file:///home/text/tools/apache-jena-fuseki-3.9.0/run/configuration/text_index.ttl
>  
> 
> Jan 17 17:00:28 semantic-dev java[16800]: [2019-01-17 17:00:28] WebAppContext 
> WARN  Failed startup of context o.e.j.w.WebAppContext@4159e81b{Apache Jena 
> Fuseki 
> Server,/,file:///home/text/tools/apache-jena-fuseki-3.9.0/webapp/,UNAVAILABLE 
> }
> Jan 17 17:00:28 semantic-dev java[16800]: at 
> org.apache.jena.fuseki.build.FusekiConfig.readAssemblerFile(FusekiConfig.java:148)

suggests to me that something in the config file is confusing the 
readAssemblerFile. It doesn’t look like it’s failing in the reading the 
jena-text portion of the config.

If http://api.finto.fi/download/mesh/mesh-skos.ttl 
<http://api.finto.fi/download/mesh/mesh-skos.ttl> is the dataset, then can you cut 
it down to just a small test case with some concepts with “medi” and a few 
without? That along with the other information should help move this further 
along.

4) Your query: 

> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
> PREFIX text: <http://jena.apache.org/text#>
> SELECT *
> WHERE
> {
>   GRAPH <http://www.yso.fi/onto/mesh/>
>   {
> ?concept text:query (skos:prefLabel "medi") .
> ?concept skos:prefLabel ?prefLabel .
> 
> # FILTER (  REGEX(?prefLabel, "\\bmedi", "i"))
>   }
> }
> limit 10

might effectively just be executing:

> ?concept skos:prefLabel ?prefLabel .

if there is actually no jena-text config - I haven’t checked what happens when 
there is no TextIndex configured and the text:query is invoked, but it may be a 
no-op.
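
For reference, the relevant entity-map fragment of a jena-text assembler config 
looks something like this (a sketch; the enclosing text:TextDataset, dataset and 
Lucene directory declarations are omitted):

    <#entMap> a text:EntityMap ;
        text:entityField  "uri" ;
        text:defaultField "label" ;
        text:map (
            [ text:field "label" ; text:predicate skos:prefLabel ]
        ) .

With that in place, text:query (skos:prefLabel "medi") searches the "label" 
field of the Lucene index.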

Thanks,
Chris


> On Jan 18, 2019, at 8:08 AM, Mikael Pesonen  
> wrote:
> 
> 
> 
> On 18/01/2019 13:40, Andy Seaborne wrote:
>> 
>> 
>> On 17/01/2019 15:45, Mikael Pesonen wrote:
>>> 
>>> 
>>> On 17/01/2019 17:38, Andy Seaborne wrote:
 
 
 On 17/01/2019 12:51, Mikael Pesonen wrote:
> 
> 
> On 17/01/2019 13:58, Andy Seaborne wrote:
>> 
>> 
>> On 16/01/2019 12:50, Mikael Pesonen wrote:
>>> 
>>> Hi,
>>> 
>>> I'm trying to get text search work. Sparql REGEX takes few seconds to 
>>> finish so hoping this would be faster. Application is term search using 
>>> SKOS ontology.
>>> 
>>>   First tested if it's enabled by default
>>> 
>>>   ?concept text:query (skos:prefLabel "medi") .
>>>?concept skos:prefLabel ?prefLabel
>>> 
>>> That returns all concepts so I guess it's not enabled.
>> 
>> If it returns all concepts, the first line matched (otherwise you get 
>> none). If so, there is a text index and "medi" (case insensitive) 
>> matches, by Lucene rules, everything.
> What does this mean then, why is it matching everything?
 
 If zero matches, you don't get to ?concept skos:prefLabel ?prefLabel (if 
 the text index is correct)
 
 The query above, if the index is setup correctly,  gets all concepts where 
 any skos:prefLabel matches "medi" (not just at the start), then gets all 
 skos:prefLabel for those concepts. That does not mean ?prefLabel only 
 matches "medi"
 
 :c skos:prefLabel "medi" ;
skos:prefLabel "Other" .
 
 will return 2 matches including ?prefLabel="Other"
>>> Yes that is how I understood it. But  ?concept text:query (skos:prefLabel 
>>> "medi")  returns all concepts, also those that don't have any label having 
>>> "medi".
>> 
>> Then I don't understand what is going on.
>> 
>> Do you have a complete, minimal example that someone can use to recreate the 
>> situation?
>> 
> This is the query:
> 
> PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
> PREFIX text: <http://jena.apache.org/text#>
> SELECT *
> WHERE
> {
>   GRAPH <http://www.yso.fi/onto/mesh/>
>   {
> ?concept text:query (skos:prefLabel "medi") .
> ?concept skos:prefLabel ?prefLabel .
> 
> # FILTER (  REGEX(?prefLabel, "\\bmedi", "i"))
>   }
> }
> limit 10
> 
> and the graph is a dump copied from here:  https://finto.fi/mesh/en/
> end of page "Download this vocabulary"
> 
> So to make clear, we have made zero configuration 

Re: Error in Jena code, when switch to another computer

2018-05-26 Thread Chris Tomlinson
Encoding mismatch or a stray XML byte-order mark: 
http://illegalargumentexception.blogspot.com/2010/09/java-content-is-not-allowed-in-prolog.html
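
A quick way to check is to dump the first bytes of the file - a UTF-8 BOM shows 
up as EF BB BF. For example (a throwaway checker, file path as the argument):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BomCheck {
    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            byte[] head = new byte[3];
            int n = in.read(head);
            // EF BB BF at the start means a UTF-8 byte order mark
            for (int i = 0; i < n; i++)
                System.out.printf("%02X ", head[i]);
            System.out.println();
        }
    }
}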

> On May 26, 2018, at 20:15, javed khan  wrote:
> 
> Thanks for your reply
> 
> I use almost the same versions of Java, Netbeans and Protege as my previous
> system.
> 
> The error is:
> 
> ERROR [main] (RDFDefaultErrorHandler.java:44) -
> file:///C:/Users/Utente/Downloads/LAPTOP/getset/(line 1 column 1): Content
> is not allowed in prolog.
> Exception in thread "main" org.xml.sax.SAXParseException; systemId:
> file:///C:/Users/Utente/Downloads/LAPTOP/getset/; lineNumber: 1;
> columnNumber: 1; Content is not allowed in prolog.
> 
> On Sun, May 27, 2018 at 2:05 AM, Bruno P. Kinoshita <
> brunodepau...@yahoo.com.br.invalid> wrote:
> 
>> Very likely something in your environment, especially given it's working
>> on the other laptop.
>> I think it won't be easy for others to spot what's wrong, without looking
>> at the code, and without knowing more about your environment.
>> Might be easier for yourself to compare the two environments and try to
>> isolate what's different, especially Java version, and IDE configuration.
>> Cheers,
>> Bruno
>> 
>>  From: javed khan 
>> To: users@jena.apache.org
>> Sent: Sunday, 27 May 2018 11:12 AM
>> Subject: Error in Jena code, when switch to another computer
>> 
>> Hello
>> 
>> I have created an application using Jena and Java swings. It was working
>> fine. I switched to another laptop and opened the same file on new laptop
>> (I re-installed Netbeans), but it gives me error everywhere asLiteral() and
>> asResource() are used. The  asLiteral() and asResource() are used in
>> multiple places but everywhere it is highlighted as error.
>> 
>> Could you please highlight the issue ? What could be the problem?
>> 
>> Regards
>> 
>> 
>> 
>> 


Re: Fuseki as Tomcat app: Setting FUSEKI_BASE

2018-02-22 Thread Chris Tomlinson
Hi Christian,

Are you sure that the FUSEKI_BASE is defined in the environment when tomcat is 
run? If you’re running tomcat as a service on Linux, for example, then you 
would need to add the export to the service or systemd definition that is used 
to run tomcat.
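
For example, with systemd the environment can be set in the unit file (path 
illustrative):

    [Service]
    Environment=FUSEKI_BASE=/usr/local/fuseki/base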

Chris

> On Feb 22, 2018, at 12:26 PM, Christian Schwaderer <c_schwade...@hotmail.com> 
> wrote:
> 
> Hi Chris,
> 
> 
> thanks for that hint. However, it didn't work for me. Even after rebooting 
> the system, the error is still the same: 
> "org.apache.jena.fuseki.FusekiConfigException: FUSEKI_BASE is not writable: 
> /etc/fuseki"
> 
> However,
> 
> echo $FUSEKI_BASE
> 
> gives me the changed directory.
> 
> 
> Best,
> 
> Christian
> 
> 
> 
> 
> Von: Chris Tomlinson <chris.j.tomlin...@gmail.com>
> Gesendet: Donnerstag, 22. Februar 2018 17:45
> An: users@jena.apache.org
> Betreff: Re: Fuseki as Tomcat app: Setting FUSEKI_BASE
> 
> Hi Christian,
> 
> You will need to ensure that FUSEKI_BASE is defined in the environment the 
> tomcat is run in, like:
> 
>export FUSEKI_BASE=/usr/local/fuseki/base
> 
> Chris
> 
> 
>> On Feb 22, 2018, at 10:57 AM, Christian Schwaderer 
>> <c_schwade...@hotmail.com> wrote:
>> 
>> Dear all,
>> 
>> my question might be stupid and rather basic, but I cannot find an answer 
>> anywhere.
>> 
>> 
>> So, I set up Fuseki 2.3 as a Tomcat 7 Web app. However, I cannot start it, 
>> since
>> 
>> "org.apache.jena.fuseki.FusekiConfigException: FUSEKI_BASE is not writable: 
>> /etc/fuseki"
>> 
>> 
>> I now want to change FUSEKI_BASE to a different directory - where I can 
>> safely change permissions (what I would consider a bad idea for /etc...).
>> 
>> 
>> But I have no idea where and how to do that.
>> 
>> 
>> Thanks in advance and best,
>> 
>> Christian
>> 
> 



Re: Fuseki as Tomcat app: Setting FUSEKI_BASE

2018-02-22 Thread Chris Tomlinson
Hi Christian,

You will need to ensure that FUSEKI_BASE is defined in the environment the 
tomcat is run in, like:

export FUSEKI_BASE=/usr/local/fuseki/base
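
For Tomcat, the usual place for this is $CATALINA_HOME/bin/setenv.sh, which 
catalina.sh sources at startup (create it if absent; path illustrative):

    #!/bin/sh
    export FUSEKI_BASE=/usr/local/fuseki/base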

Chris


> On Feb 22, 2018, at 10:57 AM, Christian Schwaderer  
> wrote:
> 
> Dear all,
> 
> my question might be stupid and rather basic, but I cannot find an answer 
> anywhere.
> 
> 
> So, I set up Fuseki 2.3 as a Tomcat 7 Web app. However, I cannot start it, 
> since
> 
> "org.apache.jena.fuseki.FusekiConfigException: FUSEKI_BASE is not writable: 
> /etc/fuseki"
> 
> 
> I now want to change FUSEKI_BASE to a different directory - where I can safely 
> change permissions (what I would consider a bad idea for /etc...).
> 
> 
> But I have no idea where and how to do that.
> 
> 
> Thanks in advance and best,
> 
> Christian
> 



Re: Fuseki errors with concurrent requests

2018-01-29 Thread Chris Tomlinson
Yes, the loading is single-threaded. It's a simple app that performs bulk loading 
using:

DatasetAccessorFactory.createHTTP(baseUrl + "/data");


and for the first model to transfer:

accessor.putModel(graphName, m);   // accessor: the DatasetAccessor created above

and for following models:

static void addToTransferBulk(final String graphName, final Model m) {
    // accumulate named models in a dataset until the bulk threshold is reached
    if (currentDataset == null)
        currentDataset = DatasetFactory.createGeneral();
    currentDataset.addNamedModel(graphName, m);
    triplesInDataset += m.size();
    if (triplesInDataset > initialLoadBulkSize) {
        try {
            // ship the accumulated graphs in one request, then reset
            loadDatasetMutex(currentDataset);
            currentDataset = null;
            triplesInDataset = 0;
        } catch (TimeoutException e) {
            e.printStackTrace();
            return;
        }
    }
}
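
(For reference, the “single queue” idea mentioned below would look roughly like 
this on the client side - a sketch only, with accessor being the DatasetAccessor 
above and submitModel a hypothetical helper:)

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.jena.rdf.model.Model;

// one writer thread, so the server never sees two concurrent updates from us
static final ExecutorService writeQueue = Executors.newSingleThreadExecutor();

static void submitModel(final String graphName, final Model m) {
    writeQueue.submit(() -> accessor.putModel(graphName, m));
}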


> On Jan 29, 2018, at 9:30 AM, ajs6f <aj...@apache.org> wrote:
> 
> That (using a queue) would depend on what you mean by "a database load going 
> on with our own app". Are you doing those updates singlethreaded?
> 
> ajs6f
> 
>> On Jan 29, 2018, at 10:27 AM, Chris Tomlinson <chris.j.tomlin...@gmail.com> 
>> wrote:
>> 
>> I don’t have any test code per se. I was running a load tool that we wrote 
>> (code snippets included) and issuing a few simple sparql’s via the fuseki 
>> browser app. That’s the extent of the test harness.
>> 
>> I don’t see how the set up that I described would make use of a queue.
>> 
>> Chris
>> 
>> 
>>> On Jan 29, 2018, at 8:56 AM, ajs6f <aj...@apache.org> wrote:
>>> 
>>> That might be worth trying, although since TDB1 is MRSW (multiple reader or 
>>> single writer), that queuing of updates should be going on on the 
>>> server-side.
>>> 
>>> I haven't had time to look at this issue, and it's difficult to say much 
>>> without a reproducible phenomenon. Do you either of y'all have test code we 
>>> can use to demonstrate this?
>>> 
>>> ajs6f
>>> 
>>>> On Jan 29, 2018, at 5:43 AM, Mikael Pesonen <mikael.peso...@lingsoft.fi> 
>>>> wrote:
>>>> 
>>>> 
>>>> Until better solution, quick one would be to put all operations through a 
>>>> single queue?
>>>> 
>>>> Br
>>>> 
>>>> On 25.1.2018 4:11, Chris Tomlinson wrote:
>>>>> Also,
>>>>> 
>>>>> Here's a link to the fuseki config:
>>>>> 
>>>>> https://raw.githubusercontent.com/BuddhistDigitalResourceCenter/buda-base/master/conf/fuseki/bdrc-example.ttl
>>>>> 
>>>>> Chris
>>>>> 
>>>>>> On Jan 24, 2018, at 17:40, Chris Tomlinson <chris.j.tomlin...@gmail.com> 
>>>>>> wrote:
>>>>>> 
>>>>>> On the latest 3.7.0-Snapshot (master branch) I also saw repeated 
>>>>>> occurrences of this the other day while running some queries from the 
>>>>>> fuseki browser app and with a database load going on with our own app 
>>>>>> using:
>>>>>> 
>>>>>>  DatasetAccessorFactory.createHTTP(baseUrl+"/data”);
>>>>>> 
>>>>>> 
>>>>>> with for the first model to transfer:
>>>>>> 
>>>>>>  DatasetAccessor putModel(graphName, m);
>>>>>> 
>>>>>> and for following models:
>>>>>> 
>>>>>>  static void addToTransferBulk(final String graphName, final Model m) {
>>>>>>  if (currentDataset == null)
>>>>>>  currentDataset = DatasetFactory.createGeneral();
>>>>>>  currentDataset.addNamedModel(graphName, m);
>>>>>>  triplesInDataset += m.size();
>>>>>>  if (triplesInDataset > initialLoadBulkSize) {
>>>>>>  try {
>>>>>>  loadDatasetMutex(currentDataset);
>>>>>>  currentDataset = null;
>>>>>>  triplesInDataset = 0;
>>>>>>  } catch (TimeoutException e) {
>>>>>>  e.printStackTrace();
>>>>>>  return;
>>>>>>  }
>>>>>>  }
>>>>>>  }
>>>>>> 
>>>>>> as I say the exceptions appeared while I was running some queries from 
>>>>>> the fuseki browser app:

Re: Fuseki errors with concurrent requests

2018-01-29 Thread Chris Tomlinson
I don’t have any test code per se. I was running a load tool that we wrote 
(code snippets included) and issuing a few simple SPARQL queries via the fuseki 
browser app. That’s the extent of the test harness.

I don’t see how the set up that I described would make use of a queue.

Chris


> On Jan 29, 2018, at 8:56 AM, ajs6f <aj...@apache.org> wrote:
> 
> That might be worth trying, although since TDB1 is MRSW (multiple reader or 
> single writer), that queuing of updates should be going on on the server-side.
> 
> I haven't had time to look at this issue, and it's difficult to say much 
> without a reproducible phenomenon. Do you either of y'all have test code we 
> can use to demonstrate this?
> 
> ajs6f
> 
>> On Jan 29, 2018, at 5:43 AM, Mikael Pesonen <mikael.peso...@lingsoft.fi> 
>> wrote:
>> 
>> 
>> Until better solution, quick one would be to put all operations through a 
>> single queue?
>> 
>> Br
>> 
>> On 25.1.2018 4:11, Chris Tomlinson wrote:
>>> Also,
>>> 
>>> Here's a link to the fuseki config:
>>> 
>>> https://raw.githubusercontent.com/BuddhistDigitalResourceCenter/buda-base/master/conf/fuseki/bdrc-example.ttl
>>> 
>>> Chris
>>> 
>>>> On Jan 24, 2018, at 17:40, Chris Tomlinson <chris.j.tomlin...@gmail.com> 
>>>> wrote:
>>>> 
>>>> On the latest 3.7.0-Snapshot (master branch) I also saw repeated 
>>>> occurrences of this the other day while running some queries from the 
>>>> fuseki browser app and with a database load going on with our own app 
>>>> using:
>>>> 
>>>>DatasetAccessorFactory.createHTTP(baseUrl+"/data”);
>>>> 
>>>> 
>>>> with for the first model to transfer:
>>>> 
>>>>DatasetAccessor putModel(graphName, m);
>>>> 
>>>> and for following models:
>>>> 
>>>>static void addToTransferBulk(final String graphName, final Model m) {
>>>>if (currentDataset == null)
>>>>currentDataset = DatasetFactory.createGeneral();
>>>>currentDataset.addNamedModel(graphName, m);
>>>>triplesInDataset += m.size();
>>>>if (triplesInDataset > initialLoadBulkSize) {
>>>>try {
>>>>loadDatasetMutex(currentDataset);
>>>>currentDataset = null;
>>>>triplesInDataset = 0;
>>>>} catch (TimeoutException e) {
>>>>e.printStackTrace();
>>>>return;
>>>>}
>>>>}
>>>>}
>>>> 
>>>> as I say the exceptions appeared while I was running some queries from 
>>>> from the fuseki browser app:
>>>> 
>>>>> [2018-01-22 16:25:02] Fuseki INFO  [475] 200 OK (17.050 s)
>>>>> [2018-01-22 16:25:03] Fuseki INFO  [477] POST 
>>>>> http://localhost:13180/fuseki/bdrcrw
>>>>> [2018-01-22 16:25:03] BindingTDB ERROR get1(?lit)
>>>>> org.apache.jena.tdb.base.file.FileException: In the middle of an 
>>>>> alloc-write
>>>>>   at 
>>>>> org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:311)
>>>>>   at 
>>>>> org.apache.jena.tdb.base.objectfile.ObjectFileWrapper.read(ObjectFileWrapper.java:57)
>>>>>   at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:78)
>>>>>   at 
>>>>> org.apache.jena.tdb.store.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:186)
>>>>>   at 
>>>>> org.apache.jena.tdb.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:111)
>>>>>   at 
>>>>> org.apache.jena.tdb.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:70)
>>>>>   at 
>>>>> org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:128)
>>>>>   at 
>>>>> org.apache.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:82)
>>>>>   at 
>>>>> org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:50)
>>>>>   at 
>>>>> org.apache.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)
>>>>>   at org.apache.jena.tdb.solver.BindingTDB.get1(BindingTDB.java:122)

Re: Fuseki errors with concurrent requests

2018-01-24 Thread Chris Tomlinson
Also,

Here's a link to the fuseki config:

https://raw.githubusercontent.com/BuddhistDigitalResourceCenter/buda-base/master/conf/fuseki/bdrc-example.ttl

Chris

> On Jan 24, 2018, at 17:40, Chris Tomlinson <chris.j.tomlin...@gmail.com> 
> wrote:
> 
> On the latest 3.7.0-Snapshot (master branch) I also saw repeated occurrences 
> of this the other day while running some queries from the fuseki browser app 
> and with a database load going on with our own app using:
> 
> DatasetAccessorFactory.createHTTP(baseUrl+"/data”);
> 
> 
> with for the first model to transfer:
> 
> DatasetAccessor putModel(graphName, m);
> 
> and for following models:
> 
> static void addToTransferBulk(final String graphName, final Model m) {
> if (currentDataset == null)
> currentDataset = DatasetFactory.createGeneral();
> currentDataset.addNamedModel(graphName, m);
> triplesInDataset += m.size();
> if (triplesInDataset > initialLoadBulkSize) {
> try {
> loadDatasetMutex(currentDataset);
> currentDataset = null;
> triplesInDataset = 0;
> } catch (TimeoutException e) {
> e.printStackTrace();
> return;
> }
> }
> }
> 
> as I say the exceptions appeared while I was running some queries from 
> the fuseki browser app:
> 
>> [2018-01-22 16:25:02] Fuseki INFO  [475] 200 OK (17.050 s)
>> [2018-01-22 16:25:03] Fuseki INFO  [477] POST 
>> http://localhost:13180/fuseki/bdrcrw
>> [2018-01-22 16:25:03] BindingTDB ERROR get1(?lit)
>> org.apache.jena.tdb.base.file.FileException: In the middle of an alloc-write
>>  at 
>> org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:311)
>>  at 
>> org.apache.jena.tdb.base.objectfile.ObjectFileWrapper.read(ObjectFileWrapper.java:57)
>>  at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:78)
>>  at 
>> org.apache.jena.tdb.store.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:186)
>>  at 
>> org.apache.jena.tdb.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:111)
>>  at 
>> org.apache.jena.tdb.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:70)
>>  at 
>> org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:128)
>>  at 
>> org.apache.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:82)
>>  at 
>> org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:50)
>>  at 
>> org.apache.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)
>>  at org.apache.jena.tdb.solver.BindingTDB.get1(BindingTDB.java:122)
>>  at 
>> org.apache.jena.sparql.engine.binding.BindingBase.get(BindingBase.java:121)
>>  at org.apache.jena.sparql.expr.ExprVar.eval(ExprVar.java:60)
>>  at org.apache.jena.sparql.expr.ExprVar.eval(ExprVar.java:53)
>>  at org.apache.jena.sparql.expr.ExprNode.eval(ExprNode.java:93)
>>  at org.apache.jena.sparql.expr.ExprFunction2.eval(ExprFunction2.java:76)
>>  at 
>> org.apache.jena.sparql.expr.E_LogicalOr.evalSpecial(E_LogicalOr.java:58)
>>  at org.apache.jena.sparql.expr.ExprFunction2.eval(ExprFunction2.java:72)
>>  at org.apache.jena.sparql.expr.ExprNode.isSatisfied(ExprNode.java:41)
>>  at 
>> org.apache.jena.sparql.engine.iterator.QueryIterFilterExpr.accept(QueryIterFilterExpr.java:49)
>>  at 
>> org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:69)
>>  at 
>> org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
>>  at 
>> org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:66)
>>  at 
>> org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
>>  at 
>> org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:66)
>>  at 
>> org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
>>  at 
>> org.apache.jena.sparql.engine.iterator.QueryIterConcat.hasNextBinding(QueryIterConcat.java:82)
>>  at 
>> org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
> 

Re: Fuseki errors with concurrent requests

2018-01-24 Thread Chris Tomlinson
On the latest 3.7.0-Snapshot (master branch) I also saw repeated occurrences of 
this the other day while running some queries from the fuseki browser app and 
with a database load going on with our own app using:

DatasetAccessorFactory.createHTTP(baseUrl + "/data");


with, for the first model to transfer:

accessor.putModel(graphName, m);   // accessor: the DatasetAccessor created above

and for following models:

static void addToTransferBulk(final String graphName, final Model m) {
    // accumulate named models in a dataset until the bulk threshold is reached
    if (currentDataset == null)
        currentDataset = DatasetFactory.createGeneral();
    currentDataset.addNamedModel(graphName, m);
    triplesInDataset += m.size();
    if (triplesInDataset > initialLoadBulkSize) {
        try {
            // ship the accumulated graphs in one request, then reset
            loadDatasetMutex(currentDataset);
            currentDataset = null;
            triplesInDataset = 0;
        } catch (TimeoutException e) {
            e.printStackTrace();
            return;
        }
    }
}

as I say the exceptions appeared while I was running some queries from the 
fuseki browser app:

> [2018-01-22 16:25:02] Fuseki INFO  [475] 200 OK (17.050 s)
> [2018-01-22 16:25:03] Fuseki INFO  [477] POST 
> http://localhost:13180/fuseki/bdrcrw
> [2018-01-22 16:25:03] BindingTDB ERROR get1(?lit)
> org.apache.jena.tdb.base.file.FileException: In the middle of an alloc-write
>   at 
> org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:311)
>   at 
> org.apache.jena.tdb.base.objectfile.ObjectFileWrapper.read(ObjectFileWrapper.java:57)
>   at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:78)
>   at 
> org.apache.jena.tdb.store.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:186)
>   at 
> org.apache.jena.tdb.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:111)
>   at 
> org.apache.jena.tdb.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:70)
>   at 
> org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:128)
>   at 
> org.apache.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:82)
>   at 
> org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:50)
>   at 
> org.apache.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)
>   at org.apache.jena.tdb.solver.BindingTDB.get1(BindingTDB.java:122)
>   at 
> org.apache.jena.sparql.engine.binding.BindingBase.get(BindingBase.java:121)
>   at org.apache.jena.sparql.expr.ExprVar.eval(ExprVar.java:60)
>   at org.apache.jena.sparql.expr.ExprVar.eval(ExprVar.java:53)
>   at org.apache.jena.sparql.expr.ExprNode.eval(ExprNode.java:93)
>   at org.apache.jena.sparql.expr.ExprFunction2.eval(ExprFunction2.java:76)
>   at 
> org.apache.jena.sparql.expr.E_LogicalOr.evalSpecial(E_LogicalOr.java:58)
>   at org.apache.jena.sparql.expr.ExprFunction2.eval(ExprFunction2.java:72)
>   at org.apache.jena.sparql.expr.ExprNode.isSatisfied(ExprNode.java:41)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIterFilterExpr.accept(QueryIterFilterExpr.java:49)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:69)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:66)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:66)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIterConcat.hasNextBinding(QueryIterConcat.java:82)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIterRepeatApply.hasNextBinding(QueryIterRepeatApply.java:74)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:58)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIterDistinct.getInputNextUnseen(QueryIterDistinct.java:104)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIterDistinct.hasNextBinding(QueryIterDistinct.java:70)
>   at 
> org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:114)
>   at 
> 

fuseki 3.5.0 upload error

2017-11-06 Thread Chris Tomlinson
Hi,

I am seeing the following error in fuseki 3.5.0 when I use the “upload files” 
tab:

Result: failed with message "SyntaxError: JSON Parse error: Unrecognized 
token '<'"

I’ve tried ttl, jsonld and rdf/xml versions of the same set of triples:

@prefix ex: <http://example.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

ex:SomeOne
  a   ex:Item ;
  skos:prefLabel "abc def ghi"@en ;
  skos:altLabel "jkl mno pqr"@en .

It works fine in fuseki 3.4.0.
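
As a workaround until that's sorted, the same file can be POSTed straight to the 
graph store endpoint, bypassing the UI (dataset name and port assumed):

    curl -X POST --data-binary @data.ttl \
         -H 'Content-Type: text/turtle' \
         'http://localhost:3030/ds/data?default'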

Regards,
Chris



Re: jena-text highlighting support

2017-10-22 Thread Chris Tomlinson
Hello,

With the extraneous part of the original email having been split off, I'm 
interested to know if and how other users have provided for highlighting of 
Lucene search matches when using jena-text. 

If the literals are relatively short then perhaps highlighting where a match 
was found by Lucene is not considered necessary; however, if the literals are 
several hundred code points long then it becomes more useful to help with 
ultimately displaying results to users. It seems like it might be feasible to 
provide a 4th return parameter that could identify the start and end of each 
match in the returned literal.
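
Purely as a sketch of the shape I have in mind - this is not supported syntax 
today - the existing (?s ?score ?lit) list form might grow a fourth variable:

    SELECT ?s ?score ?lit ?match
    WHERE {
      (?s ?score ?lit ?match) text:query (skos:prefLabel "medi") .
    }

where ?match would carry the offsets (or marked-up fragment) of each hit.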

Thanks,
Chris


> On Oct 20, 2017, at 2:00 PM, Chris Tomlinson <chris.j.tomlin...@gmail.com> 
> wrote:
> 
> Hi,
> 
> I’m interested in looking into whether and how it might be possible to 
> incorporate Lucene highlighting into jena-text. I don’t see any other work, 
> but perhaps others have dealt with the topic already. I was thinking of some 
> sort of a 4th return parameter in the PF.


Re: Fuseki TDB database size growth

2017-08-30 Thread Chris Tomlinson
Hi,

We’re going to explore using TDB2 with online compaction. We’ll be looking at 
the behavior under graph deletion and large literals. Our use case is a library 
with associated cultural heritage information (the current instance is 
https://www.tbrc.org). New information is added and corrections are made to 
existing items.

If an update is made to a Work or Person and so on that would be effected in 
Jena via deleting the corresponding named graph and uploading a revised named 
graph for the individual. We intend to keep track of diffs by using git with 
turtle files for each named graph, external to Jena.

So the online compaction would be an essential feature to keep blank nodes from 
growing unbounded.

No, we do not generally expect to rewrite the entire db.
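
Assuming the snapshot API stays roughly as-is, I'd expect the compaction call to 
look something like this (an untested sketch; the location is illustrative):

import org.apache.jena.sparql.core.DatasetGraph;
import org.apache.jena.tdb2.DatabaseMgr;     // package name may differ in the snapshot

DatasetGraph dsg = DatabaseMgr.connectDatasetGraph("/path/to/tdb2-db");
DatabaseMgr.compact(dsg);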

Thanks,
Chris


> On Aug 30, 2017, at 4:27 AM, Rob Vesse <rve...@dotnetrdf.org> wrote:
> 
> No, it is perfectly usable as a primary database
> 
> However, if your use case regularly rewrites your entire database then you 
> are going to have problems and this would be true of any database system, 
> although obviously implementation specifics will have an impact on this.
> 
> Rob
> 
> On 22/08/2017 03:22, "Chris Tomlinson" <chris.j.tomlin...@gmail.com> wrote:
> 
>Hi,
> 
>This is interesting to know about blank nodes and reference counting. Does 
> the comment regarding deleting triples not recovering blank nodes apply if an 
> entire named graph which includes some blank nodes is deleted?
> 
>If so it seems that in production Jena/TDB is expected to be periodically 
> reloaded from scratch or to not use blank nodes very much. 
> 
>In this case is Jena/TDB more aimed at use cases where it perhaps 
> functions like an index cache rather than a primary database. Is this 
> accurate? If so what sort of primary database systems are typically found 
> coupled with Jena/TDB?
> 
>Regards,
>Chris
> 
>> On Aug 21, 2017, at 05:28, Rob Vesse <rve...@dotnetrdf.org> wrote:
>> 
>> All the data structures used in TDB are broadly speaking append only. This 
>> means that the database will tend to grow in size over time.
>> 
>> Certain ways of using the database can exacerbate this. In your example I 
>> would guess that you have a lot of blank nodes present in the data?
>> 
>> Each unique blank node generates a unique identifier inside the system and 
>> will continually expand the node table. TDB does not implement reference 
>> counting so even if you delete every triple that references a given RDF node 
>> it will never be removed from the node table.
>> 
>> Similarly as the indexes are updated they do not reclaim space so the 
>> B+Tree’s will continue to grow over time.
>> 
>> Reloading from scratch creates a smaller database because it is able to 
>> maximally pack the data into the Data structures on disk and you do not have 
>> any unused identifiers allocated.
>> 
>> Rob
>> 
>> On 21/08/2017 11:20, "Lorenzo Manzoni" <lmanz...@imolinfo.it> wrote:
>> 
>>   Hi,
>> 
>>   I'm writing you because we have a behavior of fuseki TDB  we can not 
>>   understand:
>> 
>>   *the fuseki database filesystem size continues to grow even if the 
>>   number of triples does not increase substantially.*
>> 
>>   We are using the latest version of fuseki (3.4.0) as triple store of a 
>>   semantic media wiki (mw 1.24, smw 2.1.1) and all the night we have a 
>>   scheduled job that updates the wiki pages and executes maintenance 
>>   scripts(e.g. 
>>   
>> https://www.semantic-mediawiki.org/wiki/Help:Maintenance_script_%22rebuildData.php%22)
>>  
>>   . These scripts update the semantic data on the wiki and the triples on 
>>   fuseki. Basically every triple is rewritten.
>> 
>>   We have observed that the fuseki database filesystem size grew over time 
>>   to 20Gb but when we recreate it from scratch the database size is only 
>>   500 Mb.
>> 
>>   After that every day  fuseki database grows about 200Mb and the number 
>>   of triples does not change substantially
>> 
>>   I originally assumed that the rebuild data script was the problem but 
>>   when I executed it alone the fuseki database space did not increase.
>> 
>>   We are running fuseki on a 64-bit Red Hat machine.
>> 
>>   Someone can  help us?
>> 
>>   Thanks in advance,
>> 
>>   Lorenzo
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 



Re: Fuseki TDB database size growth

2017-08-22 Thread Chris Tomlinson
Hi Andy,

In our present production environment we perform daily full backups with 
multiple incrementals during the day and would expect to do similar with a Jena 
based system.

We are accustomed to running the primary db without restarts or space 
growth, except for the addition of new content, for many months at a time. 

The backups are compressed master files of each resource which are replicated 
to various sites for archiving.

We have steady low levels of create activity and somewhat less update activity. 
Loading our test platforms takes on the order of a couple of hours from scratch 
which is similar to what we see with the XML db so that is a concern only if we 
are having to do such reloads owing to space loss as a consequence of “normal” 
usage.

My questions are trying to get a sense of how we should expect to use 
Jena/TDB/Fuseki. I was thinking to replace the current native XML db with Jena 
and we have explored some aspects but not nearly enough to understand the best 
practices with Jena.

After reading the comment from Rob regarding the lack of GC I had thought of a 
compaction tool and was going to inquire about one before I saw your reply. 
Now I want to ask about the status of TDB2. I see that it is at 0.3.0-SNAPSHOT 
aligned with Jena 3.4.0 and am wanting to know about its status as far as 
possible inclusion into Jena.

I was also not clear on the answer to my question regarding whether deleting a 
named graph reclaims any space in the TDB1 node table - I think you’re saying 
it does not. If so that seems to say that with TDB1 the best practice is to 
view Jena/TDB as a create-and-read system. With TDB2, online compaction permits 
CRUD operation so long as the rate of update/delete is not too high.

Are reads locked out during online compaction in TDB2?

Regards,
Chris


> On Aug 22, 2017, at 7:44 AM, Andy Seaborne <a...@apache.org> wrote:
> 
> There are several different things going on causing the DB to grow: Rob has 
> mentioned all of them:
> 
> 1/ No GC of the node table.
> 2/ Partial reuse of space in indexes [*].
> 3/ Bulk-loaded databases are tightly packed and fragment after that when 
> updated.
> 
> [*] Freed blocks in indexes are reused with transactions only.  One HTTP 
> request is one transaction so PUT will reuse the space, delete then add will 
> not.
> 
> Blank nodes, or any other kind of RDF term, in the node table are not garbage 
> collected away.
> 
> In TDB2 there is support for live compaction of a database.  (I got the 
> machinery working last weekend :-)  c.f. VACUUM in PostgreSQL or OPTIMIZE 
> TABLE in MySQL - both reclaim space.  TDB2 is more like a live copy of the 
> current state, not an in-place change at the moment. It is more important to 
> compact in TDB2 than TDB1 because, for robustness and performance reasons, 
> the indexes are copy-on-first-write in a transaction.  [Odd side effect - the 
> state of the database at any point in time is still there in the files, until 
> you compact it.]
> 
> TDB1 (the version in Jena) equivalent is backup-restore.
> 
> But everyone backups anyway don't they? :-)
> 
> For any database, triplestore or SQL or anything, do not put the primary copy 
> of your data in the database unless you have an active support contract, and 
> then backup anyway (and test the backup).
> 
> On 22/08/17 03:22, Chris Tomlinson wrote:
>> Hi,
>> This is interesting to know about blank nodes and reference counting. Does 
>> the comment regarding deleting triples not recovering blank nodes apply if 
>> an entire named graph which includes some blank nodes is deleted?
>> If so it seems that in production Jena/TDB is expected to be periodically 
>> reloaded from scratch or to not use blank nodes very much.
> 
> Not delete them in bulk.
> 
>> In this case is Jena/TDB more aimed at use cases where it perhaps functions 
>> like an index cache rather than a primary database. Is this accurate? If so 
>> what sort of primary database systems are typically found coupled with 
>> Jena/TDB?
> 
> It is not aimed at OLTP-style applications where change is as common as 
> update.
> 
>Andy
> 
>> Regards,
>> Chris
>>> On Aug 21, 2017, at 05:28, Rob Vesse <rve...@dotnetrdf.org> wrote:
>>> 
>>> All the data structures used in TDB are broadly speaking append only. This 
>>> means that the database will tend to grow in size over time.
>>> 
>>> Certain ways of using the database can exacerbate this. In your example I 
>>> would guess that you have a lot of blank nodes present in the data?
>>> 
>>> Each unique blank node generates a unique identifier inside the system and 
>>> will continually expand the node table. TDB does not implement reference 
>>> counting so even if you delete every triple that references a given RDF node 
>>> it will never be removed from the node table.

Re: Fuseki TDB database size growth

2017-08-21 Thread Chris Tomlinson
Hi,

This is interesting to know about blank nodes and reference counting. Does the 
comment regarding deleting triples not recovering blank nodes apply if an 
entire named graph which includes some blank nodes is deleted?

If so it seems that in production Jena/TDB is expected to be periodically 
reloaded from scratch or to not use blank nodes very much. 

In this case is Jena/TDB more aimed at use cases where it perhaps functions 
like an index cache rather than a primary database. Is this accurate? If so 
what sort of primary database systems are typically found coupled with Jena/TDB?

Regards,
Chris

> On Aug 21, 2017, at 05:28, Rob Vesse  wrote:
> 
> All the data structures used in TDB are broadly speaking append only. This 
> means that the database will tend to grow in size over time.
> 
> Certain ways of using the database can exacerbate this. In your example I 
> would guess that you have a lot of blank nodes present in the data?
> 
> Each unique blank node generates a unique identifier inside the system and 
> will continually expand the node table. TDB does not implement reference 
> counting so even if you delete every triple that references a given RDF node 
> it will never be removed from the node table.
> 
> Similarly as the indexes are updated they do not reclaim space so the 
> B+Tree’s will continue to grow over time.
> 
> Reloading from scratch creates a smaller database because it is able to 
> maximally pack the data into the data structures on disk and you do not have 
> any unused identifiers allocated.
> 
> Rob
> 
> On 21/08/2017 11:20, "Lorenzo Manzoni"  wrote:
> 
>Hi,
> 
>I'm writing you because we have a behavior of fuseki TDB  we can not 
>understand:
> 
>*the fuseki database filesystem size continues to grow even if the 
>number of triples does not increase substantially.*
> 
>We are using the latest version of fuseki (3.4.0) as triple store of a 
>semantic media wiki (mw 1.24, smw 2.1.1) and all the night we have a 
>scheduled job that updates the wiki pages and executes maintenance 
>scripts(e.g. 
>
> https://www.semantic-mediawiki.org/wiki/Help:Maintenance_script_%22rebuildData.php%22)
>  
>. These scripts update the semantic data on the wiki and the triples on 
>fuseki. Basically every triple is rewritten.
> 
>We have observed that the fuseki database filesystem size grew over time 
>to 20Gb but when we recreate it from scratch the database size is only 
>500 Mb.
> 
>After that every day  fuseki database grows about 200Mb and the number 
>of triples does not change substantially
> 
>I originally assumed that the rebuild data script was the problem but 
>when I executed it alone the fuseki database space did not increase.
> 
>We are running fuseki on a 64-bit Red Hat machine.
> 
>Someone can  help us?
> 
>Thanks in advance,
> 
>Lorenzo
> 
> 
> 
> 
> 


Re: Performance with very long strings - Re: large literals best practice?

2017-08-21 Thread Chris Tomlinson
Andy,

Thank you for the reply. I suspected that Jena/TDB might be targeted at 
somewhat different use cases. Is there a document somewhere that characterizes 
the sort of assumptions about how Jena/TDB are expected to be used?

We'll explore our use case and let you know what we find.
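
For anyone following along: my reading of the store-parameters page is that the 
node caches can be resized via a tdb.cfg file in the database directory, or 
programmatically, roughly like this (method names from memory, so a sketch):

import org.apache.jena.tdb.setup.StoreParams;

StoreParams params = StoreParams.builder()
    .node2NodeIdCacheSize(200_000)    // RDF term -> NodeId cache
    .nodeId2NodeCacheSize(200_000)    // NodeId -> RDF term cache (holds the big literals)
    .build();

The resulting params are then handed to the dataset builder (or written out as 
tdb.cfg) rather than used directly.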

Thank you again,
Chris


> On Aug 20, 2017, at 11:00, Andy Seaborne <a...@apache.org> wrote:
> 
> I don't have any experience running a thing like this. I was hoping to 
> learn from other people's experiences.
> 
> From a base-technology point of view, this isn't TDB's design centre so there 
> may be hot-spots. The only real way to know if it is acceptable is to try an 
> experiment. It will depend on what you want to do with the store.
> 
> With 230K blobs of 17Kbytes, doing SPARQL-searching of text (regex(), 
> contains()) will be expensive.  So if that is a requirement, a text index is 
> probably necessary whether you store the page content in RDF or not.
> 
> One area will be the TDB node cache, the cache of internal TDB NodeId-> 
> RDFterm (Node). This is count-based, and does not consider the size of the item 
> cached. The cache is going to keep pages cached so it's going to use heap RAM 
> especially as characters are 2 bytes.  There again, it's only 10G or so.
> 
> See the documentation for tuning caches:
> https://jena.apache.org/documentation/tdb/store-parameters.html
> 
>Andy
> 
>> On 19/08/17 15:20, Chris Tomlinson wrote:
>> Hi again,
>> Is anyone aware of any issues that may arise when storing triples in TDB 
>> that have very large string literals (~17KB)?
>> The use case is illustrated below. This seems a reasonable question under 
>> the assumption that literals are presumed to be small - like names, titles, 
>> maybe summaries or abstracts and such, rather than entire pages of text.
>> Thanks,
>> Chris
>>> On Aug 17, 2017, at 12:48 PM, Chris Tomlinson <chris.j.tomlin...@gmail.com> 
>>> wrote:
>>> 
>>> Hello,
>>> 
>>> We have 23K texts averaging 10 pp/text (total pages: 229K) and ~17KB/page, 
>>> for a total of 4GB of text. These texts are currently indexed via Lucene in 
>>> an XMLdb and we’re wanting to know if there are any known issues regarding 
>>> large literals in Jena.
>>> 
>>> In other words we are considering storing the texts like:
>>> 
>>> :Text_08357 a :EText ;
>>> various metadata about the EText
>>> :hasPage
>>>   [ :pageNum 1 ;
>>> :content “. . . 17,000 Bytes . . .” ] ,
>>>   [ :pageNum 2 ;
>>> :content “. . . 17,000 Bytes . . .” ] ,
>>>   . . .
>>> 
>>> We know that Lucene is happy with this data, but we’re not sure whether 
>>> Jena/TDB will be stressed with 229K triples with 17KB literals.
>>> 
>>> The Jena-text offers the possibility of indexing in Lucene via a separate 
>>> process and just using the search in Jena without actually storing the 
>>> literals in TDB. This is a somewhat complex configuration and it would be 
>>> preferred to not use this approach unless the size of the literals will 
>>> present a problem.
>>> 
>>> Thank you,
>>> Chris
>>> 
>>> 


Performance with very long strings - Re: large literals best practice?

2017-08-19 Thread Chris Tomlinson
Hi again,

Is anyone aware of any issues that may arise when storing triples in TDB that 
have very large string literals (~17KB)?

The use case is illustrated below. This seems a reasonable question under the 
assumption that literals are presumed to be small - like names, titles, maybe 
summaries or abstracts and such, rather than entire pages of text.

Thanks,
Chris


> On Aug 17, 2017, at 12:48 PM, Chris Tomlinson <chris.j.tomlin...@gmail.com> 
> wrote:
> 
> Hello,
> 
> We have 23K texts averaging 10 pp/text (total pages: 229K) and ~17KB/page, 
> for a total of 4GB of text. These texts are currently indexed via Lucene in 
> an XMLdb and we’re wanting to know if there are any known issues regarding 
> large literals in Jena.
> 
> In other words we are considering storing the texts like:
> 
> :Text_08357 a :EText ;
> various metadata about the EText
> :hasPage 
>   [ :pageNum 1 ;
> :content “. . . 17,000 Bytes . . .” ] ,
>   [ :pageNum 2 ;
> :content “. . . 17,000 Bytes . . .” ] ,
>   . . .
> 
> We know that Lucene is happy with this data, but we’re not sure whether 
> Jena/TDB will be stressed with 229K triples with 17KB literals.
> 
> The Jena-text offers the possibility of indexing in Lucene via a separate 
> process and just using the search in Jena without actually storing the 
> literals in TDB. This is a somewhat complex configuration and it would be 
> preferred to not use this approach unless the size of the literals will 
> present a problem.
> 
> Thank you,
> Chris
> 
> 



large literals best practice?

2017-08-17 Thread Chris Tomlinson
Hello,

We have 23K texts averaging 10 pp/text (total pages: 229K) and ~17KB/page, for 
a total of 4GB of text. These texts are currently indexed via Lucene in an 
XMLdb and we’re wanting to know if there are any known issues regarding large 
literals in Jena.

In other words we are considering storing the texts like:

:Text_08357 a :EText ;
various metadata about the EText
:hasPage 
  [ :pageNum 1 ;
:content “. . . 17,000 Bytes . . .” ] ,
  [ :pageNum 2 ;
:content “. . . 17,000 Bytes . . .” ] ,
  . . .

We know that Lucene is happy with this data, but we’re not sure whether 
Jena/TDB will be stressed with 229K triples with 17KB literals.

The Jena-text offers the possibility of indexing in Lucene via a separate 
process and just using the search in Jena without actually storing the literals 
in TDB. This is a somewhat complex configuration and it would be preferred to 
not use this approach unless the size of the literals will present a problem.

Thank you,
Chris




Re: statement ids, rdf* and reification

2017-08-08 Thread Chris Tomlinson
Hi,

Thanks so much for such a thorough response. For now we’re looking closely at 
using reification of complete statements. The event-based approach seems 
limited to situations in which annotations are attached to single statement 
complexes within a single graph and we would like to support annotations that 
apply to sets of statements that may be in several named graphs. Also the 
event-based approach seemed also a bit confining when applied to simple cases 
such as:

subject property literal

especially as we wanted to retain information on ranges in our current 
ontology. Given a skeletal example:

bdr:W12827 a :Work ;
:workLccn "75903140" .

It seems workable to use reification in a stylized manner:

stmt:W12827_S0001 a rdf:Statement ;
    rdf:subject bdr:W12827 ;
    rdf:predicate :hasLCCN ;
    rdf:object "75903140" ;
    :retrieved [ 
        :retrievedFrom <http://lccn.loc.gov/75903140> ;
        :retrievedOn "1998-12-27"^^xsd:date ] . 

It is clear that the above is not attaching the annotation to the original 
statement but rather associating the annotation with a description that is 
associated with the original statement by a convention outside the system as it 
were. This is not appealing but it seems workable.
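
Retrieving such an annotation is then a straightforward query (a sketch, using 
the prefixes from the example above):

    SELECT ?from ?on
    WHERE {
      ?st a rdf:Statement ;
          rdf:subject   bdr:W12827 ;
          rdf:predicate :hasLCCN ;
          :retrieved    [ :retrievedFrom ?from ; :retrievedOn ?on ] .
    }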

How would an event-based approach (I’ve been thinking of it as a blank-node 
approach or “everything n-ary”) work across an arbitrary set of statements 
possibly across two or more named graphs?

After we have some experience with the reification approach perhaps we can make 
an informed proposal for a reification oriented extension to Jena.

We’re really interested in what approaches are currently taken for supporting 
annotations other than RDR, single property, and event-based. How important are 
annotations in current mainstream usage of Jena?

Thank you again for your reply,
Chris


> On Aug 8, 2017, at 5:37 AM, Andy Seaborne <a...@apache.org> wrote:
> 
> 
> 
> On 07/08/17 19:35, Chris Tomlinson wrote:
>> Hello,
>> We're investigating various approaches to adding annotations about
>> individual statements (or perhaps rarely a subset of statements) of a
>> named graph.
>> There’s note from 2015, Re: Performance Cost of Reification
>> <http://apache.markmail.org/message/js6s6ry5st73soay>, that mentions a
>> syntax like:
>> <>,
>> that was proposed for use in Sparql 1.0 and that at the time of the note was 
>> still in the ARQ
>> parser source.
> 
> The <<>> syntax, as in ARQ and discussed in SPARQL 1.0 is shorthand for 
> writing out reification, not an extension to the data model nor semantics.
> 
> << s p o >> is syntax for
> 
> ? rdf:subject s
> ? rdf:predicate p
> ? rdf:object o
> 
> i.e. not a triple id.
> 
> Data and/or query can be written long hand.
> 
>> The syntax is similar to that of the Blazegraph RDF*/Sparql* 
>> <https://wiki.blazegraph.com/wiki/index.php/Reification_Done_Right> and 
>> we’re interested to know if these are related ideas and > if there is any 
>> anticipation that such an approach might ever find its way into appropriate 
>> standards.
> 
> RDF* seems to be based around the assumption a statement is reified only once 
> and that the base fact is asserted in the graph. That means triple ids make 
> sense.
> 
> Note the example:
> 
> BIND( << ?bob foaf:age ?age >> AS ?t ) .
> 
> which matches the graph for ?bob foaf:age ?age and matches once to make sense 
> (it's a BIND).
> 
> Reification can be multiple times (in different files, with different 
> annotations, to be merged), and you can reify a statement without needing it 
> in the data (it's necessarily not asserted).
> 
> This is why RDF* is compatible with reification but reification is not 
> compatible with RDF* : RDF* is a subset of the reification possibilities - 
> maybe its a useful subset - different discussion.
> 
> Storage systems look like they are much easier for RDF* - it looks to be an 
> extra column on the triple/quads table.
> 
> Reification has nasty cases like partial reification (e.g. just
> "? rdf:subject s . ? rdf:property p" triples).
> 
> But it is at the modelling level, not a data model extension. Reification is 
> the ability to talk about making a claim, not the statement itself. It's not 
> adding triples to the domain of discourse; it is not working on the data 
> model level.
> 
> Other approaches extend the data model such as N3 formulae ("graphs as nodes 
> of the graph").
> 
> Named Graph are weaker - can't have a graphs in a graph - but were an aproach 
> in most common use at the time of SPARQL 1.0.
> 
>> It seems that a Jena property function extension could do some of the work 
>> of statement ids but 

statement ids, rdf* and reification

2017-08-07 Thread Chris Tomlinson
Hello,

We're investigating various approaches to adding annotations about individual 
statements (or perhaps rarely a subset of statements) of a named graph.

There’s a note from 2015, Re: Performance Cost of Reification 
<http://apache.markmail.org/message/js6s6ry5st73soay>, that mentions a syntax 
like: 

<< s p o >>, 

that was proposed for use in Sparql 1.0 and that at the time of the note was 
still in the ARQ parser source.

The syntax is similar to that of the Blazegraph RDF*/Sparql* 
<https://wiki.blazegraph.com/wiki/index.php/Reification_Done_Right> and we’re 
interested to know if these are related ideas and if there is any anticipation 
that such an approach might ever find its way into appropriate standards.

It seems that a Jena property function extension could do some of the work of 
statement ids but it would be desirable to have serialization support as well.

The 2015 note indicates that reification "is a minor feature of RDF" and yet 
wanting to track updates, make claims and counter-claims about particular 
statements, and so on is not for us a minor use-case.

The 2015 note illustrates using event modeling to provide a natural way of 
capturing some annotations but it does not seem to be uniformly applicable. We 
have many n-ary situations in our current ontology that work well to provide 
essentially blank nodes where annotation statements can be added to further 
describe provenance or other annotations.

However, there are plenty of situations of the form:

subject property literal

which provide no natural place to add an annotation explaining why that 
assertion has been made or indicating that the assertion is considered in error 
and so on.

Further similar cases arise of the form:

subject property object-uri

that are similarly not amenable to providing natural places to add annotation 
statements.

The idea of RDF*/Sparql* seems appealing as a uniform approach to mentioning a 
statement when there is need to decorate the statement with some annotations.

On the other hand, we have entertained the idea that every basic property could 
be modeled as a potentially n-ary case which most of the time would just have a 
single statement (ignoring an implied rdf:type statement). For (a contrived) 
example,

ex:W123 a :Work ;
:hasLCCN [ :value 741297845 ] .

rather than

ex:W123 a :Work ;
:hasLCCN 741297845 .

The former has a blank node that would readily permit adding an annotation:

ex:W123 a :Work ;
:hasLCCN [ :value 741297845 ;
:retrievedFrom <http://libraryofcongress.gov> ;
:retrievedOn "12/27/1997" ] .

Anyway, the question is really about the status of the RDF* idea and any support 
latent or pending in Jena.

Thanks,
Chris







Re: Fuseki prefixes

2017-04-02 Thread Chris Tomlinson
Andy,

In at least one configuration qonsole-config.js is loaded.

We’re using fuseki.war from apache-jena-fuseki-2.5.0.tar.gz in Tomcat 8 and we 
update the qonsole-config.js in tomcat/webapps/fuseki/js/app/ with the various 
prefixes used in our application. In this configuration it is loaded in the 
query tab.
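
For anyone hunting for the spot: the prefixes live in the config object in that 
file, roughly like this (structure from memory, values illustrative):

// in js/app/qonsole-config.js (fragment)
prefixes: {
    "rdf":  "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#"
}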

Chris


> On Apr 2, 2017, at 5:20 AM, Andy Seaborne  wrote:
> 
> 
> Is qonsole-config.js even loaded? It comes from qonsole which the UI does not 
> currently use [*].