Metrics for query execution times

2023-08-24 Thread Brandon Sara
From what I've been able to find, there don't seem to be any metrics that show
how long queries take to complete. I can get some general metrics about request
rates, plus standard JVM metrics, from a Fuseki server, but nothing else seems
to be provided.

Am I just missing how to enable such metrics or do they indeed not exist? If 
they don't exist, can they be added? I feel this would be a super helpful 
feature to have.
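The metrics described above appear to stop at request rates and JVM statistics, so one workaround is to measure query durations on the client side. The sketch below is hypothetical and is not a Fuseki or Jena API: it wraps whatever call sends the query, records elapsed `System.nanoTime()` durations, and reports nearest-rank percentiles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical client-side query timer; not a Fuseki or Jena API. */
final class QueryTimer {
    private final List<Long> samples = Collections.synchronizedList(new ArrayList<>());

    /** Runs one query call (e.g. an HTTP request to Fuseki) and records how long it took. */
    <T> T time(Supplier<T> query) {
        long start = System.nanoTime();
        try {
            return query.get();
        } finally {
            record(System.nanoTime() - start);     // recorded even if the call throws
        }
    }

    void record(long elapsedNanos) {
        samples.add(elapsedNanos);
    }

    int count() {
        return samples.size();
    }

    /** Nearest-rank percentile in milliseconds, for p in (0, 100]. */
    double percentileMillis(double p) {
        List<Long> sorted;
        synchronized (samples) {                   // synchronizedList requires manual locking to copy
            sorted = new ArrayList<>(samples);
        }
        if (sorted.isEmpty()) {
            throw new IllegalStateException("no samples recorded");
        }
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        int index = Math.min(Math.max(rank, 1), sorted.size()) - 1;
        return sorted.get(index) / (double) TimeUnit.MILLISECONDS.toNanos(1);
    }
}
```

Keeping raw samples is the simplest correct choice for a sketch; a long-running client would more likely feed a fixed-size histogram or a metrics library instead.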

No PHI in Email: PointClickCare and Collective Medical, A PointClickCare 
Company, policies prohibit sending protected health information (PHI) by email, 
which may violate regulatory requirements. If sending PHI is necessary, please 
contact the sender for secure delivery instructions.

Confidentiality Notice: This email message, including any attachments, is for 
the sole use of the intended recipient(s) and may contain confidential and 
privileged information. Any unauthorized review, use, disclosure or 
distribution is prohibited. If you are not the intended recipient, please 
contact the sender by reply email and destroy all copies of the original 
message.


Re: CVE-2023-32200

2023-07-20 Thread Brandon Sara
Awesome! Thanks for the quick response

> On Jul 20, 2023, at 11:13 AM, Andy Seaborne  wrote:
>
> "EXTERNAL EMAIL" – Always use caution when reviewing mail from outside of the 
> organization.
>
>
>
> On 20/07/2023 17:18, Brandon Sara wrote:
>> I just came across CVE-2023-32200 and was wondering, is it different than 
>> CVE-2023-22665 and, if so, how is it different?
>
>
> Jena 4.8.0 addresses CVE-2023-22665 by requiring the Java system property 
> "jena:scripting" to enable scripting.
>
> Jena 4.9.0 addresses CVE-2023-32200, which applies if scripting is enabled
> (4.8.0). The change goes further than only addressing the security issue by
> requiring script functions to be in an "allowed" list; that is, there is an
> API contract for callable scripts. Other functions in the script file are not
> callable, which should help development.
>
> Running Java17 means there is no scripting engine unless the deployment
> has added one. Java11 has a scripting engine in the JDK.
>
>Andy
>
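The reasoning in this reply can be encoded as a small check. The helper below is hypothetical (not part of Jena): no JavaScript engine means scripts cannot run; before 4.8.0 scripting needs no opt-in; from 4.8.0 the `jena:scripting` system property must be set; from 4.9.0 only allow-listed functions are callable. It assumes plain numeric versions such as `4.7.0`, assumes the opt-in counts as set when the property value is `true`, and treats pre-4.8.0 releases as exposed to both CVEs because scripting is always reachable there; that last point is an inference from this thread, not an official statement. Engine detection uses the standard `javax.script.ScriptEngineManager` lookup under the names `js` and `javascript`; whether Jena asks for the engine under the same name is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.script.ScriptEngineManager;

/** Hypothetical risk helper (not part of Jena) encoding the reasoning in this thread. */
final class ScriptingExposure {

    /** Compares plain dotted versions such as "4.7.0" and "4.10.0" numerically. */
    static int compareVersions(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    private static int[] parse(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    /** True if a JavaScript engine is registered with javax.script in this JVM. */
    static boolean javaScriptEngineAvailable() {
        ScriptEngineManager manager = new ScriptEngineManager();
        return manager.getEngineByName("js") != null
                || manager.getEngineByName("javascript") != null;
    }

    /** Assumption: the opt-in counts as set when -Djena:scripting=true is passed. */
    static boolean scriptingPropertySet() {
        return Boolean.getBoolean("jena:scripting");
    }

    /** Lists which of the two CVEs discussed here look reachable for a deployment. */
    static List<String> assess(String jenaVersion, boolean scriptingPropertySet,
                               boolean jsEngineAvailable) {
        List<String> cves = new ArrayList<>();
        if (!jsEngineAvailable) {
            return cves;                                   // no engine: scripts cannot run
        }
        boolean before480 = compareVersions(jenaVersion, "4.8.0") < 0;
        boolean before490 = compareVersions(jenaVersion, "4.9.0") < 0;
        if (before480) {
            cves.add("CVE-2023-22665");                    // scripting reachable without opt-in
        }
        boolean scriptingOn = before480 || scriptingPropertySet;
        if (scriptingOn && before490) {
            cves.add("CVE-2023-32200");                    // no allow-list of callable functions yet
        }
        return cves;
    }

    public static void main(String[] args) {
        String jenaVersion = args.length > 0 ? args[0] : "4.7.0"; // hypothetical default
        boolean engine = javaScriptEngineAvailable();
        System.out.println("Java " + Runtime.version().feature()
                + ", JavaScript engine: " + engine
                + ", exposure: " + assess(jenaVersion, scriptingPropertySet(), engine));
    }
}
```

Running `main` with the deployed Jena version on the same runtime that serves Fuseki gives a quick answer for that particular JVM; on Java 15 and later the engine check should report `false` unless an engine has been added to the deployment.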



CVE-2023-32200

2023-07-20 Thread Brandon Sara


I just came across CVE-2023-32200 and was wondering, is it different than 
CVE-2023-22665 and, if so, how is it different?



Does Fuseki Support Manual Creation of Transactions?

2023-06-30 Thread Brandon Sara
Does Fuseki support the creation of transactions that can span multiple 
requests?

If so, would the following sequence of events hold?
1. I start a transaction
2. I submit a query to get some data for an update I want to perform
3. An update is submitted by a separate client
4. The update from the separate client is put on hold
5. I submit an update
6. I submit another query to verify changes
7. I commit & end the transaction
8. The update submitted in step 3 is allowed to proceed
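The steps above describe a single writer holding an exclusive lock across several operations while a second writer waits. The thread does not say whether Fuseki supports this across HTTP requests, so the following is only a hypothetical in-process model of that behaviour, using a plain `ReentrantLock` in place of a write transaction; it is not Fuseki's or TDB2's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of the eight steps above; not Fuseki's API. */
final class SingleWriterModel {
    private final ReentrantLock writeLock = new ReentrantLock();
    private final List<String> log = Collections.synchronizedList(new ArrayList<>());

    List<String> run() throws Exception {
        ExecutorService otherClient = Executors.newSingleThreadExecutor();
        Future<?> otherUpdate;
        writeLock.lock();                                   // 1. start the transaction
        try {
            log.add("A: begin");
            log.add("A: query");                            // 2. read data needed for the update
            otherUpdate = otherClient.submit(() -> {        // 3. another client submits an update
                writeLock.lock();                           // 4. ...which waits for the lock
                try {
                    log.add("B: update");
                } finally {
                    writeLock.unlock();
                }
            });
            while (!writeLock.hasQueuedThreads()) {         // make sure B is really waiting
                Thread.onSpinWait();
            }
            log.add("A: update");                           // 5. our update
            log.add("A: verify");                           // 6. query to check the changes
            log.add("A: commit");                           // 7. commit and end
        } finally {
            writeLock.unlock();                             // 8. B may now proceed
        }
        otherUpdate.get();
        otherClient.shutdown();
        return new ArrayList<>(log);
    }
}
```

The recorded order is `A: begin, A: query, A: update, A: verify, A: commit, B: update`: the other client's update only runs once the first client releases the lock.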



Re: CVE-2023-22665 Risk using Fuseki Pre 4.8.0

2023-06-03 Thread Brandon Sara
Sorry, by "this code", I mean the arbitrary JavaScript.

I completely agree with your advice, I'm just trying to determine the exact 
risk level of the vulnerability for my project by better understanding what an 
attacker could be allowed to do if left unchecked. This helps us determine our 
priorities for work that needs to be completed and will help me to justify any 
necessary changes to non-technical personnel.

Thanks for your patience with my questions.

On Jun 3, 2023 6:31 AM, Andy Seaborne  wrote:

On 02/06/2023 17:26, Brandon Sara wrote:
> And just to be clear, this code would execute on the Fuseki server, correct?

I'm not sure what "this code" refers to.

A way to be safe is to run Fuseki with a Java17 runtime.

What is appropriate in your environment is something you have to decide.
The software is provided "without warranties or conditions of any kind".

Keeping up-to-date with software releases is good practice and that
applies to Java itself.

Unless you are running the WAR file, the choice of Java version to run
Fuseki is controlled in the server script.

Moving to Java17 as a requirement for Jena generally is something on the
project's radar.

 Andy






Re: CVE-2023-22665 Risk using Fuseki Pre 4.8.0

2023-06-02 Thread Brandon Sara
And just to be clear, this code would execute on the Fuseki server, correct?

On Jun 2, 2023, at 3:20 AM, Andy Seaborne  wrote:

The advice from the project is to upgrade or at least run in a Java17+
environment, otherwise anything may be possible.

Andy




Re: CVE-2023-22665 Risk using Fuseki Pre 4.8.0

2023-06-01 Thread Brandon Sara
Ok. When you say “arbitrary function”, could one craft and run code that makes 
HTTP calls (via XMLHttpRequest or the fetch API, for example)? We don’t have 
sensitive data in our store, but I want to make sure that no one could make 
queries to other servers via queries to Fuseki.

On Jun 1, 2023, at 7:16 AM, Andy Seaborne  wrote:

On 01/06/2023 09:42, Rob @ DNR wrote:
> Yes, prior to 4.8.0 users can craft a query that calls arbitrary JavaScript 
> functions even if you have not explicitly configured custom scripts.
>
> As discussed on our Security Advisories page [1], the project's advice is
> always to use the latest version available.
>
> Or, as already noted in this thread, run using Java 17, as that does not have a
> script engine embedded by default. Java code is generally forward compatible,
> so even though the project's release builds target Java 11, it's fine to run
> them on a newer JVM.

A Jena release is compiled with Java17 at the moment, producing Java11
bytecode. This is done to work around Javadoc issues; some improvements
haven't been backported to the Java11 codeline.
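Whether a jar contains Java11 bytecode can be checked directly: every `.class` file records its target in the `major_version` field of its header (55 for Java 11, 61 for Java 17). A minimal sketch using only the JDK; the jar path and class entry passed to `main` are whatever you want to inspect, and nothing here is Jena-specific.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Reads a class-file header to see which Java release its bytecode targets. */
final class ClassFileVersion {
    private static final int MAGIC = 0xCAFEBABE;

    /** Returns the Java feature release (e.g. 11 or 17) that a class file targets. */
    static int targetJavaRelease(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != MAGIC) {
            throw new IOException("not a class file");
        }
        in.readUnsignedShort();                    // minor_version, not needed here
        int major = in.readUnsignedShort();        // major_version: 55 = Java 11, 61 = Java 17
        return major - 44;                         // mapping holds from Java 5 (major 49) onwards
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a jar; args[1]: a class entry in it, e.g. "some/pkg/SomeClass.class"
        try (JarFile jar = new JarFile(args[0])) {
            JarEntry entry = jar.getJarEntry(args[1]);
            if (entry == null) {
                throw new IOException("no such entry: " + args[1]);
            }
            try (InputStream in = jar.getInputStream(entry)) {
                System.out.println(args[1] + " targets Java " + targetJavaRelease(in));
            }
        }
    }
}
```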

We have Jenkins jobs for Java11, Java17 and Java-latest.

There are also GitHub Actions in the project codebase.

The project policy has always been "2 versions of Java", which we have
interpreted nowadays as two LTS releases. Java21 is due in September this year
and, barring a change of plan by OpenJDK, will be LTS.

Andy

>
> Is there any particular reason you haven’t yet upgraded to 4.8.0?
>
> Rob
>
> [1]: https://jena.apache.org/about_jena/security-advisories.html#standard-mitigation-advice



Re: CVE-2023-22665 Risk using Fuseki Pre 4.8.0

2023-05-31 Thread Brandon Sara
I’m running with a version built and run with Java 11. Given this, is there
still a risk/concern if I don’t have custom scripts configured at all on the
Fuseki server?

On May 31, 2023, at 12:06 PM, Andy Seaborne  wrote:

On 31/05/2023 17:17, Brandon Sara wrote:
>
> With CVE-2023-22665, what is the risk of using Fuseki pre-4.8.0 that does not 
> have custom scripts configured in any configurations? Is there only a risk if 
> custom scripts are set up to be used by Fuseki or is there a risk regardless 
> of configuration?
>
> Thanks.

Java17 does not have a JavaScript engine unless the deployment adds one.

So running on Java17 means that scripts can't execute.

The issue is Java11, where there is a script engine in the JVM runtime.

Andy

https://openjdk.org/jeps/372
Nashorn was removed in Java15.




CVE-2023-22665 Risk using Fuseki Pre 4.8.0

2023-05-31 Thread Brandon Sara


With CVE-2023-22665, what is the risk of using Fuseki pre-4.8.0 that does not 
have custom scripts configured in any configurations? Is there only a risk if 
custom scripts are set up to be used by Fuseki or is there a risk regardless of 
configuration?

Thanks.



Re: [ANN] Apache Jena 4.3.1

2021-12-14 Thread Brandon Sara
Given that Log4j has been updated to 2.16.0 in response to another CVE
(https://cve.mitre.org/cgi-bin/cvename.cgi?name=2021-45046), should we expect
another release, such as 4.3.2?



Re: Information about Apache Jena and Log4j2 vulnerability.

2021-12-10 Thread Brandon Sara
Andy, will you be releasing an RDF-Delta update that uses 4.3.1 soon as well?



Fuseki HTTPS options don't seem to be available

2021-12-09 Thread Brandon Sara
I was looking at the docs for Fuseki
(https://jena.apache.org/documentation/fuseki2/fuseki-data-access-control.html#https)
and tried running `fuseki-server --https=… --httpsPort=…`, but all I get in
return is "Unknown argument: https". Are the docs wrong, or is there a bug?



Re: How to provide full text search over a union of one read-only and one mutable graph

2021-10-04 Thread Brandon Sara
> A text index is "per dataset".

This is what I figured, but wanted to be sure.

Thanks!


How to provide full text search over a union of one read-only and one mutable graph

2021-10-01 Thread Brandon Sara
Here is my scenario:
- I have a large read-only set of data and a very small mutable set of data.
- The read-only set of data can be pre-populated into a TDB2 database, while the
mutable set of data will start out empty but will, obviously, have data inserted
over time at runtime.
- I want one Fuseki service that combines both of these sets of data into a
single dataset so that inference can be performed on the mutable data based on
the read-only data (i.e. union them as named graphs in a single dataset).
- I know that I can accomplish the union of the two sets of data as separate 
named graphs in the final dataset, but I also need full text search on both the 
read-only and the mutable portions of the final dataset.

How do I index both the read-only and the mutable data when they are combined
into a single dataset? Do I create two indexes? Can indexes be shared? I know
that I could create an initial index from the read-only data, which could then
be updated when changes are made to the mutable data. However, once the
read-only data needs updates (which it will from time to time), I need to
regenerate the entire index in order to pick up changes that weren’t made
through a Fuseki update. The read-only data would not be updated through
Fuseki; most likely a new TDB2 database would be generated.
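As a hypothetical illustration of the "two indexes" option raised in the question (an in-memory toy, not the jena-text API or its Lucene index): one index covers the read-only data and is replaced wholesale when a new read-only build arrives, the other is updated incrementally with the mutable data, and a search consults both and merges the hits. It is single-threaded and matches whole lower-cased words only.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical union search over two separate indexes; not jena-text. */
final class UnionTextSearch {

    /** Minimal inverted index: lower-cased word -> subjects whose literal contains it. */
    static final class TextIndex {
        private final Map<String, Set<String>> postings = new HashMap<>();

        void add(String subject, String literal) {
            for (String word : literal.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!word.isEmpty()) {
                    postings.computeIfAbsent(word, w -> new TreeSet<>()).add(subject);
                }
            }
        }

        Set<String> search(String word) {
            return postings.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of());
        }
    }

    private TextIndex readOnly;                        // rebuilt offline with each new read-only build
    private final TextIndex mutable = new TextIndex(); // updated alongside each write

    UnionTextSearch(TextIndex readOnly) {
        this.readOnly = readOnly;
    }

    void addMutable(String subject, String literal) {
        mutable.add(subject, literal);
    }

    /** Swap in an index built from the refreshed read-only data; mutable entries are kept. */
    void replaceReadOnly(TextIndex rebuilt) {
        this.readOnly = rebuilt;
    }

    Set<String> search(String word) {
        Set<String> hits = new TreeSet<>(readOnly.search(word));
        hits.addAll(mutable.search(word));
        return hits;
    }
}
```

The point of the split is that refreshing the read-only data only requires rebuilding its own index; the small mutable index never has to be regenerated.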

Thanks for the help!


Re: Subclass caching has some problems on Fuseki startup

2021-09-29 Thread Brandon Sara
> SNOMED has a conversion to OWL - isn't that OWL functional syntax? Or do you 
> have another tool that converts RF2 to RDF?

I used the SNOMED tool to convert to OWL functional syntax, then used ROBOT to
convert that to Turtle.

> what OWL features are you going to use?  SNOMED uses more than subclass.

equivalentClass is definitely one we want to use. sameAs is also a possibility
(though performance may rule that one out), as are many of the property
features such as inverseOf. At this point in time, other than what is in SNOMED
(which is honestly pretty complex/impressive), we aren’t explicitly using much
of OWL, mainly because we haven’t been able to: things just won’t load when we
have full text indexing in place with even just the Micro profile active. I
would hope that we could at least use EL (or Micro) and the features provided
there.


Request: Load TDBs at Fuseki startup

2021-09-24 Thread Brandon Sara
Currently, it seems that cached inference (at least with the transitive
reasoner) is not loaded into the cache until the first query that reads data
from a dataset is submitted to the Fuseki server. For very large ontologies,
this loading process can take quite a while. This means that the server isn't
truly ready to accept requests until the cache has loaded, because all requests
to the dataset being loaded hang until the load is complete. I'm requesting
that this loading happen at startup and that a "ready" endpoint be added which
returns "true" once the cache loading has completed. This would allow a load
balancer to check for readiness before directing traffic to the server.
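A sketch of the requested behaviour under stated assumptions: nothing in this thread shows Fuseki shipping such a `/ready` endpoint, so the class below is hypothetical. It runs a warm-up task (for example, a first query that forces the inference cache to load) in the background and serves `/ready` with the JDK's built-in `com.sun.net.httpserver.HttpServer`, answering `true` with status 200 only after the warm-up has finished, and `false` with 503 before that or if the warm-up failed.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical readiness gate; not a Fuseki feature. */
final class ReadinessGate {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    /** Runs the warm-up in the background; readiness flips only if it completes normally. */
    CompletableFuture<Void> warmUp(Runnable loader) {
        return CompletableFuture.runAsync(loader).thenRun(() -> ready.set(true));
    }

    boolean isReady() {
        return ready.get();
    }

    /** 200 when ready, 503 otherwise, so a load balancer can hold traffic back. */
    static int statusCode(boolean isReady) {
        return isReady ? 200 : 503;
    }

    /** Serves GET /ready on the given port (0 picks a free port). */
    HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/ready", exchange -> {
            boolean isReady = ready.get();             // read once so body and status agree
            byte[] body = Boolean.toString(isReady).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(statusCode(isReady), body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Typical use would be `gate.start(port)` followed by `gate.warmUp(...)` with a task that sends one query to the dataset; the port and the warm-up query are placeholders.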



Re: Subclass caching has some problems on Fuseki startup

2021-09-22 Thread Brandon Sara
> Which reasoner? IIRC SnomedCT uses various OWL features
> The default RDFS reasoner does not include the "rdfs4" rule, which is a
> whole-dataset rule.
> A ruleset tuned to needs may work better.

I tried this, using the generic rule reasoner with only subclass and equivalent
class rules, but the dataset still would not load. Again, even using only the
transitive reasoner (which I’ve found tends to be the most performant, though I
haven’t run actual numbers yet), SNOMED wouldn’t load.

> If you want to navigate the ontology AND apply it to data, then you may need
> two copies, one with and one without inference. If subclass closure has been
> applied, you can’t easily see what the immediate parent of a concept is
> (ontology browsing task).

Yes, we plan to create a non-inferred Fuseki service so that we can navigate the
ontology (if we aren’t using the transitive reasoner, since it provides a direct
subclass relationship) and an inferred one for all other queries.
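The point about closure hiding the immediate parent can be made concrete: given only the closure, a direct parent of a class is a superclass that is not also a superclass of one of its other superclasses. The hypothetical helper below computes that, assuming an acyclic hierarchy (no `owl:equivalentClass` cycles) and a closure map that lists strict superclasses only, with reflexive `C rdfs:subClassOf C` entries removed.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: recover immediate parents from a precomputed subclass closure. */
final class DirectParents {

    /**
     * closure maps each class to ALL of its strict superclasses (no reflexive entries).
     * Assumes the hierarchy is acyclic.
     */
    static Set<String> directParents(String cls, Map<String, Set<String>> closure) {
        Set<String> supers = closure.getOrDefault(cls, Set.of());
        Set<String> direct = new TreeSet<>(supers);
        for (String s : supers) {
            // anything above another superclass is an indirect ancestor, not a direct parent
            direct.removeAll(closure.getOrDefault(s, Set.of()));
        }
        return direct;
    }
}
```

This has to be done per concept on every browse, which is one reason to keep a separate non-inferred copy for navigation.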

> do you need an inference engine at runtime at all?

We will, once we start pulling in real-time data from which we want the SNOMED
inference rules to help us discover new knowledge.




Re: Subclass caching has some problems on Fuseki startup

2021-09-21 Thread Brandon Sara
We need the inference so that we can know equivalence between classes and
subclass relationships (e.g. "type 2 diabetes" is still "diabetes" because it
is a subclass of diabetes).
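The subclass reasoning described here (type 2 diabetes counts as diabetes) amounts to following `rdfs:subClassOf` edges transitively. Below is a minimal breadth-first sketch over an in-memory map of direct parents, with hypothetical class names; it is not how Jena's transitive reasoner is implemented, only roughly what it computes for subclasses.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: all superclasses reachable through direct subClassOf edges. */
final class SubclassClosure {

    static Set<String> ancestors(String cls, Map<String, Set<String>> directParents) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>(directParents.getOrDefault(cls, Set.of()));
        while (!queue.isEmpty()) {
            String next = queue.removeFirst();
            if (seen.add(next)) {                      // expand each class once, so cycles terminate
                queue.addAll(directParents.getOrDefault(next, Set.of()));
            }
        }
        return seen;                                   // contains cls only if cls lies on a cycle
    }
}
```

Equivalent classes can be modelled as mutual subclass edges; the visited set keeps the traversal finite in that case.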

Another dataset that I've never been able to load with any inference enabled is
SNOMED CT. Even after removing all of the OWL inference built into their
dataset and pre-calculating the direct subclass relationships, not even the
transitive reasoner will load the dataset (without modification at runtime)
once the first query is submitted after Fuseki starts. Granted, it's
significantly larger than ICD-10 CM. But still, not being able to load it even
with pre-calculated direct subclass relationships is a huge deal breaker. Not
to mention that the real power of that dataset comes when the OWL inference
built into it can actually be used. With it, inference on patient data can
reveal potential diagnoses that would not be inferred without OWL reasoning.



Re: Subclass caching has some problems on Fuseki startup

2021-09-13 Thread Brandon Sara
I have been able to create an easily reproducible scenario that others can use 
to replicate and test the issues that I’m seeing:

1. Start fuseki using the config that I’ve listed below.
2. Attempt to load the latest version of ICD-10 CM as provided freely by 
BioPortal: https://bioportal.bioontology.org/ontologies/ICD10CM

If inference is enabled, then I can't even get the Turtle file to load in its 
entirety. If I load the Turtle file without inference, then the load completes, 
but after restarting the server and submitting a request, the service doesn't 
finish processing the request in any reasonable amount of time, no matter how 
simple the query is (as long as it actually reads data from the dataset).

Config:

PREFIX dcterms: 
PREFIX fuseki: 
PREFIX ja: 
PREFIX rdf: 
PREFIX rdfs: 
PREFIX skos: 
PREFIX tdb2: 
PREFIX text: 

[] rdf:type fuseki:Server ;
  fuseki:pingEP true ;
  fuseki:statsEP true ;
  fuseki:metricsEP true ;
  fuseki:compactEP true ;

  ja:context [
    ja:cxtName "arq:queryTimeout" ;
    ja:cxtValue "1,6" ;
  ] ;
.

<#kgService> a fuseki:Service ;
  fuseki:name "kg" ;
  fuseki:dataset <#kgIndexedDataset> ;
  fuseki:endpoint [ fuseki:operation fuseki:query; ] ;
  fuseki:endpoint [ fuseki:operation fuseki:update; ] ;
  fuseki:endpoint [ fuseki:operation fuseki:gsp_r; ] ;
  fuseki:endpoint [ fuseki:operation fuseki:gsp_rw; fuseki:name "data"; ] ;
.

<#kgIndexedDataset> rdf:type text:TextDataset ;
  text:dataset <#kgInferredDataset> ;
  text:index <#kgIndex> ;
.

<#kgIndex> a text:TextIndexLucene ;
  text:directory  ;
  text:entityMap <#kgEntityMap> ;
  text:storeValues true ;
  text:queryParser [ a text:ComplexPhraseQueryParser ]
.

<#kgEntityMap> a text:EntityMap ;
  text:defaultField "label" ;
  text:entityField "uri" ;
  text:uidField "uid" ;
  text:langField "lang" ;
  text:graphField "graph" ;
  text:map (
    [ text:field "id" ;
      text:predicate dcterms:identifier ]

    [ text:field "label" ;
      text:predicate rdfs:label ]
  ) ;
.

<#kgInferredDataset> a ja:RDFDataset ;
  ja:defaultGraph <#kgInferenceModel> ;
.

<#kgInferenceModel> a ja:InfModel ;
  ja:baseModel <#kgTdbGraph> ;
  ja:reasoner [
    ja:reasonerURL 
  ] ;
.

<#kgTdbGraph> a tdb2:GraphTDB2 ;
  tdb2:dataset <#kgTdbDataset> ;
.

<#kgTdbDataset> a tdb2:DatasetTDB2 ;
  tdb2:location "/fuseki/databases/kg" ;
.





Subclass caching has some problems on Fuseki startup

2021-08-27 Thread Brandon Sara
I've finally tracked down the problem (at least at a high level). When using 
the Transitive Reasoner, there is a block of code which caches all subclass 
triples 
(https://github.com/apache/jena/blob/main/jena-core/src/main/java/org/apache/jena/reasoner/transitiveReasoner/TransitiveEngine.java#L316-L326).
Part of this code searches for all sub-properties of `subClassOf` and begins 
caching triples for those sub-properties. In my situation, I've added 
`owl:equivalentClass` manually (since only `TransitiveReasoner` is being used) 
and manually made it a sub-property of `subClassOf`. The data that I'm 
uploading right now has a lot of equivalent-class triples (roughly 300k or 
more). If I'm understanding the code correctly from debugging it, not only is 
the triple cached, but a traversal of many other triples occurs when caching 
even a single triple. Is that correct? This would explain why (1) it never 
seems to finish what it is doing and (2) the memory grows very, very large 
while doing it. I ran a single query last night and, after more than 6 hours 
with 8 CPUs and 20GB of RAM, it still hadn't finished loading the cache. It 
seems as though the runtime of this could be exponential. My dataset is likely 
well over 20 million records (I still haven't gotten a full count, but I know 
for a fact that it is well over 10 million and believe it to be well over 20 
million). As I've mentioned before, there are basically no individuals in the 
dataset; it's all ontology, because it consists of health care industry coding 
systems and classifications.
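The suspected blow-up can be reasoned about with a toy model. The sketch below is not Jena's `TransitiveEngine` code; it only counts how many (subclass, superclass) pairs a fully materialized closure would hold, by running a breadth-first search from every class over a hypothetical in-memory edge list. It models `owl:equivalentClass` declared as a sub-property of `subClassOf` the way the message describes, i.e. as `subClassOf` edges in both directions. All class and method names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical): size of a fully materialized subclass closure. */
final class SubclassClosure {

    /** BFS from every class; counts each class reachable via one or more edges. */
    static long closureSize(Map<String, List<String>> superClasses) {
        long pairs = 0;
        for (String start : superClasses.keySet()) {
            Set<String> seen = new HashSet<>();
            Deque<String> queue = new ArrayDeque<>(superClasses.getOrDefault(start, List.of()));
            while (!queue.isEmpty()) {
                String next = queue.poll();
                if (seen.add(next)) {
                    queue.addAll(superClasses.getOrDefault(next, List.of()));
                }
            }
            pairs += seen.size();
        }
        return pairs;
    }

    /** A single chain C0 -> C1 -> ... -> C(n-1); each edge points at the superclass. */
    static Map<String, List<String>> chain(int n) {
        Map<String, List<String>> g = new HashMap<>();
        for (int i = 0; i < n; i++) {
            g.put("C" + i, new ArrayList<>());
        }
        for (int i = 0; i + 1 < n; i++) {
            g.get("C" + i).add("C" + (i + 1));
        }
        return g;
    }

    /** Same chain, but every adjacent pair is also "equivalent": linked in both directions. */
    static Map<String, List<String>> chainWithEquivalences(int n) {
        Map<String, List<String>> g = chain(n);
        for (int i = 0; i + 1 < n; i++) {
            g.get("C" + (i + 1)).add("C" + i);
        }
        return g;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.printf("n=%d: chain=%d pairs, with equivalences=%d pairs%n",
                    n, closureSize(chain(n)), closureSize(chainWithEquivalences(n)));
        }
    }
}
```

For n = 1000 this reports 499500 pairs for the plain chain and 1000000 once the equivalences are added. In this toy model the cache grows quadratically with hierarchy depth rather than exponentially, and each equivalence cycle makes every member inherit the whole connected component. Whether Jena's engine behaves the same way would need to be confirmed against the linked code.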

Another strange thing, which I've mentioned before, is that I don't have any of 
these issues when I initially load the data: I can load everything with just 4 
GB of RAM, it loads in a reasonable amount of time, and after the upload is 
complete I can submit queries of pretty much any complexity with no issues, and 
they are very fast too. This only occurs when the server has been restarted and 
the first query that actually reads something from the dataset (i.e. not an 
empty query) is submitted, no matter how simple or complex that query may be.

Is this a bug, or should `owl:equivalentClass` work without my own manual 
specification of it?



Re: IO IdleTimeout issue with Fuseki

2021-08-26 Thread Brandon Sara
> If it is after each restart, maybe the local state has been messed up by the 
> earlier problems.

I wondered this as well. However, I’ve just barely done another upload of the 
data to a fresh tdb2 db and immediately after the upload, I can query all that 
I want and things are super fast. But once the server restarts, the memory 
grows until it reaches its limit trying to load the dataset into memory (6GB of 
RAM) and no queries are ever able to complete because the dataset never loads.


> Does the patch server log have a fetch entry?

There are actually no fetches needed to the delta server for this to occur. All 
I need to do is submit a query, even one that is super simple and very 
specific. The delta server is only hit to determine if the instance is up to 
date, it finds that it is and no patch fetch ever occurs.


> Is some CPU at 100%?

While trying to load the dataset, I’m seeing my docker container reporting CPU 
usage (with 4 cores allocated) fluctuate between 200%~400%.



> On Aug 24, 2021, at 12:07 PM, Andy Seaborne  wrote:
>
> Then sorry I don't know what's happening.
>
> If it is after each restart, maybe the local state has been messed up by
> the earlier problems.
>
> Does the patch server log have a fetch entry?
>
> Is some CPU at 100%?
>
> Andy


Re: IO IdleTimeout issue with Fuseki

2021-08-24 Thread Brandon Sara
OK. So it seems this may not be the cause of the issue that I’m seeing. 
However, I’m still seeing the loading of the dataset never finish. We have left 
a server running for over 48 hours and it never finishes loading…or, at least 
we believe that is what is happening. We do know that we can’t submit any 
successful queries against it at all. We have given the server 12GB of RAM and 
still no luck. When I initially load the DB with values, I only allocated 6GB 
of RAM and after completing, I was able to submit any query that I wanted and 
they all were very fast to return results as well. It’s only on the restart of 
the server that I seem to see this issue. And only after I’ve added this extra 
large amount of data to the dataset.

On Aug 24, 2021, at 3:03 AM, Andy Seaborne <a...@apache.org> wrote:

"EXTERNAL EMAIL" - This email originated from outside of the organization. Do 
not click or open attachments unless you recognize the sender and know the 
content is safe. If you are unsure, please contact CTS at 
hel...@pointclickcare.com.

If the IP address+port is the client application, it's a network warning
about the client. This could possibly indicate that the client has gone
away without reading the whole of the results.

https://github.com/eclipse/jetty.project/issues/4728

suggests it may be related to a mismatch on idle timeout of an
intermediate server.

Andy

On 24/08/2021 00:28, Brandon Sara wrote:
> I’m still seeing this issue, even after the latest update of RDF-Delta. 
> Anyone have any ideas?
>




Re: IO IdleTimeout issue with Fuseki

2021-08-23 Thread Brandon Sara
I’m still seeing this issue, even after the latest update of RDF-Delta. Anyone 
have any ideas?


Re: IO IdleTimeout issue with Fuseki

2021-08-18 Thread Brandon Sara
> What's at 172.18.0.1:60440? Judging by the port number, is it the application 
> client?

Yeah, I believe it was the host IP. I run the server in a docker container and 
was hitting it with a simple REST client.



On Aug 18, 2021, at 6:43 AM, Andy Seaborne <a...@apache.org> wrote:

Hi there,

This is possibly an effect of the other Delta problems.

The last necessary fix (unrelated to your issues) was put in the HEAD of
rdf-delta codebase yesterday so HEAD is about ready. If you could build
and run with that code, it would be great.

> l=/172.18.0.2:3043,
> r=/172.18.0.1:60440,

What's at 172.18.0.1:60440? Judging by the port number, is it the
application client?

Andy




Re: IO IdleTimeout issue with Fuseki

2021-08-17 Thread Brandon Sara
Also, on subsequent requests, I get no logs suggesting that it is trying to 
reload the DB. It shows that the request was received, but then it just waits. 
I'm assuming that perhaps a lock is put on the dataset in memory and is never 
released after the first request.


IO IdleTimeout issue with Fuseki

2021-08-17 Thread Brandon Sara
I’m having an issue with Fuseki where, once the first request is submitted, the 
server never returns a response and never returns any responses for subsequent 
requests either. The server also starts increasing its memory usage quite 
significantly until it finally runs out of memory, a GC occurs, and the whole 
process starts over with the next request. In looking at the logs, I found that 
the log output at the bottom of this message repeated every 30 seconds or so 
(the default jetty timeout time period). Even while completely “idle”, this 
same output would spit out like clockwork, indefinitely. The only thing that I 
can do to stop it is to stop the server. This has made it so that I cannot 
query anything from my dataset. The only change that has occurred is the size 
of the dataset…which makes sense as it now takes longer to load into memory, 
thus suddenly going over the timeout limit.

Is this a bug? I would expect at least the request to fail with a timeout 
status if something like this were to occur.

I’m using RDF-Delta’s embedded Fuseki, 6GB of RAM, TDB2, running in a Docker 
container.

LOGS (formatted for readability):

[Connector-Scheduler-7ac48e10-1-23] org.eclipse.jetty.io.IdleTimeout :
  SocketChannelEndPoint@114206bf{
    l=/172.18.0.2:3043,
    r=/172.18.0.1:60440,
    OPEN,
    fill=-,
    flush=-,
    to=30001/3
  }
  { io=0/0, kio=0, kro=1 }->
  SslConnection@3e970f93{
    NOT_HANDSHAKING,
    eio=-1/-1,
    di=-1,
    fill=IDLE,
    flush=IDLE
  }~>
  DecryptedEndPoint@5fce624a{
    l=/172.18.0.2:3043,
    r=/172.18.0.1:60440,
    OPEN,
    fill=-,
    flush=-,
    to=330537/3
  }=>
  HttpConnection@6f2ce996[
    p=HttpParser{ s=END, 271 of 271 },
    g=HttpGenerator@12e7e639{ s=START }
  ]=>
  HttpChannelOverHttp@1bba6dcf{
    s=HttpChannelState@5eb8f0dc{
      s=HANDLING
      rs=BLOCKING
      os=OPEN
      is=READY
      awp=false
      se=false
      i=true
      al=0
    },
    r=1,
    c=false/false,
    a=HANDLING,
    uri=https://localhost:3043/rdf,
    age=330183
  }
  idle timeout check, elapsed: 3 ms, remaining: 0 ms

[Connector-Scheduler-7ac48e10-1-23] org.eclipse.jetty.io.IdleTimeout :
  SocketChannelEndPoint@114206bf{
    l=/172.18.0.2:3043,
    r=/172.18.0.1:60440,
    OPEN,
    fill=-,
    flush=-,
    to=30002/3
  }
  { io=0/0, kio=0, kro=1 }->
  SslConnection@3e970f93{
    NOT_HANDSHAKING,
    eio=-1/-1,
    di=-1,
    fill=IDLE,
    flush=IDLE
  }~>
  DecryptedEndPoint@5fce624a{
    l=/172.18.0.2:3043,
    r=/172.18.0.1:60440,
    OPEN,
    fill=-,
    flush=-,
    to=330538/3
  }=>
  HttpConnection@6f2ce996[
    p=HttpParser{ s=END, 271 of 271 },
    g=HttpGenerator@12e7e639{ s=START }
  ]=>
  HttpChannelOverHttp@1bba6dcf{
    s=HttpChannelState@5eb8f0dc{
      s=HANDLING
      rs=BLOCKING
      os=OPEN
      is=READY
      awp=false
      se=false
      i=true
      al=0
    },
    r=1,
    c=false/false,
    a=HANDLING,
    uri=https://localhost:3043/rdf,
    age=330184
  }
  idle timeout expired

[Connector-Scheduler-7ac48e10-1-23] org.eclipse.jetty.io.FillInterest : onFail FillInterest@3cb42e3f{null}

java.util.concurrent.TimeoutException: Idle timeout expired: 3/3 ms
    at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:171) ~[delta-server.jar:?]
    at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:113) ~[delta-server.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]

[Connector-Scheduler-7ac48e10-1-23] org.eclipse.jetty.io.WriteFlusher : ignored: WriteFlusher@7186c8ca{IDLE}->null

java.util.concurrent.TimeoutException: Idle timeout expired: 3/3 ms
    at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:171) ~[delta-server.jar:?]
    at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:113) ~[delta-server.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]

[Connector-Scheduler-7ac48e10-1-23] org.eclipse.jetty.io.AbstractEndPoint :
  Ignored idle endpoint SocketChannelEndPoint@114206bf{
l=/172.18.0.2:3043,
  

Re: Need recommendation for memory settings using Fuseki/Delta server

2021-08-12 Thread Brandon Sara
As far as I am aware, we are using TDB2. Is there a way for me to verify this?
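One way to check is from the files on disk. The sketch below is a heuristic that relies on the TDB2 storage layout: a TDB2 database directory holds numbered generation subdirectories such as `Data-0001` (compaction adds `Data-0002`, and so on), whereas a TDB1 location keeps its node and index files (for example `nodes.dat`) directly in the directory. The class name is hypothetical, and the default path is only an example borrowed from the `tdb2:location` in the 2021-09-13 configuration above; this is not a Jena API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical heuristic: does a database directory use the TDB2 on-disk layout? */
final class TdbLayout {

    /** TDB2 stores its data in generation subdirectories named Data-0001, Data-0002, ... */
    static boolean looksLikeTdb2(Path location) throws IOException {
        try (Stream<Path> entries = Files.list(location)) {
            return entries.anyMatch(p -> Files.isDirectory(p)
                    && p.getFileName().toString().matches("Data-\\d+"));
        }
    }

    public static void main(String[] args) throws IOException {
        // Example path only; pass the real database directory as the first argument.
        Path location = Path.of(args.length > 0 ? args[0] : "/fuseki/databases/kg");
        System.out.println(location + (looksLikeTdb2(location)
                ? ": Data-NNNN directories found, so this looks like TDB2"
                : ": no Data-NNNN directories, so probably not TDB2 (possibly TDB1)"));
    }
}
```

The assembler configuration is the other place to look: a `tdb2:DatasetTDB2` declaration, rather than a TDB1 dataset type, means Fuseki opens the location as TDB2.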

I've also discovered that this happens after a sync has failed because the 
patch returned a 404. Given that which patch comes next is determined 
automatically, it seems quite strange that this would ever happen. It is what 
was causing my Fuseki servers to lose the current delta patch version (which I 
reported via GitHub). For some reason, once this scenario occurs, no matter how 
much RAM I give it, the Fuseki server can never update to the latest patch. I 
can't get any metrics on it because metrics don't start reporting until the 
Fuseki server is up (which is why I requested that the initial sync happen in 
the background so the Fuseki server can start right away…it's all coming 
together, isn't it ;) ). The Fuseki server requests the latest patch, which I 
assume it obtains (I'm able to query the delta server directly and get it just 
fine, and it returns quite quickly); then it gets stuck, and the server startup 
times out after 1 hour. A couple of days ago, when this happened, I had debug 
logging turned on and found that it always stopped at the same point for the 
patch it was trying to load, in `BlockAccessMapped` (which seemed to be where 
it was reading the input stream of the patch file), and this was always the 
last log line before it froze:

TRACE [main-1] org.apache.jena.dboe.base.file.BlockAccessMapped : 262750 => [256, 4964352]


> On Aug 12, 2021, at 2:13 PM, Andy Seaborne  wrote:
>
> And you use TDB1?
>
> TDB1 can use more memory - and between all the other components it might all 
> amount to 6G if the server isn't able to do some of the background tidy-up 
> work for a while.
>
> TDB2 does not have this effect.
>
>Andy
>
> On 12/08/2021 17:19, Brandon Sara wrote:
>> I believe that I’ve found the problem. It could be two different problems 
>> actually. One was that, during some experimentation locally, I ended up 
>> running a VERY LONG running update script…which never actually finished. 
>> This seems like it could have been the cause for things not running smoothly 
>> locally. As for another environment, I’ve found that if I have updates that 
>> are too large, the sync from the delta server runs out of memory. Some of 
>> these patch files were about 30 MB. And like you mentioned, this causes very 
>> long running updates…which cause the memory to run out…but strangely doesn’t 
>> crash the server or throw any errors. It consistently stopped at the exact 
>> same point (according to the debug logs) in its update every single time I 
>> restarted the server. To remedy this, I’ve split up the manual updates that 
>> I’m applying into smaller batches, things seem to be running smoothly again. 
>> But this does bring up a concern as to what workarounds I would need if I 
>> ever needed to do a large scale dynamic insert/delete via an update script.
>>> On Aug 12, 2021, at 7:37 AM, Andy Seaborne  wrote:
>>>
>>> "EXTERNAL EMAIL" - This email originated from outside of the organization. 
>>> Do not click or open attachments unless you recognize the sender and know 
>>> the content is safe. If you are unsure, please contact CTS at 
>>> hel...@pointclickcare.com.
>>>
>>>
>>>
>>> On 11/08/2021 21:21, Brandon Sara wrote:
>>>>> 10s of millions triples of RDFS schema and no instance data?
>>>> Yeah, it’s kinda weird. I inherited this project and am working on fixing 
>>>> much of the structuring, but in the mean time, need to keep it going as 
>>>> is. We are loading ICD-10 CM, SNOMED CT, and many other medical 
>>>> ontologies/thesauri…hence the large ontology. Pretty much every concept is 
>>>> treated as a class. At this point in time, we are using ontology itself 
>>>> for some inference and mapping. Eventually, we will be bringing instance 
>>>> data into the KG to do more powerful inference using the medical 
>>>> ontologies I mentioned.
>>>
>>> Try running without it as a test.
>>>
>>> The transitive reasoner fires up either as the when the server starts or 
>>> first request (can't remember which).
>>>
>>>>> custom:id has super properties?
>>>> No
>>>
>>> From what you've said, that takes not much memory - at very worse, it 
>>> populates the node cache which is an LRU ca

Re: Need recommendation for memory settings using Fuseki/Delta server

2021-08-12 Thread Brandon Sara
I forgot to mention that after splitting up into smaller batches of updates, 
memory never exceeded ~3.9GB during or after the updates. So that was another 
very good sign.

> On Aug 12, 2021, at 7:37 AM, Andy Seaborne  wrote:
>
> On 11/08/2021 21:21, Brandon Sara wrote:
>>> 10s of millions triples of RDFS schema and no instance data?
>> Yeah, it’s kinda weird. I inherited this project and am working on fixing 
>> much of the structuring, but in the meantime, need to keep it going as is. 
>> We are loading ICD-10 CM, SNOMED CT, and many other medical 
>> ontologies/thesauri…hence the large ontology. Pretty much every concept is 
>> treated as a class. At this point in time, we are using the ontology itself for 
>> some inference and mapping. Eventually, we will be bringing instance data 
>> into the KG to do more powerful inference using the medical ontologies I 
>> mentioned.
>
> Try running without it as a test.
>
> The transitive reasoner fires up either when the server starts or on the
> first request (can't remember which).
>
>>> custom:id has super properties?
>> No
>
> From what you've said, that takes not much memory - at worst, it 
> populates the node cache, which is an LRU cache, and usually 2G is enough 
> (unless you have a lot of very large literals - many lines of text).
>
>>> is the request causing the database to be sync'ed before the request starts?
>> Yes
>
> That's a source of RAM use if there are large pending updates.
>
> Also try the query
>
> SELECT * {} or ASK{}
>
> which does all the end-to-end stuff for setup and sync but does not touch the 
> data.
>
> The other thing to try is to point VisualVM at the process and look at the
> memory usage and heap usage.
>
>Andy
>
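The `ASK {}` check suggested above can be scripted so that the round trip through setup and sync is timed on its own. The sketch uses only the JDK 11+ `java.net.http` client and sends the query as an HTTP GET with a `query` parameter, as in the SPARQL 1.1 Protocol. The endpoint URL `http://localhost:3030/kg` and the class name are assumptions; substitute the dataset's actual query endpoint. The request timeout makes the client give up with an `HttpTimeoutException` instead of waiting indefinitely.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

/** Hypothetical probe: time an ASK {} query that exercises setup/sync without touching data. */
final class AskProbe {

    /** Builds endpoint?query=ASK+%7B%7D (form-encoded query string). */
    static URI askUri(String endpoint) {
        String q = URLEncoder.encode("ASK {}", StandardCharsets.UTF_8);
        return URI.create(endpoint + "?query=" + q);
    }

    static HttpRequest askRequest(String endpoint, Duration timeout) {
        return HttpRequest.newBuilder(askUri(endpoint))
                .header("Accept", "application/sparql-results+json")
                .timeout(timeout) // send() throws HttpTimeoutException when exceeded
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        String endpoint = args.length > 0 ? args[0] : "http://localhost:3030/kg"; // assumed URL
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        long start = System.nanoTime();
        HttpResponse<String> resp = client.send(
                askRequest(endpoint, Duration.ofSeconds(60)), HttpResponse.BodyHandlers.ofString());
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("HTTP " + resp.statusCode() + " in " + ms + " ms: " + resp.body());
    }
}
```

If this returns quickly while a query that reads data hangs, the delay is more likely in loading or reasoning over the data than in the sync itself.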



Re: Need recommendation for memory settings using Fuseki/Delta server

2021-08-12 Thread Brandon Sara
I believe that I’ve found the problem. It could actually be two different 
problems. One was that, during some local experimentation, I ended up running 
a VERY long-running update script…which never actually finished. This seems 
like it could have been the cause of things not running smoothly locally. As 
for the other environment, I’ve found that if I have updates that are too 
large, the sync from the delta server runs out of memory. Some of these patch 
files were about 30 MB. And like you mentioned, this causes very long-running 
updates…which cause the memory to run out…but, strangely, this doesn’t crash 
the server or throw any errors. It consistently stopped at the exact same point 
(according to the debug logs) in its update every single time I restarted the 
server. To remedy this, I’ve split the manual updates that I’m applying into 
smaller batches, and things seem to be running smoothly again. But this does 
raise a concern about what workarounds I would need if I ever had to do a 
large-scale dynamic insert/delete via an update script.
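A minimal Python sketch of the batching workaround described above. It assumes the data to load is available as N-Triples lines and that each batch is sent as its own SPARQL `INSERT DATA` request, so each transaction (and each resulting RDF Delta patch) stays small. The helper names and the batch size are illustrative, not part of Fuseki or RDF Delta.

```python
from typing import Iterable, Iterator, List


def batched(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` non-blank lines."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        batch.append(line)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


def insert_data_updates(ntriples: Iterable[str], size: int) -> List[str]:
    """Wrap each batch of N-Triples statements in its own INSERT DATA update.

    Each returned string is meant to be sent as a separate SPARQL Update
    request, i.e. a separate, smaller transaction.
    """
    return [
        "INSERT DATA {\n" + "\n".join(batch) + "\n}"
        for batch in batched(ntriples, size)
    ]


if __name__ == "__main__":
    triples = [
        '<http://example.com/a> <http://example.com/p> "1" .',
        '<http://example.com/b> <http://example.com/p> "2" .',
        '<http://example.com/c> <http://example.com/p> "3" .',
    ]
    for update in insert_data_updates(triples, size=2):
        print(update)
```

Blank nodes should not be split across batches, because each `INSERT DATA` request creates fresh blank nodes. Large deletes can be chunked the same way with `DELETE DATA`, which does not accept blank nodes at all.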

> On Aug 12, 2021, at 7:37 AM, Andy Seaborne  wrote:
>
>
> On 11/08/2021 21:21, Brandon Sara wrote:
>>> 10s of millions of triples of RDFS schema and no instance data?
>> Yeah, it’s kinda weird. I inherited this project and am working on fixing 
>> much of the structuring, but in the meantime I need to keep it going as is. 
>> We are loading ICD-10 CM, SNOMED CT, and many other medical 
>> ontologies/thesauri…hence the large ontology. Pretty much every concept is 
>> treated as a class. At this point in time, we are using the ontology itself 
>> for some inference and mapping. Eventually, we will be bringing instance 
>> data into the KG to do more powerful inference using the medical ontologies 
>> I mentioned.
>
> Try running without it as a test.
>
> The transitive reasoner fires up either when the server starts or on the 
> first request (can't remember which).
>
>>> custom:id has super properties?
>> No
>
> From what you've said, that does not take much memory - at worst, it 
> populates the node cache which is an LRU cache and usually 2G is enough. 
> (unless you have a lot of very large literals - many lines of text).
>
>>> is the request causing the database to be sync'ed before the request starts?
>> Yes
>
> That's a source of RAM use if there are large pending updates.
>
> Also try the query
>
> SELECT * {} or ASK{}
>
> which does all the end-to-end stuff for setup and sync but does not touch the 
> data.
>
> The other thing to try is to point VisualVM at the process and look at the 
> memory usage and heap usage.
>
>Andy
>
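Andy's `ASK {}` suggestion can be scripted so that the fixed per-request cost (setup plus RDF Delta sync) is measured separately from a real query. A minimal Python sketch using the SPARQL protocol over HTTP GET with only the standard library; the endpoint URL used in the usage note is hypothetical and should be replaced with the dataset's actual query endpoint.

```python
import time
import urllib.parse
import urllib.request


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL protocol query via GET: the query text goes in ?query=..."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def time_query(endpoint: str, query: str, timeout: float = 600.0) -> float:
    """Run the query, read the whole response, and return elapsed seconds."""
    request = build_query_request(endpoint, query)
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()  # include transferring the results in the timing
    return time.perf_counter() - start
```

For example, compare `time_query("http://localhost:3030/ds/sparql", "ASK {}")` (hypothetical URL) with the same call for the `custom:id` query. If `ASK {}` alone is slow or runs out of memory, the cost lies in setup and sync rather than in touching the data.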



Re: Need recommendation for memory settings using Fuseki/Delta server

2021-08-11 Thread Brandon Sara
> 10s of millions of triples of RDFS schema and no instance data?
Yeah, it’s kinda weird. I inherited this project and am working on fixing much 
of the structuring, but in the meantime I need to keep it going as is. We are 
loading ICD-10 CM, SNOMED CT, and many other medical ontologies/thesauri…hence 
the large ontology. Pretty much every concept is treated as a class. At this 
point in time, we are using the ontology itself for some inference and mapping. 
Eventually, we will be bringing instance data into the KG to do more powerful 
inference using the medical ontologies I mentioned.

> custom:id has super properties?
No

> is the request causing the database to be sync'ed before the request starts?
Yes

On Aug 11, 2021, at 12:44 PM, Andy Seaborne <a...@apache.org> wrote:

On 11/08/2021 19:07, Brandon Sara wrote:
>> What properties are transitive?
> Right now, it is just an ontology…so, everything is properties and classes. 
> So, subClassOf and subPropertyOf exist on nearly every subject node.

10s of millions of triples of RDFS schema and no instance data?

>
>> Example query?
>
> PREFIX : 
> PREFIX custom: <http://example.com/>
>
> SELECT *
> WHERE {
> :42 custom:id ?id
> }
>
>
> (`:42` has only one `custom:id` triple and has no `owl:sameAs` inference…only 
> “TransitiveReasoner” is being used)

custom:id has super properties?


And is the request causing the database to be sync'ed before the request
starts?

>
> On Aug 11, 2021, at 8:29 AM, Andy Seaborne <a...@apache.org> wrote:
>
>
> On 11/08/2021 01:17, Brandon Sara wrote:
>> Can I get some recommendations on how to best tweak/setup memory for my 
>> fuseki servers? Here is my setup:
>>
>> - I’ve got a single TDB with at least several million triples (I don’t know 
>> the exact amount yet, but perhaps around 10s of millions, maybe 100s of 
>> millions…at the very least, I need it to scale to 100s of millions).
>> - Everything is put in the default graph currently (wanting to change 
>> this…but can’t at this point in time).
>> - The “TransitiveReasoner” is being used on the dataset.
>
> What properties are transitive?
>
> (and maybe 
> https://jena.apache.org/documentation/rdfs/ ?)
>
>> - Full text indexing over two different fields is enabled using Lucene.
>> - The servers are running embedded Fuseki via rdf-delta and sync via a 
>> central rdf-delta server.
>> - The simplest of queries won’t finish and runs out of memory with, at the 
>> very least, 6 GB of RAM.
>
> Example query?
>
> And is the request causing the database to be sync'ed before the request
> starts?
>
>>
>> Also, should I be tweaking my non-heap memory to be larger for the Fuseki 
>> server?
>
> Unlikely.
>
> Andy
>
>>
>> Thanks.

Re: Need recommendation for memory settings using Fuseki/Delta server

2021-08-11 Thread Brandon Sara
> What properties are transitive?
Right now, it is just an ontology…so, everything is properties and classes. So, 
subClassOf and subPropertyOf exist on nearly every subject node.
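To see why a schema with tens of millions of `subClassOf`/`subPropertyOf` triples can be costly under a transitive reasoner, here is a small Python sketch that computes a transitive closure by depth-first search. It is an illustration only, not Jena's `TransitiveReasoner` implementation, and it makes no claim about whether Jena materialises every inferred pair. It only shows that a chain of n classes has n-1 asserted edges but n(n-1)/2 inferred pairs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def transitive_closure(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each node to all nodes reachable from it (e.g. all superclasses)."""
    direct: Dict[str, Set[str]] = defaultdict(set)
    for sub, sup in edges:
        direct[sub].add(sup)
    closure: Dict[str, Set[str]] = {}
    for start in list(direct):
        seen: Set[str] = set()
        stack = list(direct[start])
        while stack:  # iterative DFS, so deep hierarchies do not hit recursion limits
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(direct.get(node, ()))
        closure[start] = seen
    return closure


def closure_size(edges: Iterable[Tuple[str, str]]) -> int:
    """Count (sub, super) pairs after making subClassOf transitive."""
    return sum(len(supers) for supers in transitive_closure(edges).values())


if __name__ == "__main__":
    n = 1000
    chain = [(f"C{i}", f"C{i + 1}") for i in range(n - 1)]
    print(len(chain), closure_size(chain))  # prints: 999 499500
```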

> Example query?

PREFIX : 
PREFIX custom: <http://example.com/>

SELECT *
WHERE {
  :42 custom:id ?id
}


(`:42` has only one `custom:id` triple and has no `owl:sameAs` inference…only 
“TransitiveReasoner” is being used)

On Aug 11, 2021, at 8:29 AM, Andy Seaborne <a...@apache.org> wrote:

On 11/08/2021 01:17, Brandon Sara wrote:
> Can I get some recommendations on how to best tweak/setup memory for my 
> fuseki servers? Here is my setup:
>
> - I’ve got a single TDB with at least several million triples (I don’t know 
> the exact amount yet, but perhaps around 10s of millions, maybe 100s of 
> millions…at the very least, I need it to scale to 100s of millions).
> - Everything is put in the default graph currently (wanting to change 
> this…but can’t at this point in time).
> - The “TransitiveReasoner” is being used on the dataset.

What properties are transitive?

(and maybe 
https://jena.apache.org/documentation/rdfs/ ?)

> - Full text indexing over two different fields is enabled using Lucene.
> - The servers are running embedded Fuseki via rdf-delta and sync via a 
> central rdf-delta server.
> - The simplest of queries won’t finish and runs out of memory with, at the 
> very least, 6 GB of RAM.

Example query?

And is the request causing the database to be sync'ed before the request
starts?

>
> Also, should I be tweaking my non-heap memory to be larger for the Fuseki 
> server?

Unlikely.

Andy

>
> Thanks.




Need recommendation for memory settings using Fuseki/Delta server

2021-08-10 Thread Brandon Sara
Can I get some recommendations on how to best tweak/setup memory for my fuseki 
servers? Here is my setup:

- I’ve got a single TDB with at least several million triples (I don’t know the 
exact amount yet, but perhaps around 10s of millions, maybe 100s of millions…at 
the very least, I need it to scale to 100s of millions).
- Everything is put in the default graph currently (wanting to change this…but 
can’t at this point in time).
- The “TransitiveReasoner” is being used on the dataset.
- Full text indexing over two different fields is enabled using Lucene.
- The servers are running embedded Fuseki via rdf-delta and sync via a central 
rdf-delta server.
- The simplest of queries won’t finish and runs out of memory with, at the very 
least, 6 GB of RAM.

Also, should I be tweaking my non-heap memory to be larger for the Fuseki 
server?

Thanks.



Re: Compact won't execute on inference dataset

2021-05-10 Thread Brandon Sara
I think that even that slight change to endpoint structure would help quite a 
bit.

On May 10, 2021, at 11:18 AM, Rob Vesse <rve...@dotnetrdf.org> wrote:






Re: Bug: Compact won't execute on inference dataset

2021-05-10 Thread Brandon Sara
> Why is that?

It would be most helpful if we had one single endpoint to manage a dataset, 
rather than two.


> The HTTP operation has to identify the database, and DatasetGraphMapLink can 
> have multiple graphs from different databases.

Thank you for this info, I hadn’t realized that you could combine graphs from 
different DBs for a service.


> Compaction only works on TDB2 databases.

While this makes sense, the wording of the documentation and of the endpoint 
itself leads one to believe that a dataset ultimately backed by a TDB2 DB is 
all that is required, regardless of whether the DB is “wrapped” by graph logic 
such as inference. For instance, `/$/compact/{dataset}` is the verbiage used 
for the endpoint, but in the config I can set the “dataset” of my service to 
something that ultimately references a TDB2 DB. How is one to know that 
“dataset” in the compaction endpoint is not necessarily synonymous with 
“dataset” in the assembler config?


> What sort of inference are you using?

We are using normal OWL inference via the different supplied OWL reasoners and 
one service that uses the transitive reasoner.

On May 8, 2021, at 3:17 AM, Andy Seaborne <a...@apache.org> wrote:

On 08/05/2021 00:05, Brandon Sara wrote:
While I can see how this will work, it is a pretty undesirable solution.

Why is that?

Are there any other options? Or is there a way to get this working for a 
situation like this in the code?

The HTTP operation has to identify the database, and DatasetGraphMapLink can have 
multiple graphs from different databases.

The type of configuration in the test is quite prevalent in the docs and 
examples. I would think that there should either be a disclaimer that 
compaction won’t work if the pattern is used or the code should be updated to 
work for the situation, no?

Compaction only works on TDB2 databases.

An inference service isn't a TDB2 database.

What sort of inference are you using?

   Andy

On May 7, 2021, at 3:28 PM, Andy Seaborne <a...@apache.org> wrote:

On 07/05/2021 20:44, Brandon Sara wrote:
I’ve found what I believe is a bug. If you try to run compaction via the 
fuseki-main `/$/compact/{ds}` endpoint on an `ja:RDFDataset` that has a 
defaultGraph of `ja:InfModel`, compaction will not execute because the 
resulting `DatasetGraphMapLink` type does not inherit `DatasetGraphSwitchable` 
nor `DatasetGraphWrapper`. I am able to run compaction just fine with the 
command line tool on this dataset, it is just being restricted from running 
when a request comes through `ActionCompact`.

Correct - while DatasetGraphWrapper might be possible, DatasetGraphMapLink is 
an independent ass

There are many ways datasets can be built out of datasets and models.

But you don't need to.

You can have a service that exposes the dataset directly.

# Service 1
<#data> rdf:type fuseki:Service ;
   fuseki:name "withInference" ;
   fuseki:dataset <#inf-dataset> .

<#inf-dataset>
 

:tdbGraph rdf:type tdb2:GraphTDB2 ;
   tdb2:dataset <#storage> .

# Service 2
## storage

<#tdb> rdf:type fuseki:Service ;
   fuseki:name "storage" ;
   # No operations or endpoints.
   fuseki:dataset <#storage> ;
.

<#storage>  rdf:type  tdb2:DatasetTDB2 ;
   tdb2:location "DB2"
   .

then compact admin operations to
 http://localhost:3030/$/compact/storage

  Andy

I have written a test that can be used to duplicate the issue: 
https://github.com/bsara/jena/blob/compact-with-inference/jena-fuseki2/jena-fuseki-main/src/test/java/org/apache/jena/fuseki/main/TestConfigFile.java#L300-L323
I would try and fix the issue myself, but I know very little about the inner 
workings and intricacies of compaction, graph types, and how it all interacts.
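The compaction request Andy describes is an HTTP POST to the admin path. A minimal Python sketch using only the standard library; it assumes the server is at `http://localhost:3030` as in the example, that the admin endpoints are reachable without authentication, and that any response body is JSON (compaction may run as a background task, in which case the reply describes the task rather than the finished result).

```python
import json
import urllib.parse
import urllib.request


def compact_request(base_url: str, dataset: str) -> urllib.request.Request:
    """Build the POST for Fuseki's /$/compact/{name} admin operation."""
    url = base_url.rstrip("/") + "/$/compact/" + urllib.parse.quote(dataset, safe="")
    return urllib.request.Request(url, data=b"", method="POST")


def start_compaction(base_url: str, dataset: str, timeout: float = 30.0) -> dict:
    """Send the request; return the JSON reply, or {} if the body is empty."""
    request = compact_request(base_url, dataset)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return json.loads(body) if body.strip() else {}
```

With the two-service configuration above, the call would be `start_compaction("http://localhost:3030", "storage")`, targeting the storage-only service rather than the inference one.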


Re: Bug: Compact won't execute on inference dataset

2021-05-07 Thread Brandon Sara
While I can see how this will work, it is a pretty undesirable solution. Are 
there any other options? Or is there a way to get this working for a situation 
like this in the code?

The type of configuration in the test is quite prevalent in the docs and 
examples. I would think that there should either be a disclaimer that 
compaction won’t work if the pattern is used or the code should be updated to 
work for the situation, no?

> On May 7, 2021, at 3:28 PM, Andy Seaborne  wrote:
>
>
> On 07/05/2021 20:44, Brandon Sara wrote:
>> I’ve found what I believe is a bug. If you try to run compaction via the 
>> fuseki-main `/$/compact/{ds}` endpoint on an `ja:RDFDataset` that has a 
>> defaultGraph of `ja:InfModel`, compaction will not execute because the 
>> resulting `DatasetGraphMapLink` type does not inherit 
>> `DatasetGraphSwitchable` nor `DatasetGraphWrapper`. I am able to run 
>> compaction just fine with the command line tool on this dataset, it is just 
>> being restricted from running when a request comes through `ActionCompact`.
>
> Correct - while DatasetGraphWrapper might be possible, DatasetGraphMapLink is 
> an independent ass
>
> There are many ways datasets can be built out of datasets and models.
>
> But you don't need to.
>
> You can have a service that exposes the dataset directly.
>
> # Service 1
> <#data> rdf:type fuseki:Service ;
>    fuseki:name "withInference" ;
>    fuseki:dataset <#inf-dataset> .
>
> <#inf-dataset>
>  
>
> :tdbGraph rdf:type tdb2:GraphTDB2 ;
>    tdb2:dataset <#storage> .
>
> # Service 2
> ## storage
>
> <#tdb> rdf:type fuseki:Service ;
>    fuseki:name "storage" ;
>    # No operations or endpoints.
>    fuseki:dataset <#storage> ;
> .
>
> <#storage>  rdf:type  tdb2:DatasetTDB2 ;
>    tdb2:location "DB2"
>    .
>
> then compact admin operations to
>  http://localhost:3030/$/compact/storage
>
>   Andy
>
>> I have written a test that can be used to duplicate the issue: 
>> https://github.com/bsara/jena/blob/compact-with-inference/jena-fuseki2/jena-fuseki-main/src/test/java/org/apache/jena/fuseki/main/TestConfigFile.java#L300-L323
>> I would try and fix the issue myself, but I know very little about the inner 
>> workings and intricacies of compaction, graph types, and how it all 
>> interacts.
>



Re: TDB2 Writing Very Slowly to S3 Volume

2021-05-07 Thread Brandon Sara
Thank you for your help guys. We ended up moving the db to a real filesystem 
rather than a network drive. Problem solved. :)



Bug: Compact won't execute on inference dataset

2021-05-07 Thread Brandon Sara
I’ve found what I believe is a bug. If you try to run compaction via the 
fuseki-main `/$/compact/{ds}` endpoint on an `ja:RDFDataset` that has a 
defaultGraph of `ja:InfModel`, compaction will not execute because the 
resulting `DatasetGraphMapLink` type does not inherit `DatasetGraphSwitchable` 
nor `DatasetGraphWrapper`. I am able to run compaction just fine with the 
command line tool on this dataset, it is just being restricted from running 
when a request comes through `ActionCompact`.

I have written a test that can be used to duplicate the issue: 
https://github.com/bsara/jena/blob/compact-with-inference/jena-fuseki2/jena-fuseki-main/src/test/java/org/apache/jena/fuseki/main/TestConfigFile.java#L300-L323

I would try and fix the issue myself, but I know very little about the inner 
workings and intricacies of compaction, graph types, and how it all interacts.



TDB2 Writing Very Slowly to S3 Volume

2021-04-28 Thread Brandon Sara
My Setup:
I’m running a few Fuseki servers via Docker containers. I need the storage to 
be persistent across container restarts, so I’m using TDB2 for my storage. The 
TDB2 databases are stored on a volume that is mounted to the Docker containers. 
This volume is part of our S3 instance. The Fuseki servers’ individual DBs are 
kept in sync using RDF-Delta. The dataset in question uses full text search 
via jena-text (Lucene) with two properties being indexed (though they occur 
often in the dataset). The reasoner being used is `TransitiveReasoner`. I have 
only one default graph and no other graphs.

My Problem:
To upload ~10 MB of data (in a ttl file format), it sometimes takes more than 
3 hours to complete! We tried turning off full text search and it cut the time 
roughly in half. But 1.5 hours for only 10 MB of triple data is still way too 
long. Does anyone have any ideas of how we could fix this issue (other than the 
obvious one of not using a network-connected disk)?

Thanks.
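One way to confirm that the storage itself is the bottleneck is to measure how long a small write followed by `fsync` takes on each volume, since a durable TDB2 commit is expected to wait for its data to reach the disk. The following Python sketch is a rough probe, not a benchmark of TDB2; the directory paths in the usage note are hypothetical.

```python
import os
import tempfile
import time


def fsync_latency(directory: str, writes: int = 20, size: int = 4096) -> float:
    """Average seconds for one small write followed by fsync in `directory`."""
    if writes < 1:
        raise ValueError("writes must be >= 1")
    payload = os.urandom(size)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, payload)
            os.fsync(fd)  # wait until the OS reports the data as durable
        return (time.perf_counter() - start) / writes
    finally:
        os.close(fd)
        os.remove(path)  # leave no probe file behind
```

Running `fsync_latency("/mnt/s3-volume")` and `fsync_latency("/var/local-disk")` (both hypothetical mount points) side by side gives a quick per-commit cost comparison between the network-backed volume and a local filesystem.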



Adding auto-deletion to compact task?

2021-04-09 Thread Brandon Sara
I would love it if we could have an automatic deletion option with the compact 
task; otherwise, a good amount of work can end up being needed just to ensure 
that a no-longer-needed `Data-` folder is removed. Is this something that the 
maintainers are willing to consider? If so, I’d be willing to help with the 
implementation of it.

Thanks.
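Until something like this exists in the compact task, the clean-up could be scripted. A hedged Python sketch that only lists candidate directories and deletes nothing. It assumes TDB2 keeps each generation in a sub-directory named `Data-` followed by a zero-padded number (for example `Data-0001`) and that the highest-numbered one is the live one; both assumptions should be checked against the Jena version in use, and directories should only be removed while no server has the database open.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# Assumption: each TDB2 generation lives in a sub-directory named "Data-" plus
# a zero-padded number (e.g. Data-0001), and the highest number is current.
GENERATION = re.compile(r"^Data-(\d+)$")


def superseded_generations(db_dir: str) -> List[Path]:
    """List (but do not delete) every generation directory except the newest."""
    generations = []
    for path in Path(db_dir).iterdir():
        match = GENERATION.match(path.name)
        if match and path.is_dir():
            generations.append((int(match.group(1)), path))
    generations.sort()
    return [path for _, path in generations[:-1]]


if __name__ == "__main__":
    # Demo on a throwaway layout rather than a real database.
    with tempfile.TemporaryDirectory() as demo:
        for name in ("Data-0001", "Data-0002", "Data-0003"):
            (Path(demo) / name).mkdir()
        print([p.name for p in superseded_generations(demo)])
```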
-- 


*NOTICES*:

 

1.  **No PHI in Email**.  Collective Medical policy 
prohibits sending protected health information by email, which may violate 
applicable law. If sending PHI is necessary, please contact me for secure 
delivery instructions.

 

2.  **Confidentiality**.  This message and any 
attachments may be confidential and proprietary. If you received this in 
error, please contact me immediately and delete this message.  






Re: Compaction on already compacted dataset causes dataset to grow

2021-04-07 Thread Brandon Sara
Thank you for the explanation and quick response time. This might be worth 
adding to the documentation (if it isn’t already there).

> On Apr 7, 2021, at 6:48 AM, Andy Seaborne  wrote:
> 
> It may well do.
> 
> The exact size of a database depends on the order in which it is created. It 
> changes how the B+Tree nodes split over their life, so while the B+Tree holds 
> the same data, the space used can differ. It should settle down to the same 
> size if done repeatedly.
> 
> It may also depend on what exactly is being reported as the "file size". 
> TDB2 uses sparse files - it allocates 8M chunks but does not use all the 
> space immediately. Different OSes and different tools on Linux seem to report 
> differently, whether it is allocated space or used space.
> 
>   Andy
> 
> On 06/04/2021 21:43, Brandon Sara wrote:
>> I have a very large dataset. Before compaction, it was ~51 GB. I ran
>> compaction (using tdb2.tdbcompact cli tool) and it dropped down to 6.7 GB.
>> I then wanted to see how long it would take to run compaction on an already
>> compacted dataset. After running it, it grew in size to 7.4 GB, then it
>> grew with every subsequent compaction until it reached 7.6 GB.
>> Is this a bug? Do I have something configured incorrectly? Would compaction
>> not cause the dataset to grow in size if I ran it via the fuseki webapp
>> /$/compact/* endpoint?
>> Jena Version: 3.17.0
>> Thanks.
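The difference between allocated and used space that Andy mentions can be checked directly. A minimal Python sketch that sums both figures for a directory tree such as a TDB2 database directory; it assumes a POSIX system where `st_blocks` counts 512-byte units (as on Linux), and treats it as 0 elsewhere. The demo creates an 8 MiB sparse file, mirroring the 8M chunk allocation described above.

```python
import os
import tempfile
from typing import Tuple


def disk_usage(root: str) -> Tuple[int, int]:
    """Return (apparent_bytes, allocated_bytes) summed over files under root.

    apparent:  sum of st_size, what `ls -l` or `du --apparent-size` report.
    allocated: sum of st_blocks * 512, the basis of plain `du` on Linux.
    st_blocks is POSIX-only; it is treated as 0 where unavailable.
    """
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size
            allocated += getattr(st, "st_blocks", 0) * 512
    return apparent, allocated


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo:
        # Create an 8 MiB sparse file: seek near the end, write one byte.
        with open(os.path.join(demo, "chunk.dat"), "wb") as f:
            f.seek(8 * 1024 * 1024 - 1)
            f.write(b"\0")
        # On filesystems with sparse-file support, allocated << apparent.
        print(disk_usage(demo))
```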




ARQ vs plain HTTP, is one better than the other?

2021-04-06 Thread Brandon Sara
I have a Fuseki server that I communicate with remotely. I can use ARQ to
create and send the query via RDFConnection and use ARQ to handle
iterating over the results. I could also just submit an HTTP request and
use something like Jackson to map the resulting JSON-LD to my POJOs. Are
there any advantages to one of these methods over the other (apart from the
APIs that ARQ provides and the ease of using a Jackson object mapper)? Is
one method recommended over the other in this situation?
(Perhaps ARQ isn't even intended to be used in this way.)

Thanks.
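For comparison, here is roughly what the plain-HTTP route involves. A `SELECT` answered over HTTP comes back in the SPARQL 1.1 Query Results JSON format (`application/sparql-results+json`), not JSON-LD, which applies to graph results such as `CONSTRUCT`. The Python sketch below maps such a document onto a hypothetical `IdRow` type for the `?id` variable of the earlier example query; it illustrates the bookkeeping (unbound variables, term types, datatypes) that a result-set API otherwise handles for you.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IdRow:
    """Hypothetical target type for the `?id` variable of the example query."""
    id: Optional[str]


def parse_select_results(body: str) -> List[IdRow]:
    """Map an application/sparql-results+json document onto IdRow objects."""
    document = json.loads(body)
    rows: List[IdRow] = []
    for binding in document["results"]["bindings"]:
        term = binding.get("id")  # unbound variables are absent from a binding
        # Only the lexical value is kept; "type", "datatype" and "xml:lang"
        # are dropped here, whereas a result-set API preserves them.
        rows.append(IdRow(id=term["value"] if term else None))
    return rows


if __name__ == "__main__":
    sample = (
        '{"head": {"vars": ["id"]},'
        ' "results": {"bindings": [{"id": {"type": "literal", "value": "42-a"}}]}}'
    )
    print(parse_select_results(sample))
```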


Compaction on already compacted dataset causes dataset to grow

2021-04-06 Thread Brandon Sara
I have a very large dataset. Before compaction, it was ~51 GB. I ran
compaction (using tdb2.tdbcompact cli tool) and it dropped down to 6.7 GB.
I then wanted to see how long it would take to run compaction on an already
compacted dataset. After running it, it grew in size to 7.4 GB, then it
grew with every subsequent compaction until it reached 7.6 GB.

Is this a bug? Do I have something configured incorrectly? Would compaction
not cause the dataset to grow in size if I ran it via the fuseki webapp
/$/compact/* endpoint?

Jena Version: 3.17.0

Thanks.
