OntDocumentManager and LocatorClassLoader

2020-06-24 Thread Chris Tomlinson
Hi,

I've got a problem with OntDocumentManager when fetching resources from an 
element of the classpath via relative URLs.

When I use:

OntDocumentManager odm = new OntDocumentManager("A/B/C/ont-policy.rdf")

or

OntDocumentManager odm = new OntDocumentManager("https://x/a/b/c/ont-policy.rdf")

all works as expected. The relative URLs are resolved against the path to the 
ont-policy.rdf, whether file or URL. Part of making this work is

xmlns:base =""

in the policy file.

However, trying

OntDocumentManager odm = new OntDocumentManager("plop/ont-policy.rdf")

finds the policy resource via the LocatorClassLoader - seen with TRACE enabled 
- as expected; but then all the relative URLs are prepended with the path to 
the app.jar:

java -jar /path/to/where/to/find/app.jar

I figure I'm missing some config or a URI scheme or something like that. I'm 
currently using 3.15.
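
For context, the document manager gets used roughly like this - a simplified 
sketch with a placeholder ontology URI, assuming the usual OntModelSpec wiring:

import org.apache.jena.ontology.OntDocumentManager;
import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.ModelFactory;

// policy resource found on the classpath via LocatorClassLoader
OntDocumentManager odm = new OntDocumentManager("plop/ont-policy.rdf");

// attach the document manager to a model spec so imports are resolved
// through the policy's altURL entries
OntModelSpec spec = new OntModelSpec(OntModelSpec.OWL_MEM);
spec.setDocumentManager(odm);

OntModel model = ModelFactory.createOntologyModel(spec);
model.read("http://example.org/ontologies/core"); // placeholder ontology URI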

Thank you for your help,
Chris

Re: location-mapping found but not used

2020-06-24 Thread Martynas Jusevičius
Hi Andy,

is FileManager already removed in the latest 3.16.0-SNAPSHOT? I've
started to get weird "No interface expected here" errors on a class
which extends FileManager.

Moreover, in the StreamManager, what happened to methods like
hasCachedModel(), addCacheModel() and, most importantly, loadModel()?
Did they just go away with no replacement?
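
For reference, this is what I assume the StreamManager-based equivalent of the
mapping part would look like - just a sketch, and it only covers location
mapping, not the model caching that FileManager provided:

import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.system.stream.LocationMapper;
import org.apache.jena.riot.system.stream.StreamManager;

// map the SPIN namespace to a local copy instead of fetching it over HTTP
LocationMapper mapper = new LocationMapper();
mapper.addAltEntry("http://spinrdf.org/sp", "etc/sp.ttl");
mapper.addAltEntry("http://spinrdf.org/sp#", "etc/sp.ttl");

// install the mapper on the global StreamManager used by RIOT
StreamManager.get().setLocationMapper(mapper);

// should read etc/sp.ttl rather than dereferencing http://spinrdf.org/sp
Model sp = RDFDataMgr.loadModel("http://spinrdf.org/sp");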

On Sun, Jun 21, 2020 at 11:00 AM Andy Seaborne  wrote:
>
>
>
> On 19/06/2020 22:49, Martynas Jusevičius wrote:
> > Hi,
> >
> > I'm trying to get location-mapping.n3 used by Jena 3.16.0-SNAPSHOT.
>
> Which one? What is the build time? Very recent, at a guess. JENA-1917.
>
> And are you using all the apache-jena-lib jars? (there are two
> LocationMappers - RIOT's, and a legacy one).
>
> > I placed it in etc/location-mapping.n3
>
> Change the name to .ttl.
>
> > and I can see in the logs it's
> > being read -- but not used afterwards. When http://spinrdf.org/sp# is
> > resolved, Jena thinks it's not mapped and proceeds to read from HTTP.
> > Also it looks like the LocationMapper and JenaIOEnvironment do not
> > agree with each other?
>
> You mean LocationMapper in jena-core logged it.  Old world. Legacy. Only
> left to keep ontology tests working.  Likely to be moved out of src/main.
>
> If you are working directly, use StreamManager.  FileManager will have
> deprecations next release.
>
>  Andy
>
> > The former finds the mapping file but the
> > latter does not? What is the purpose of JenaIOEnvironment at all?
> >
> > 23:43:17,247 [main] DEBUG LocationMapper:392 - Mapping:
> > http://spinrdf.org/spl => etc/spl.spin.ttl
> > 23:43:17,255 [main] DEBUG LocationMapper:392 - Mapping:
> > http://spinrdf.org/spl# => etc/spl.spin.ttl
> > 23:43:17,257 [main] DEBUG LocationMapper:392 - Mapping:
> > http://spinrdf.org/spin => etc/spin.ttl
> > 23:43:17,258 [main] DEBUG LocationMapper:392 - Mapping:
> > http://spinrdf.org/spin# => etc/spin.ttl
> > 23:43:17,259 [main] DEBUG LocationMapper:392 - Mapping:
> > http://spinrdf.org/sp => etc/sp.ttl
> > 23:43:17,259 [main] DEBUG LocationMapper:392 - Mapping:
> > http://spinrdf.org/sp# => etc/sp.ttl
> > 23:43:17,274 [main] DEBUG JenaIOEnvironment:180 - Failed to find
> > configuration: 
> > location-mapping.ttl;location-mapping.rdf;etc/location-mapping.rdf;etc/location-mapping.ttl
> > 23:43:17,476 [main] DEBUG info:337 - System architecture: 64 bit
> > 23:43:17,566 [main] DEBUG info:336 - System architecture: 64 bit
> > 23:43:18,093 [main] DEBUG AdapterFileManager:412 -
> > readModel(model,http://spinrdf.org/sp#)
> > 23:43:18,094 [main] DEBUG AdapterFileManager:429 -
> > readModel(model,http://spinrdf.org/sp#, null)
> > 23:43:18,094 [main] DEBUG StreamManager:144 - Not mapped: 
> > http://spinrdf.org/sp#
> > 23:43:19,082 [main] DEBUG HttpOp:1075 - [1] GET http://spinrdf.org/sp
> >
> > Thanks.
> >
> > Martynas
> >


Re: Jena ARQ

2020-06-24 Thread Martynas Jusevičius
I had this problem with 3.15.0. Upgraded to 3.16.0-SNAPSHOT and the
problem went away.

I think the issue is that Apache Commons Codec 1.14 contains the
MurmurHash3 class while, for example, 1.11 does not. You probably have an
older Codec version on the classpath which then gets used by Jena.
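
A quick way to check which commons-codec jar actually wins on the classpath is
something like this (just a diagnostic snippet, nothing Jena-specific):

try {
    // prints the jar that MurmurHash3 is loaded from
    Class<?> c = Class.forName("org.apache.commons.codec.digest.MurmurHash3");
    System.out.println(c.getProtectionDomain().getCodeSource().getLocation());
} catch (ClassNotFoundException e) {
    System.out.println("MurmurHash3 not on the classpath - commons-codec too old or missing");
}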

On Wed, Jun 24, 2020 at 2:37 PM Dr. Chavdar Ivanov  wrote:
>
> Hi Lorenz,
>
> Thanks for the reference. I see that indeed this came in 3.15
> I use Maven and as you say it should be ok if the 3.15.0 is ok.
> I will check the dependencies to be sure all is ok there.
>
> Best regards
> Chavdar
>
> -Original Message-
> From: Lorenz Buehmann 
> Sent: Wednesday, 24 June, 2020 12:02
> To: users@jena.apache.org
> Subject: Re: Jena ARQ
>
> Has been changed with JENA-1812 [1]
>
> But I'm wondering where you get errors now? If you use Maven and use
> 3.15.0 as dependencies, you should not see any errors.
>
>
> [1]
> https://issues.apache.org/jira/browse/JENA-1812?jql=project%20%3D%20JENA%20AND%20fixVersion%20%3D%20%22Jena%203.15.0%22%20ORDER%20BY%20key%20DESC%2C%20lastViewed%20DESC
>
> On 24.06.20 11:26, Dr. Chavdar Ivanov wrote:
> > Hi all,
> >
> > Is there some change in the packaging of commons?
> >
> > In the BlankNodeAllocatorHash.java the line below gives a problem if 
> > version 3.15.0 is used and it is ok if 3.14.0 is used
> >
> > import org.apache.commons.codec.digest.MurmurHash3;
> >
> > It cannot recognise the codec
> >
> > Is it an issue with 3.15.0 or something else?
> >
> > Best regards
> > Chavdar
> >


Re: TDB2 parallel load on cloud SSD and other observations/questions

2020-06-24 Thread Andy Seaborne

Hi Isroel,

It is worth trying each of the loaders to see how they perform on your 
machine.  There is a lot of parallel work when unlimited parallel 
loading of quads happens, so maybe it is a mismatch with the machine setup. 
Databases created by any loader are compatible.


On 22/06/2020 16:57, Isroel Kogan wrote:

Thank you Rob - I have confused the terminology. Indeed each batch processes 0.5M 
quads.
What is the UOM of the batch loading figure? M/s?

Looking at the output of iotop, the 3 main threads - which comprise the 
lion's share of the activity - have pretty steady reads of about 900-950 M/s with 
little variation beyond that. Writes vary a little more, at 2-6 M/s. What is 
puzzling to me is that in a configuration with twice the CPU and twice the 
RAM, the load average remained the same in absolute terms.
RAM usage was steady at about 8GB in the 32GB configuration; now it's been 
steady at 4.8GB (16GB configuration).
In both cases it is severely underutilized.
in both cases it is severely underutilized.


That might be due to how much mmap memory the process can have.  There 
is a per-process restriction.



I'm not clear if you are loading an empty database or an existing one. 
It makes a difference.


The indexes are sorted (they are B+Trees) so as the database grows, the 
depth of the tree (size of sorted data) grows and insert rates drop.
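
(Roughly: a B+Tree with fan-out f over N entries has height about log_f(N), so 
each insert touches more index blocks as the database grows.)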




Another factor is the nature of the data: if there are many "large 
literals" such as long strings, there is more byte-shifting needed which 
slows the first phase, not the indexing steps.


Andy



I'm not conversant in Java. I've been trying jstat and jmap - the latter hasn't 
been working. How best to verify the heap size? What else can affect things?

On 2020/06/22 09:01:36, Rob Vesse  wrote:

Isroel

I think there might be a fundamental misunderstanding happening about batch sizes here.  
The batch size is fixed for a run and never changes; the "batch size" you 
refer to is a speed calculation, e.g.:

 19:03:24 INFO  loader :: Add: 248,000,000 github_1_fixed.nq (Batch: 
3,562 / Avg: 38,030)
 19:05:19 INFO  loader :: Add: 248,500,000 github_1_fixed.nq (Batch: 
4,322 / Avg: 37,443)
 19:07:36 INFO  loader :: Add: 249,000,000 github_1_fixed.nq (Batch: 
3,649 / Avg: 36,759)
 19:09:19 INFO  loader :: Add: 249,500,000 github_1_fixed.nq (Batch: 
4,868 / Avg: 36,283)
 19:11:41 INFO  loader :: Add: 250,000,000 github_1_fixed.nq (Batch: 
3,532 / Avg: 35,622)
 19:11:41 INFO  loader ::   Elapsed: 7,017.95 seconds [2020/06/21 
19:11:41 UTC]
 19:13:58 INFO  loader :: Add: 250,500,000 github_1_fixed.nq (Batch: 
3,643 / Avg: 35,009)

Note that each batch is 500,000 quads, as the number after "Add:" increases by 500,000 each time.  
The "Batch" in brackets refers to the calculated loading speed for the current batch, while 
"Avg" is the calculated loading speed over the entire load.

So yes, the speed is decreasing over time. This is a commonly reported issue on 
these lists, but there is no simple fix because it depends on both hardware and 
data. The output you have provided suggests that you are stuck on IO; Andy is 
the primary developer of TDB2, so he may be able to shed more light on what 
might be going on.

Rob

On 22/06/2020, 06:32, "Isroel Kogan"  wrote:

 Hi Andy - thanks for your comments.
 
 Instead of responding point by point, it's best if I present a clearer picture, as I also have a better understanding of the factors now.
 
 
 GCP instance stats:

 $ lscpu
 
 Architecture:x86_64

 CPU op-mode(s):  32-bit, 64-bit
 Byte Order:  Little Endian
 Address sizes:   46 bits physical, 48 bits virtual
 CPU(s):  10
 On-line CPU(s) list: 0-9
 Thread(s) per core:  2
 Core(s) per socket:  5
 Socket(s):   1
 NUMA node(s):1
 Vendor ID:   GenuineIntel
 CPU family:  6
 Model:   63
 Model name:  Intel(R) Xeon(R) CPU @ 2.30GHz
 Stepping:0
 CPU MHz: 2300.000
 BogoMIPS:4600.00
 Hypervisor vendor:   KVM
 Virtualization type: full
 L1d cache:   32K
 L1i cache:   32K
 L2 cache:256K
 L3 cache:46080K
 NUMA node0 CPU(s):   0-9
 
 16GB RAM

 (I configured less RAM because on a prior iteration - out of 32GB - only 8 
was being used)
 
 3TB local SSD
 
 according to google cloud - max performance for this size is as follows:

 Storage space   Partitions   IOPS (Read / Write)   Throughput MB/s (Read / Write)
 3 TB            8            680,000 / 360,000     2,650 / 1,400
 
 
 https://cloud.google.com/compute/docs/disks
 
 I'm not getting that - but performance is an order of magnitude or more 

RE: Jena ARQ

2020-06-24 Thread Dr. Chavdar Ivanov
Hi Lorenz,

Thanks for the reference. I see that indeed this came in 3.15.
I use Maven and, as you say, it should be OK if 3.15.0 is OK.
I will check the dependencies to be sure all is OK there.

Best regards
Chavdar

-Original Message-
From: Lorenz Buehmann  
Sent: Wednesday, 24 June, 2020 12:02
To: users@jena.apache.org
Subject: Re: Jena ARQ

Has been changed with JENA-1812 [1]

But I'm wondering where you get errors now? If you use Maven and use
3.15.0 as dependencies, you should not see any errors.


[1]
https://issues.apache.org/jira/browse/JENA-1812?jql=project%20%3D%20JENA%20AND%20fixVersion%20%3D%20%22Jena%203.15.0%22%20ORDER%20BY%20key%20DESC%2C%20lastViewed%20DESC

On 24.06.20 11:26, Dr. Chavdar Ivanov wrote:
> Hi all,
>
> Is there some change in the packaging of commons?
>
> In the BlankNodeAllocatorHash.java the line below gives a problem if version 
> 3.15.0 is used and it is ok if 3.14.0 is used
>
> import org.apache.commons.codec.digest.MurmurHash3;
>
> It cannot recognise the codec
>
> Is it an issue with 3.15.0 or something else?
>
> Best regards
> Chavdar
>


Re: Jena ARQ

2020-06-24 Thread Lorenz Buehmann
Has been changed with JENA-1812 [1]

But I'm wondering where you get errors now? If you use Maven and use
3.15.0 as dependencies, you should not see any errors.


[1]
https://issues.apache.org/jira/browse/JENA-1812?jql=project%20%3D%20JENA%20AND%20fixVersion%20%3D%20%22Jena%203.15.0%22%20ORDER%20BY%20key%20DESC%2C%20lastViewed%20DESC

On 24.06.20 11:26, Dr. Chavdar Ivanov wrote:
> Hi all,
>
> Is there some change in the packaging of commons?
>
> In the BlankNodeAllocatorHash.java the line below gives a problem if version 
> 3.15.0 is used and it is ok if 3.14.0 is used
>
> import org.apache.commons.codec.digest.MurmurHash3;
>
> It cannot recognise the codec
>
> Is it an issue with 3.15.0 or something else?
>
> Best regards
> Chavdar
>


Jena ARQ

2020-06-24 Thread Dr. Chavdar Ivanov
Hi all,

Is there some change in the packaging of commons?

In BlankNodeAllocatorHash.java, the line below gives a problem if version 
3.15.0 is used, while it is OK if 3.14.0 is used:

import org.apache.commons.codec.digest.MurmurHash3;

It cannot recognise the codec.

Is it an issue with 3.15.0 or something else?

Best regards
Chavdar