Re: Fuseki context path?

2022-02-14 Thread Andy Seaborne




On 14/02/2022 21:31, Andy Seaborne wrote:



On 14/02/2022 20:59, aj...@apache.org wrote:
I'm afraid that doesn't work because I'm interested in proxying the entire
application, not a single dataset. I want to expose the whole UI, admin,
SPARQL editor and all.

I've tried proxying as you describe using --localhost, but the static
resources and JavaScript that compose the UI don't come through properly
when I have a path fragment on the other side a la:

ProxyPass /fuseki http://localhost:3030

I'd really rather not get into rewriting HTML! I was hoping for a simple:


Not sure this is a fuseki problem. If a reverse proxy renames the URLs, 
the backend needs to do the reverse mapping within content. Whether full 
webapp/Tomcat or not, the backend is fairly oblivious to the RP rewrite.
Always tricky with JS sent to the client because the front end doesn't 
know there is an RP.
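
To illustrate the point about renamed URLs, here is a minimal Python sketch (not Fuseki or httpd code) of what a prefix-stripping rule such as `ProxyPass /fuseki http://localhost:3030` does to request paths, and why root-absolute links inside the served pages escape the prefix. The function names and the sample asset paths are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin

PUBLIC_PREFIX = "/fuseki"          # path the reverse proxy exposes
BACKEND = "http://localhost:3030"  # where the backend actually listens


def to_backend(public_path: str) -> str:
    """Map a public request path to the backend URL, stripping the prefix."""
    if public_path != PUBLIC_PREFIX and not public_path.startswith(PUBLIC_PREFIX + "/"):
        raise ValueError(f"{public_path!r} is not under {PUBLIC_PREFIX!r}")
    # An empty remainder is normalised to "/" in this sketch.
    rest = public_path[len(PUBLIC_PREFIX):] or "/"
    return BACKEND + rest


def stays_behind_proxy(page_path: str, link: str) -> bool:
    """Resolve a link as a browser would and check it is still under the prefix."""
    resolved = urljoin(page_path, link)
    return resolved == PUBLIC_PREFIX or resolved.startswith(PUBLIC_PREFIX + "/")


if __name__ == "__main__":
    print(to_backend("/fuseki/ds/sparql"))
    # A relative link keeps the prefix; a root-absolute one does not,
    # so the browser asks the proxy for a path it has no rule for.
    print(stays_behind_proxy("/fuseki/index.html", "static/app.js"))
    print(stays_behind_proxy("/fuseki/index.html", "/static/app.js"))
```

The backend never sees the `/fuseki` prefix, so any root-absolute URL it emits in HTML or JavaScript points outside the proxied path unless something rewrites it.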



ProxyPass /fuseki http://localhost:3030/fuseki


Not clear here - are there multiple names under that root?



style of action.

Does that make sense?


Re: Fuseki context path?

2022-02-14 Thread A. Soroka
I can if needed, but it seems like a simple thing for the standalone to do.
If it can't be done now I will put in a PR.

Adam

On Mon, Feb 14, 2022, 4:29 PM Martynas Jusevičius 
wrote:

> Adam,
>
> Why not use the WAR file then in a servlet container?
>
> On Mon, 14 Feb 2022 at 21.59,  wrote:
>
> > I'm afraid that doesn't work because I'm interested in proxying the
> entire
> > application, not a single dataset. I want to expose the whole UI, admin,
> > SPARQL editor and all.
> >
> > I've tried proxying as you describe using --localhost, but the static
> > resources and JavaScript that compose the UI don't come through properly
> > when I have a path fragment on the other side a la:
> >
> > ProxyPass /fuseki http://localhost:3030
> >
> >  I'd really rather not get into rewriting HTML! I was hoping for a
> simple:
> >
> > ProxyPass /fuseki http://localhost:3030/fuseki
> >
> > style of action.
> >
> > Does that make sense?
> >
> > Adam
> >
> >
> > On Mon, Feb 14, 2022, 2:27 PM Andy Seaborne  wrote:
> >
> > >
> > >
> > > On 14/02/2022 17:30, aj...@apache.org wrote:
> > > > I'm probably missing something obvious, because I haven't looked at
> > > Fuseki
> > > > in quite some time. I cannot seem to find any way to set the servlet
> > > > context path for Fuseki in its standalone (non-WAR) incarnation,
> which
> > I
> > > > want to do in order to get it proxied behind httpd.
> > >
> > > For Fuseki standalone server (in the download) and Fuseki Main:
> > >
> > > Set the name of the dataset to a path. The name can have a "/" in it
> but
> > > it seems to need the service name to help it distinguish between the
> > > "sparql" query service and /some/path/dataset thinking "dataset" is the
> > > service (routing has been decided before the named services are
> > > available to inspect).
> > >
> > > fuseki-server /some/path/dataset/sparql
> > >
> > > Is that enough for you?
> > >
> > > BTW:
> > >
> > > One way to proxy is to run it on a known port and then use --localhost
> -
> > > the Fuseki server then will only talk to HTTP traffic on the localhost
> > > interface (IPv4 or IPv6), not to directly sent traffic.
> > >
> > >  Andy
> > >
> > > > Is there a setting here, or will I have to define a Jetty
> configuration
> > > (in
> > > > which case, do we have an example available?)?
> > > >
> > > > Thanks for any info!
> > > >
> > > > Adam
> > > >
> > >
> >
>


Re: Fuseki context path?

2022-02-14 Thread Andy Seaborne




On 14/02/2022 20:59, aj...@apache.org wrote:

I'm afraid that doesn't work because I'm interested in proxying the entire
application, not a single dataset. I want to expose the whole UI, admin,
SPARQL editor and all.

I've tried proxying as you describe using --localhost, but the static
resources and JavaScript that compose the UI don't come through properly
when I have a path fragment on the other side a la:

ProxyPass /fuseki http://localhost:3030

  I'd really rather not get into rewriting HTML! I was hoping for a simple:

ProxyPass /fuseki http://localhost:3030/fuseki

style of action.

Does that make sense?


yes, and it should work.

What's the problem?
Does http://localhost:3030/fuseki work on its own?

sparql.org is behind httpd:


  ServerName sparql.org
  ServerAdmin ...@

  ## Vhost docroot
  DocumentRoot "/var/www/html"

  ## Directories, there should at least be a declaration
  ## for /var/www/html

  
Options Indexes FollowSymLinks MultiViews
AllowOverride None
Require all granted
  

  ## Logging
  ErrorLog "/var/log/apache2/sparql.org.error.log"
  ServerSignature Off
  CustomLog "/var/log/apache2/sparql.org.access.log" combined

  ## Server aliases
  ServerAlias www.sparql.org
  ServerAlias sparql.net
  ServerAlias www.sparql.net

  ## Custom fragment
  ProxyPass / http://127.0.0.1:3030/ max=4
  ProxyPassReverse / http://127.0.0.1:3030/
  ProxyPreserveHost On
  RequestHeader set X-Forwarded-Proto http




PS JENA-2281 for improving the path routing.



Adam


On Mon, Feb 14, 2022, 2:27 PM Andy Seaborne  wrote:




On 14/02/2022 17:30, aj...@apache.org wrote:

I'm probably missing something obvious, because I haven't looked at Fuseki
in quite some time. I cannot seem to find any way to set the servlet
context path for Fuseki in its standalone (non-WAR) incarnation, which I
want to do in order to get it proxied behind httpd.


For Fuseki standalone server (in the download) and Fuseki Main:

Set the name of the dataset to a path. The name can have a "/" in it but
it seems to need the service name to help it distinguish between the
"sparql" query service and /some/path/dataset thinking "dataset" is the
service (routing has been decided before the named services are
available to inspect).

fuseki-server /some/path/dataset/sparql

Is that enough for you?

BTW:

One way to proxy is to run it on a known port and then use --localhost -
the Fuseki server then will only talk to HTTP traffic on the localhost
interface (IPv4 or IPv6), not to directly sent traffic.

  Andy


Is there a setting here, or will I have to define a Jetty configuration (in
which case, do we have an example available?)?

Thanks for any info!

Adam







Re: Fuseki context path?

2022-02-14 Thread Martynas Jusevičius
Adam,

Why not use the WAR file then in a servlet container?

On Mon, 14 Feb 2022 at 21.59,  wrote:

> I'm afraid that doesn't work because I'm interested in proxying the entire
> application, not a single dataset. I want to expose the whole UI, admin,
> SPARQL editor and all.
>
> I've tried proxying as you describe using --localhost, but the static
> resources and JavaScript that compose the UI don't come through properly
> when I have a path fragment on the other side a la:
>
> ProxyPass /fuseki http://localhost:3030
>
>  I'd really rather not get into rewriting HTML! I was hoping for a simple:
>
> ProxyPass /fuseki http://localhost:3030/fuseki
>
> style of action.
>
> Does that make sense?
>
> Adam
>
>
> On Mon, Feb 14, 2022, 2:27 PM Andy Seaborne  wrote:
>
> >
> >
> > On 14/02/2022 17:30, aj...@apache.org wrote:
> > > I'm probably missing something obvious, because I haven't looked at
> > Fuseki
> > > in quite some time. I cannot seem to find any way to set the servlet
> > > context path for Fuseki in its standalone (non-WAR) incarnation, which
> I
> > > want to do in order to get it proxied behind httpd.
> >
> > For Fuseki standalone server (in the download) and Fuseki Main:
> >
> > Set the name of the dataset to a path. The name can have a "/" in it but
> > it seems to need the service name to help it distinguish between the
> > "sparql" query service and /some/path/dataset thinking "dataset" is the
> > service (routing has been decided before the named services are
> > available to inspect).
> >
> > fuseki-server /some/path/dataset/sparql
> >
> > Is that enough for you?
> >
> > BTW:
> >
> > One way to proxy is to run it on a known port and then use --localhost -
> > the Fuseki server then will only talk to HTTP traffic on the localhost
> > interface (IPv4 or IPv6), not to directly sent traffic.
> >
> >  Andy
> >
> > > Is there a setting here, or will I have to define a Jetty configuration
> > (in
> > > which case, do we have an example available?)?
> > >
> > > Thanks for any info!
> > >
> > > Adam
> > >
> >
>


Re: Fuseki context path?

2022-02-14 Thread ajs6f
I'm afraid that doesn't work because I'm interested in proxying the entire
application, not a single dataset. I want to expose the whole UI, admin,
SPARQL editor and all.

I've tried proxying as you describe using --localhost, but the static
resources and JavaScript that compose the UI don't come through properly
when I have a path fragment on the other side a la:

ProxyPass /fuseki http://localhost:3030

 I'd really rather not get into rewriting HTML! I was hoping for a simple:

ProxyPass /fuseki http://localhost:3030/fuseki

style of action.

Does that make sense?

Adam


On Mon, Feb 14, 2022, 2:27 PM Andy Seaborne  wrote:

>
>
> On 14/02/2022 17:30, aj...@apache.org wrote:
> > I'm probably missing something obvious, because I haven't looked at
> Fuseki
> > in quite some time. I cannot seem to find any way to set the servlet
> > context path for Fuseki in its standalone (non-WAR) incarnation, which I
> > want to do in order to get it proxied behind httpd.
>
> For Fuseki standalone server (in the download) and Fuseki Main:
>
> Set the name of the dataset to a path. The name can have a "/" in it but
> it seems to need the service name to help it distinguish between the
> "sparql" query service and /some/path/dataset thinking "dataset" is the
> service (routing has been decided before the named services are
> available to inspect).
>
> fuseki-server /some/path/dataset/sparql
>
> Is that enough for you?
>
> BTW:
>
> One way to proxy is to run it on a known port and then use --localhost -
> the Fuseki server then will only talk to HTTP traffic on the localhost
> interface (IPv4 or IPv6), not to directly sent traffic.
>
>  Andy
>
> > Is there a setting here, or will I have to define a Jetty configuration
> (in
> > which case, do we have an example available?)?
> >
> > Thanks for any info!
> >
> > Adam
> >
>


Re: How to resolve a transaction error

2022-02-14 Thread Andy Seaborne

Hi Erik,

Do you have a small example that reproduces this (outside of your 
Docker setup)?


The SLF4J warning shouldn't happen either.

Andy

On 14/02/2022 16:56, Erik Bijsterbosch wrote:

Hi Lorenz,

We base all functionality on docker, so we dockerise every new
application we develop locally and want to ship/deploy to a server.
Docker limitations could bother us, that's what I keep in mind too, but so
far application errors still lead the way in debugging.
It could be that the TX abort error is not explicit enough and is thrown by
a general exception handler.
I'm not a Java programmer, so I hope someone will have a look at this.

Regards,
Erik


On Mon, 14 Feb 2022 at 10:41, Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> wrote:


Hi

On 14.02.22 09:26, Erik Bijsterbosch wrote:

Hi,

I want to resolve the transaction error I mentioned in an earlier
post/conversation.
That question was too cluttered with context to get noticed, I guess.
So here's a new attempt...

After starting a (4.4.0 docker) fuseki server or a fuseki geosparql server
with inference enabled on my large dataset I get the following error
message:

fuseki_1| Write transaction with no commit() or abort() before
end() - forced abort

Inference seemed to work earlier on this dataset with my
previous implementation and I assume now this is data related.
What can I do to debug this?


what was your previous implementation? What did you change?

Do we already know if it works without Docker?



Regards,
Erik







AW: xloader "Can't find gzip program"

2022-02-14 Thread Andy Seaborne

Thanks for the details.  Good to add to the collective experience.

This is one reason to parse the file to /dev/null before trying to load it.

It doesn't look like there is much you can do. Reading the man page for 
bzip2recover, it's going to lose some data and if that is not aligned 
to N-Triples line boundaries, it will break the parser.  Only by finding and fixing up 
the damaged (in the NT sense) block file will it recover most of the data.
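
As a rough illustration of the "decompress to /dev/null first" check, the following Python sketch streams a .bz2 file through the standard-library `bz2` module and discards the output. It only tests the compression layer, not the N-Triples syntax (for that, a parser run such as `riot --validate` would still be needed). The function name `check_bz2` is made up for this example.

```python
import bz2
from typing import Tuple


def check_bz2(path: str, chunk_size: int = 1 << 20) -> Tuple[bool, int]:
    """Decompress the whole file and throw the output away.

    Returns (ok, bytes_read). ok is False when the stream is damaged
    or ends before the end-of-stream marker.
    """
    total = 0
    try:
        with bz2.open(path, "rb") as stream:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (OSError, EOFError, ValueError):
        # Corrupt or truncated compressed data surfaces as one of these.
        return False, total
    return True, total
```

Running this before a multi-hour load costs one extra pass over the file, but it fails early rather than billions of lines into the load.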


Andy

On 14/02/2022 13:19, Neubert, Joachim wrote:

The error was in the binary:
lbzcat: "/zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2": compressed 
data error: bad block header magic

That created non-RDF input:

  [nbt@e6810f891672 ~]$ bzcat 
/zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2 | sed -n 
'4052914958,4052914960p;4052914961q'
  
"\u0646\u062C\u0645 \u0641\u064A \u0643\u0648\u0643\u0628\u0629 
\u0627\u0644\u062B\u0648\u0631"@ar .

bzcat: Compressed file ends unexpectedly;
 perhaps it is corrupted?  *Possible* reason follows.
bzcat: Success
 Input file = /zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2, 
output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

  "star in 
the constellation Taurus"@en .
 
  .

which in turn produced:

03:02:18 INFO  Nodes   :: Add: 4,052,000,000 latest-truthy.nt (Batch: 
108,189 / Avg: 102,550)
03:02:26 ERROR riot:: [line: 4052914959, col: 80] Bad input stream 
[java.io.IOException: Unexpected end of stream]
Exception in thread "AsyncParser" org.apache.jena.riot.RiotException: [line: 
4052914959, col: 80] Bad input stream [java.io.IOException: Unexpected end of stream]
 at 
org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
 at 
org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:148)
 at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:105)
 at org.apache.jena.riot.lang.LangNTuple.parseTriple(LangNTuple.java:95)
 at 
org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:61)
 at 
org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:53)
 at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:43)
 at 
org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:186)
 at org.apache.jena.riot.RDFParser.read(RDFParser.java:366)
 at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:335)
 at org.apache.jena.riot.RDFParser.parse(RDFParser.java:310)
 at 
org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:552)
 at 
org.apache.jena.tdb2.xloader.ProcBuildNodeTableX.lambda$exec2$0(ProcBuildNodeTableX.java:198)
 at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
 at 
org.apache.jena.tdb2.xloader.ProcBuildNodeTableX.lambda$exec2$1(ProcBuildNodeTableX.java:194)
 at java.base/java.lang.Thread.run(Thread.java:829)

Cheers, Joachim


-Original Message-
From: Andy Seaborne 
Sent: Monday, 14 February 2022 13:46
To: users@jena.apache.org
Subject: Re: AW: AW: AW: AW: xloader "Can't find gzip program"



On 14/02/2022 08:01, Neubert, Joachim wrote:

Thanks, Andy, the TDB2 assembler fixed it, and all worked well.

I've tried to load wikidata-truthy then, but apparently the bzip file
was damaged at line 4052914959 - have to try again


How annoying.

Is it an RDF syntax error or bad binary or something else?

--

My experience is that gz is faster to load.

bz2 emphasizes compactness over speed.

  Andy



Cheers, Joachim


-Original Message-
From: Andy Seaborne 
Sent: Saturday, 12 February 2022 11:15
To: users@jena.apache.org
Subject: Re: AW: AW: AW: xloader "Can't find gzip program"

Hi Joachim,

Aside: I've realised why the timestamps are fixed at "2022-01-30 15:03".

The build setup is for repeatable builds of releases. Any build from
the X.Y.Z release source, with the same JDK, will generate the byte-wise
same jar files.


Each release build fixes the timestamp and uses that, and it gets in
the POM as property . It only gets
updated when a release happens; otherwise the POM file is going to get
modified several times a week.

Thankfully, we have --version on most commands as well.

That's timestamps explained.



You seem to have run the TDB2 xloader, then given the text index
builder an assembler description for TDB1.

Fuseki with --loc determines the database type by looking at the file
layout, but 

Re: Does Jena need maintainance regarding to disk space?

2022-02-14 Thread Andy Seaborne

Yes, it's a good idea.

TDB2 in Fuseki has the "compact" operation to do this without stopping 
the server. It creates a new "Data-/" directory and you can delete 
lower numbered databases.
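
As a small aid for the cleanup step, here is a hedged Python sketch that lists the storage generations in a TDB2 database directory and separates the highest-numbered one from the older ones. It assumes the generations are subdirectories named `Data-0001`, `Data-0002`, and so on; the function name is invented for this example, and the sketch only reports directories, it never deletes anything.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# Assumed naming scheme for TDB2 generations: Data-0001, Data-0002, ...
GENERATION = re.compile(r"^Data-(\d+)$")


def tdb2_generations(db_dir: str) -> Tuple[Optional[Path], List[Path]]:
    """Return (current, older) generation directories found under db_dir."""
    found = []
    for entry in Path(db_dir).iterdir():
        match = GENERATION.match(entry.name)
        if entry.is_dir() and match:
            found.append((int(match.group(1)), entry))
    found.sort()
    if not found:
        return None, []
    older = [path for _, path in found[:-1]]
    return found[-1][1], older
```

Only remove the older directories once the compaction has finished and the server is using the new generation.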


TDB1 needs the server to be stopped for a rebuild. If you can stop updates, 
stop them (the server is now read-only), take a backup, rebuild from the 
backup, then stop the server and swap the databases.


Andy

On 14/02/2022 11:39, Mikael Pesonen wrote:


Hi,

we have now 13M triples and space usage of Jena data folder is 88G which 
seems high. This is not including text index.
Should we cleanup/compress/rebuild etc the database regularly in order 
to keep disk usage lower, or is this normal disk usage?


BR



Re: Fuseki context path?

2022-02-14 Thread Andy Seaborne




On 14/02/2022 17:30, aj...@apache.org wrote:

I'm probably missing something obvious, because I haven't looked at Fuseki
in quite some time. I cannot seem to find any way to set the servlet
context path for Fuseki in its standalone (non-WAR) incarnation, which I
want to do in order to get it proxied behind httpd.


For Fuseki standalone server (in the download) and Fuseki Main:

Set the name of the dataset to a path. The name can have a "/" in it but 
it seems to need the service name to help it distinguish between the 
"sparql" query service and /some/path/dataset thinking "dataset" is the 
service (routing has been decided before the named services are 
available to inspect).


fuseki-server /some/path/dataset/sparql

Is that enough for you?

BTW:

One way to proxy is to run it on a known port and then use --localhost - 
the Fuseki server then will only talk to HTTP traffic on the localhost 
interface (IPv4 or IPv6), not to directly sent traffic.
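
To show what "only talk to traffic on the localhost interface" means in practice, here is a generic Python sketch (an analogy, not Fuseki code) of an HTTP server bound to the loopback address. The handler and function names are hypothetical, and the sketch covers IPv4 only.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class Ping(BaseHTTPRequestHandler):
    """Tiny handler that answers every GET with 'ok'."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the demo quiet.
        pass


def make_loopback_server(port: int = 0) -> HTTPServer:
    """Bind to 127.0.0.1 only; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), Ping)
```

A server bound this way is reachable by a reverse proxy on the same machine but not directly from other hosts, which is the arrangement described above.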


Andy


Is there a setting here, or will I have to define a Jetty configuration (in
which case, do we have an example available?)?

Thanks for any info!

Adam



Fuseki context path?

2022-02-14 Thread ajs6f
I'm probably missing something obvious, because I haven't looked at Fuseki
in quite some time. I cannot seem to find any way to set the servlet
context path for Fuseki in its standalone (non-WAR) incarnation, which I
want to do in order to get it proxied behind httpd.

Is there a setting here, or will I have to define a Jetty configuration (in
which case, do we have an example available?)?

Thanks for any info!

Adam


Re: How to resolve a transaction error

2022-02-14 Thread Erik Bijsterbosch
Hi Lorenz,

We base all functionality on docker, so we dockerise every new
application we develop locally and want to ship/deploy to a server.
Docker limitations could bother us, that's what I keep in mind too, but so
far application errors still lead the way in debugging.
It could be that the TX abort error is not explicit enough and is thrown by
a general exception handler.
I'm not a Java programmer, so I hope someone will have a look at this.

Regards,
Erik


On Mon, 14 Feb 2022 at 10:41, Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> wrote:

> Hi
>
> On 14.02.22 09:26, Erik Bijsterbosch wrote:
> > Hi,
> >
> > I want to resolve the transaction error I mentioned  before in an earlier
> > post/conversation.
> > This question was cluttered too much with context to get noticed, I
> guess.
> > So here's a new attempt...
> >
> > After starting a (4.4.0 docker) fuseki server or a fuseki geosparql
> server
> > with inference enabled on my large dataset I get the following error
> > message:
> >
> > fuseki_1| Write transaction with no commit() or abort()
> before
> > end() - forced abort
> >
> > Inference seemed to work earlier on this dataset with my
> > previous implementation and I assume now this is data related.
> > What can I do to debug this?
>
> what was your previous implementation? What did you change?
>
> Do we already know if it works without Docker?
>
> >
> > Regards,
> > Erik
> >
>


AW: AW: AW: AW: AW: xloader "Can't find gzip program"

2022-02-14 Thread Neubert, Joachim
The error was in the binary:
lbzcat: "/zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2": compressed 
data error: bad block header magic

That created non-RDF input:

 [nbt@e6810f891672 ~]$ bzcat 
/zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2 | sed -n 
'4052914958,4052914960p;4052914961q'
  
"\u0646\u062C\u0645 \u0641\u064A \u0643\u0648\u0643\u0628\u0629 
\u0627\u0644\u062B\u0648\u0631"@ar .

bzcat: Compressed file ends unexpectedly;
perhaps it is corrupted?  *Possible* reason follows.
bzcat: Success
Input file = /zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2, 
output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

  
"star in the constellation Taurus"@en .
 
  .

which in turn produced:

03:02:18 INFO  Nodes   :: Add: 4,052,000,000 latest-truthy.nt (Batch: 
108,189 / Avg: 102,550)
03:02:26 ERROR riot:: [line: 4052914959, col: 80] Bad input stream 
[java.io.IOException: Unexpected end of stream]
Exception in thread "AsyncParser" org.apache.jena.riot.RiotException: [line: 
4052914959, col: 80] Bad input stream [java.io.IOException: Unexpected end of 
stream]
at 
org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
at 
org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:148)
at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:105)
at org.apache.jena.riot.lang.LangNTuple.parseTriple(LangNTuple.java:95)
at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:61)
at 
org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:53)
at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:43)
at 
org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:186)
at org.apache.jena.riot.RDFParser.read(RDFParser.java:366)
at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:335)
at org.apache.jena.riot.RDFParser.parse(RDFParser.java:310)
at 
org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:552)
at 
org.apache.jena.tdb2.xloader.ProcBuildNodeTableX.lambda$exec2$0(ProcBuildNodeTableX.java:198)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
at 
org.apache.jena.tdb2.xloader.ProcBuildNodeTableX.lambda$exec2$1(ProcBuildNodeTableX.java:194)
at java.base/java.lang.Thread.run(Thread.java:829)

Cheers, Joachim

> -Original Message-
> From: Andy Seaborne 
> Sent: Monday, 14 February 2022 13:46
> To: users@jena.apache.org
> Subject: Re: AW: AW: AW: AW: xloader "Can't find gzip program"
> 
> 
> 
> On 14/02/2022 08:01, Neubert, Joachim wrote:
> > Thanks, Andy, the TDB2 assembler fixed it, and all worked well.
> >
> > I've tried to load wikidata-truthy then, but apparently the bzip file
> > was damaged at line 4052914959 - have to try again
> 
> How annoying.
> 
> Is it an RDF syntax error or bad binary or something else?
> 
> --
> 
> My experience is that gz is faster to load.
> 
> bz2 emphasizes compactness over speed.
> 
>  Andy
> 
> >
> > Cheers, Joachim
> >
> >> -Original Message-
> >> From: Andy Seaborne 
> >> Sent: Saturday, 12 February 2022 11:15
> >> To: users@jena.apache.org
> >> Subject: Re: AW: AW: AW: xloader "Can't find gzip program"
> >>
> >> Hi Joachim,
> >>
> >> Aside: I've realised why the timestamps are fixed at "2022-01-30 15:03".
> >>
> >> The build setup is for repeatable builds of releases. Any build from
> >> the X.Y.Z release source, with the same JDK, will generate the byte-wise
> same jar files.
> >>
> >> Each release build fixes the timestamp and uses that, and it gets in
> >> the POM as property . It only gets
> >> updated when a release happens; otherwise the POM file is going to get
> >> modified several times a week.
> >>
> >> Thankfully, we have --version on most commands as well.
> >>
> >> That's timestamps explained.
> >>
> >> 
> >>
> >> You seem to have run the TDB2 xloader, then given the text index
> >> builder an assembler description for TDB1.
> >>
> >> Fuseki with --loc determines the database type by looking at the file
> >> layout, but assemblers don't.
> >>
> >> The version output can be changed to say "TDB1" without too much
> >> disruption. Small tweak that might have helped show this up earlier.
> >>
> >>   Andy
> >>
> >> On 11/02/2022 23:06, Neubert, Joachim wrote:
> >>> Sorry, my fault: I've actually had jena-4.4.0 active, not 

Re: AW: AW: AW: AW: xloader "Can't find gzip program"

2022-02-14 Thread Andy Seaborne




On 14/02/2022 08:01, Neubert, Joachim wrote:

Thanks, Andy, the TDB2 assembler fixed it, and all worked well.

I've tried to load wikidata-truthy then, but apparently the bzip file was 
damaged at line 4052914959 - have to try again


How annoying.

Is it an RDF syntax error or bad binary or something else?

--

My experience is that gz is faster to load.

bz2 emphasizes compactness over speed.
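
For anyone who wants to measure the trade-off on their own data, a minimal Python sketch using the standard-library `gzip` and `bz2` modules follows. It reports compressed size and in-memory decompression time for one buffer; it makes no claim about loader throughput, and the function name is invented for this example.

```python
import bz2
import gzip
import time


def compare_codecs(data: bytes) -> dict:
    """Compress and decompress `data` with gzip and bz2, timing decompression."""
    results = {}
    for name, codec in (("gzip", gzip), ("bz2", bz2)):
        packed = codec.compress(data)
        start = time.perf_counter()
        unpacked = codec.decompress(packed)
        elapsed = time.perf_counter() - start
        if unpacked != data:
            raise RuntimeError(f"{name} round trip changed the data")
        results[name] = {"bytes": len(packed), "decompress_seconds": elapsed}
    return results
```

Feeding it a representative slice of the N-Triples file gives a quick feel for the size and speed difference before committing to one format for a large load.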

Andy



Cheers, Joachim


-Original Message-
From: Andy Seaborne 
Sent: Saturday, 12 February 2022 11:15
To: users@jena.apache.org
Subject: Re: AW: AW: AW: xloader "Can't find gzip program"

Hi Joachim,

Aside: I've realised why the timestamps are fixed at "2022-01-30 15:03".

The build setup is for repeatable builds of releases. Any build from the X.Y.Z
release source, with the same JDK, will generate the byte-wise same jar files.

Each release build fixes the timestamp and uses that, and it gets in the POM
as property . It only gets updated when a
release happens; otherwise the POM file is going to get modified several
times a week.

Thankfully, we have --version on most commands as well.

That's timestamps explained.



You seem to have run the TDB2 xloader, then given the text index builder an
assembler description for TDB1.

Fuseki with --loc determines the database type by looking at the file layout,
but assemblers don't.

The version output can be changed to say "TDB1" without too much
disruption. Small tweak that might have helped show this up earlier.

  Andy

On 11/02/2022 23:06, Neubert, Joachim wrote:

Sorry, my fault: I've actually had jena-4.4.0 active, not 4.5.0-SNAPSHOT.

Now the loading works smoothly:

22:50:10 INFO  Load node table  = 62 seconds
22:50:10 INFO  Load ingest data = 37 seconds
22:50:10 INFO  Build index SPO  = 7 seconds
22:50:10 INFO  Build index POS  = 12 seconds
22:50:10 INFO  Build index OSP  = 9 seconds
22:50:10 INFO  Overall  127 seconds
22:50:10 INFO  Overall  00h 02m 07s
22:50:10 INFO  Triples loaded   = 1000
22:50:10 INFO  Quads loaded = 0
22:50:10 INFO  Overall Rate 78740 tuples per second


That's output from tdb2.xloader.

For 10M up to 500M triples (laptop), or maybe 1B (server), also try
"tdb2.tdbloader --loader=parallel"


However, the text indexing crashes, when called like that:

java -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --debug
--desc=/tmp/temp.ttl

org.apache.jena.assembler.exceptions.AssemblerException: caught:

Unable to check TDB lock owner, the lock file contents appear to be for a
TDB2 database.  Please try loading this location as a TDB2 database. See
https://jena.apache.org/documentation/tdb/faqs.html for more
information.

doing:
  root: file:///tmp/temp.ttl#dataset with type:
http://jena.hpl.hp.com/2008/tdb#DatasetTDB assembler class: class
org.apache.jena.tdb.assembler.DatasetAssemblerTDB1


But that is TDB1
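
A quick way to spot this kind of mismatch before running the indexer is to check which TDB vocabulary the assembler file mentions. The Python sketch below is a plain text scan, not an RDF parse, and the function name is hypothetical. The TDB1 namespace is the one shown in the trace above; the TDB2 namespace is the one used by TDB2 assembler descriptions.

```python
TDB1_NS = "http://jena.hpl.hp.com/2008/tdb#"  # as seen in the trace above
TDB2_NS = "http://jena.apache.org/2016/tdb#"  # TDB2 assembler vocabulary


def assembler_flavours(text: str) -> set:
    """Report which TDB vocabularies an assembler file mentions."""
    found = set()
    if TDB1_NS in text:
        found.add("TDB1")
    if TDB2_NS in text:
        found.add("TDB2")
    return found
```

If the result is `{"TDB1"}` but the database directory was built with a TDB2 loader, the assembler description is the thing to change.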


  root: http://localhost/jena_example/#text_dataset with type:
http://jena.apache.org/text#TextDataset assembler class: class
org.apache.jena.query.text.assembler.TextDatasetAssembler


...

Caused by: org.apache.jena.tdb.base.file.FileException: Unable to check

TDB lock owner, the lock file contents appear to be for a TDB2 database.
Please try loading this location as a TDB2 database. See
https://jena.apache.org/documentation/tdb/faqs.html for more
information.

  at
org.apache.jena.tdb.base.file.LocationLock.getOwner(LocationLock.java:
110)


org.apache.jena.tdb == TDB1


  at

org.apache.jena.tdb.base.file.LocationLock.canObtain(LocationLock.java:139)

  at

org.apache.jena.tdb.StoreConnection._makeAndCache(StoreConnection.java
:262)

  at

org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:226)

  at

org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:240)

  at

org.apache.jena.tdb.transaction.DatasetGraphTransaction.(DatasetGra
phTransaction.java:72)

  at
org.apache.jena.tdb.sys.TDBMaker.createDirect(TDBMaker.java:114)

...


  ... 23 more
2022-02-11 22:50:12 ABORTED

cat /var/lib/fuseki/databases/temp/tdb.lock
32907

Cheers, Joachim


Does Jena need maintainance regarding to disk space?

2022-02-14 Thread Mikael Pesonen



Hi,

we have now 13M triples and space usage of Jena data folder is 88G which 
seems high. This is not including text index.
Should we cleanup/compress/rebuild etc the database regularly in order 
to keep disk usage lower, or is this normal disk usage?


BR



Re: How to resolve a transaction error

2022-02-14 Thread Lorenz Buehmann

Hi

On 14.02.22 09:26, Erik Bijsterbosch wrote:

Hi,

I want to resolve the transaction error I mentioned  before in an earlier
post/conversation.
This question was cluttered too much with context to get noticed, I guess.
So here's a new attempt...

After starting a (4.4.0 docker) fuseki server or a fuseki geosparql server
with inference enabled on my large dataset I get the following error
message:

fuseki_1| Write transaction with no commit() or abort() before
end() - forced abort

Inference seemed to work earlier on this dataset with my
previous implementation and I assume now this is data related.
What can I do to debug this?


what was your previous implementation? What did you change?

Do we already know if it works without Docker?



Regards,
Erik



How to resolve a transaction error

2022-02-14 Thread Erik Bijsterbosch
Hi,

I want to resolve the transaction error I mentioned  before in an earlier
post/conversation.
This question was cluttered too much with context to get noticed, I guess.
So here's a new attempt...

After starting a (4.4.0 docker) fuseki server or a fuseki geosparql server
with inference enabled on my large dataset I get the following error
message:

fuseki_1| Write transaction with no commit() or abort() before
end() - forced abort

Inference seemed to work earlier on this dataset with my
previous implementation and I assume now this is data related.
What can I do to debug this?
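
For readers unfamiliar with the message, the following toy Python model (not Jena code; every name in it is invented) shows one way such a warning can arise: a write transaction reaches end() without commit() or abort(), typically because an exception interrupted the work in between, so the original exception is the thing to look for in the log.

```python
import warnings


class WriteTxn:
    """Toy write transaction: commit() or abort() should precede end()."""

    def __init__(self):
        self.finished = False   # set by commit() or abort()
        self.committed = False

    def commit(self):
        self.finished = True
        self.committed = True

    def abort(self):
        self.finished = True

    def end(self):
        if not self.finished:
            warnings.warn("Write transaction with no commit() or abort() "
                          "before end() - forced abort")
            self.abort()


def run_write(action):
    """Run action(txn) in a write transaction; end() always runs."""
    txn = WriteTxn()
    try:
        action(txn)
        txn.commit()
    finally:
        # If action raised, commit() was skipped and end() forces an abort.
        txn.end()
    return txn
```

In this model the warning is a symptom, and the exception raised inside `action` is the cause.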

Regards,
Erik


AW: AW: AW: AW: xloader "Can't find gzip program"

2022-02-14 Thread Neubert, Joachim
Thanks, Andy, the TDB2 assembler fixed it, and all worked well.

I've tried to load wikidata-truthy then, but apparently the bzip file was 
damaged at line 4052914959 - have to try again

Cheers, Joachim

> -Original Message-
> From: Andy Seaborne 
> Sent: Saturday, 12 February 2022 11:15
> To: users@jena.apache.org
> Subject: Re: AW: AW: AW: xloader "Can't find gzip program"
> 
> Hi Joachim,
> 
> Aside: I've realised why the timestamps are fixed at "2022-01-30 15:03".
> 
> The build setup is for repeatable builds of releases. Any build from the X.Y.Z
> release source, with the same JDK, will generate the byte-wise same jar files.
> 
> Each release build fixes the timestamp and uses that, and it gets in the POM
> as property . It only gets updated when a
> release happens; otherwise the POM file is going to get modified several
> times a week.
> 
> Thankfully, we have --version on most commands as well.
> 
> That's timestamps explained.
> 
> 
> 
> You seem to have run the TDB2 xloader, then given the text index builder an
> assembler description for TDB1.
> 
> Fuseki with --loc determines the database type by looking at the file layout,
> but assemblers don't.
> 
> The version output can be changed to say "TDB1" without too much
> disruption. Small tweak that might have helped show this up earlier.
> 
>  Andy
> 
> On 11/02/2022 23:06, Neubert, Joachim wrote:
> > Sorry, my fault: I've actually had jena-4.4.0 active, not 4.5.0-SNAPSHOT.
> >
> > Now the loading works smoothly:
> >
> > 22:50:10 INFO  Load node table  = 62 seconds
> > 22:50:10 INFO  Load ingest data = 37 seconds
> > 22:50:10 INFO  Build index SPO  = 7 seconds
> > 22:50:10 INFO  Build index POS  = 12 seconds
> > 22:50:10 INFO  Build index OSP  = 9 seconds
> > 22:50:10 INFO  Overall  127 seconds
> > 22:50:10 INFO  Overall  00h 02m 07s
> > 22:50:10 INFO  Triples loaded   = 1000
> > 22:50:10 INFO  Quads loaded = 0
> > 22:50:10 INFO  Overall Rate 78740 tuples per second
> 
> That's output from tdb2.xloader.
> 
> For 10M up to 500M triples (laptop), or maybe 1B (server), also try
> "tdb2.tdbloader --loader=parallel"
> 
> > However, the text indexing crashes, when called like that:
> >
> > java -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --debug
> > --desc=/tmp/temp.ttl
> >
> > org.apache.jena.assembler.exceptions.AssemblerException: caught:
> Unable to check TDB lock owner, the lock file contents appear to be for a
> TDB2 database.  Please try loading this location as a TDB2 database. See
> https://jena.apache.org/documentation/tdb/faqs.html for more
> information.
> >doing:
> >  root: file:///tmp/temp.ttl#dataset with type:
> > http://jena.hpl.hp.com/2008/tdb#DatasetTDB assembler class: class
> > org.apache.jena.tdb.assembler.DatasetAssemblerTDB1
> 
> But that is TDB1
> 
> >  root: http://localhost/jena_example/#text_dataset with type:
> > http://jena.apache.org/text#TextDataset assembler class: class
> > org.apache.jena.query.text.assembler.TextDatasetAssembler
> >
> ...
> > Caused by: org.apache.jena.tdb.base.file.FileException: Unable to check
> TDB lock owner, the lock file contents appear to be for a TDB2 database.
> Please try loading this location as a TDB2 database. See
> https://jena.apache.org/documentation/tdb/faqs.html for more
> information.
> >  at
> > org.apache.jena.tdb.base.file.LocationLock.getOwner(LocationLock.java:
> > 110)
> 
> org.apache.jena.tdb == TDB1
> 
> >  at
> org.apache.jena.tdb.base.file.LocationLock.canObtain(LocationLock.java:139)
> >  at
> org.apache.jena.tdb.StoreConnection._makeAndCache(StoreConnection.java
> :262)
> >  at
> org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:226)
> >  at
> org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:240)
> >  at
> org.apache.jena.tdb.transaction.DatasetGraphTransaction.(DatasetGra
> phTransaction.java:72)
> >  at
> > org.apache.jena.tdb.sys.TDBMaker.createDirect(TDBMaker.java:114)
> ...
> 
> >  ... 23 more
> > 2022-02-11 22:50:12 ABORTED
> >
> > cat /var/lib/fuseki/databases/temp/tdb.lock
> > 32907
> >
> > Cheers, Joachim