AW: xloader "Can't find gzip program"
Thanks for the details. Good to add to the collective experience. It is one reason to parse the file to /dev/null before trying to load it.

It doesn't look like there is much you can do. Reading the man page for bzip2recover, it's going to lose some data, and if the loss is not aligned to N-Triples line boundaries, it will break the parser. Only by finding and fixing up the damaged (in the NT sense) block file will it recover most of the data.

    Andy

On 14/02/2022 13:19, Neubert, Joachim wrote:
> The error was in the binary:
>
> lbzcat: "/zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2": compressed data error: bad block header magic
>
> That created non-RDF input:
>
> [nbt@e6810f891672 ~]$ bzcat /zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2 | sed -n '4052914958,4052914960p;4052914961q'
> <http://www.wikidata.org/entity/Q85112545> <http://schema.org/description> "\u0646\u062C\u0645 \u0641\u064A \u0643\u0648\u0643\u0628\u0629 \u0627\u0644\u062B\u0648\u0631"@ar .
>
> bzcat: Compressed file ends unexpectedly;
>         perhaps it is corrupted?  *Possible* reason follows.
> bzcat: Success
>         Input file = /zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2, output file = (stdout)
>
> It is possible that the compressed file(s) have become corrupted.
> You can use the -tvv option to test integrity of such files.
>
> You can use the `bzip2recover' program to attempt to recover
> data from undamaged sections of corrupted files.
>
> <http://www.wikidata.org/entity/Q85112545> <http://schema.org/description> "star in the constellation Taurus"@en .
> <https://www.wikidata.org/wiki/Special:EntityData/Q85112563> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Dataset> .
>
> which in turn produced:
>
> 03:02:18 INFO Nodes :: Add: 4,052,000,000 latest-truthy.nt (Batch: 108,189 / Avg: 102,550)
> 03:02:26 ERROR riot:: [line: 4052914959, col: 80] Bad input stream [java.io.IOException: Unexpected end of stream]
> Exception in thread "AsyncParser" org.apache.jena.riot.RiotException: [line: 4052914959, col: 80] Bad input stream [java.io.IOException: Unexpected end of stream]
>     at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
>     at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:148)
>     at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:105)
>     at org.apache.jena.riot.lang.LangNTuple.parseTriple(LangNTuple.java:95)
>     at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:61)
>     at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:53)
>     at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:43)
>     at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:186)
>     at org.apache.jena.riot.RDFParser.read(RDFParser.java:366)
>     at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:335)
>     at org.apache.jena.riot.RDFParser.parse(RDFParser.java:310)
>     at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:552)
>     at org.apache.jena.tdb2.xloader.ProcBuildNodeTableX.lambda$exec2$0(ProcBuildNodeTableX.java:198)
>     at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
>     at org.apache.jena.tdb2.xloader.ProcBuildNodeTableX.lambda$exec2$1(ProcBuildNodeTableX.java:194)
>     at java.base/java.lang.Thread.run(Thread.java:829)
>
> Cheers, Joachim
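The advice about reading the whole file before loading it can be sketched in Python. This is only an illustration under stated assumptions: the function name `check_bz2_stream` is made up here, it tests the compression layer only (it does not validate N-Triples syntax), and on a dump of this size it takes as long as a full decompression.

```python
import bz2


def check_bz2_stream(path, chunk_size=1 << 20):
    """Decompress the whole file and discard the output.

    Returns (ok, bytes_decompressed, error_message). The byte count tells
    roughly how far into the stream the damage sits.
    """
    total = 0
    try:
        with bz2.open(path, "rb") as stream:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except (OSError, EOFError) as exc:
        # OSError: invalid data in the stream; EOFError: file ends too early.
        return False, total, str(exc)
    return True, total, None


# Hypothetical usage before starting a long load:
# ok, size, err = check_bz2_stream("latest-truthy.nt.bz2")
```

A check like this costs one extra pass over the file, which is small compared with losing a load after four billion lines.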
AW: AW: AW: AW: AW: xloader "Can't find gzip program"
The error was in the binary:

lbzcat: "/zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2": compressed data error: bad block header magic

That created non-RDF input:

[nbt@e6810f891672 ~]$ bzcat /zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2 | sed -n '4052914958,4052914960p;4052914961q'
<http://www.wikidata.org/entity/Q85112545> <http://schema.org/description> "\u0646\u062C\u0645 \u0641\u064A \u0643\u0648\u0643\u0628\u0629 \u0627\u0644\u062B\u0648\u0631"@ar .

bzcat: Compressed file ends unexpectedly;
        perhaps it is corrupted?  *Possible* reason follows.
bzcat: Success
        Input file = /zbw/var/wikidata/2022-02-03/rdf/latest-truthy.nt.bz2, output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

<http://www.wikidata.org/entity/Q85112545> <http://schema.org/description> "star in the constellation Taurus"@en .
<https://www.wikidata.org/wiki/Special:EntityData/Q85112563> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Dataset> .

which in turn produced:

03:02:18 INFO Nodes :: Add: 4,052,000,000 latest-truthy.nt (Batch: 108,189 / Avg: 102,550)
03:02:26 ERROR riot:: [line: 4052914959, col: 80] Bad input stream [java.io.IOException: Unexpected end of stream]
Exception in thread "AsyncParser" org.apache.jena.riot.RiotException: [line: 4052914959, col: 80] Bad input stream [java.io.IOException: Unexpected end of stream]
    at org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.fatal(ErrorHandlerFactory.java:163)
    at org.apache.jena.riot.lang.LangEngine.raiseException(LangEngine.java:148)
    at org.apache.jena.riot.lang.LangEngine.nextToken(LangEngine.java:105)
    at org.apache.jena.riot.lang.LangNTuple.parseTriple(LangNTuple.java:95)
    at org.apache.jena.riot.lang.LangNTriples.parseOne(LangNTriples.java:61)
    at org.apache.jena.riot.lang.LangNTriples.runParser(LangNTriples.java:53)
    at org.apache.jena.riot.lang.LangBase.parse(LangBase.java:43)
    at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:186)
    at org.apache.jena.riot.RDFParser.read(RDFParser.java:366)
    at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:335)
    at org.apache.jena.riot.RDFParser.parse(RDFParser.java:310)
    at org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:552)
    at org.apache.jena.tdb2.xloader.ProcBuildNodeTableX.lambda$exec2$0(ProcBuildNodeTableX.java:198)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1541)
    at org.apache.jena.tdb2.xloader.ProcBuildNodeTableX.lambda$exec2$1(ProcBuildNodeTableX.java:194)
    at java.base/java.lang.Thread.run(Thread.java:829)

Cheers, Joachim

> -----Original Message-----
> From: Andy Seaborne
> Sent: Monday, 14 February 2022 13:46
> To: users@jena.apache.org
> Subject: Re: AW: AW: AW: AW: xloader "Can't find gzip program"
>
> On 14/02/2022 08:01, Neubert, Joachim wrote:
> > Thanks, Andy, the TDB2 assembler fixed it, and all worked well.
> >
> > I've tried to load wikidata-truthy then, but apparently the bzip file
> > was damaged at line 4052914959 - have to try again
>
> How annoying.
>
> Is it an RDF syntax error or bad binary or something else?
>
> --
>
> My experience is that gz is faster to load.
>
> bz2 emphasises compactness over speed.
>
>      Andy
>
> > Cheers, Joachim
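The `bzcat | sed -n` inspection of the lines around the position reported by riot can also be done from Python. The sketch below is an assumption-laden equivalent, not something used in the thread: `lines_around` is a made-up helper, and it returns whatever lines were read before the stream broke, together with the error text.

```python
import bz2
from itertools import islice


def lines_around(path, first, last):
    """Return lines first..last (1-based, inclusive) of a .bz2 text file.

    If the compressed stream is damaged, the lines collected so far are
    returned together with the error message instead of raising.
    """
    collected = []
    try:
        with bz2.open(path, "rt", encoding="utf-8", errors="replace") as stream:
            # islice uses 0-based, end-exclusive indexes.
            for line in islice(stream, first - 1, last):
                collected.append(line.rstrip("\n"))
    except (OSError, EOFError) as exc:
        return collected, str(exc)
    return collected, None


# Hypothetical usage with the line number from the riot error:
# lines, err = lines_around("latest-truthy.nt.bz2", 4052914958, 4052914960)
```

Like the shell pipeline, this has to decompress everything up to the requested range, so it is slow on a file of this size.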
Re: AW: AW: AW: AW: xloader "Can't find gzip program"
On 14/02/2022 08:01, Neubert, Joachim wrote:
> Thanks, Andy, the TDB2 assembler fixed it, and all worked well.
>
> I've tried to load wikidata-truthy then, but apparently the bzip file was damaged at line 4052914959 - have to try again

How annoying.

Is it an RDF syntax error or bad binary or something else?

--

My experience is that gz is faster to load.

bz2 emphasises compactness over speed.

    Andy

> Cheers, Joachim
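The gz versus bz2 trade-off can be measured on a sample of one's own data before deciding which form of a dump to download. The following is a rough sketch only: `compare_codecs` is a made-up name, it uses Python's standard `gzip` and `bz2` modules rather than the command-line tools or Jena's readers, and timings on a small in-memory sample are indicative at best.

```python
import bz2
import gzip
import time


def compare_codecs(data: bytes):
    """Compress `data` with gzip and bz2; report size and decompression time."""
    results = {}
    for name, codec in (("gz", gzip), ("bz2", bz2)):
        packed = codec.compress(data)
        start = time.perf_counter()
        unpacked = codec.decompress(packed)
        elapsed = time.perf_counter() - start
        if unpacked != data:
            raise ValueError(f"{name} round trip did not reproduce the input")
        results[name] = {
            "compressed_bytes": len(packed),
            "decompress_seconds": elapsed,
        }
    return results


# Hypothetical usage on the first few megabytes of an N-Triples file:
# with open("sample.nt", "rb") as f:
#     print(compare_codecs(f.read(8 * 1024 * 1024)))
```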
AW: AW: AW: AW: xloader "Can't find gzip program"
Thanks, Andy, the TDB2 assembler fixed it, and all worked well.

I've tried to load wikidata-truthy then, but apparently the bzip file was damaged at line 4052914959 - have to try again

Cheers, Joachim

> -----Original Message-----
> From: Andy Seaborne
> Sent: Saturday, 12 February 2022 11:15
> To: users@jena.apache.org
> Subject: Re: AW: AW: AW: xloader "Can't find gzip program"
>
> Hi Joachim,
>
> Aside: I've realised why the timestamps are fixed at "2022-01-30 15:03".
>
> The build setup is for repeatable builds of releases. Any build from the X.Y.Z release source, with the same JDK, will generate the byte-wise same jar files.
>
> Each release build fixes the timestamp and uses that, and it gets in the POM as a property. It only gets updated when a release happens; otherwise the POM file would be modified several times a week.
>
> Thankfully, we have --version on most commands as well.
>
> That's timestamps explained.
>
> You seem to have run the TDB2 xloader, then given the text index builder an assembler description for TDB1.
>
> Fuseki with --loc determines the database type by looking at the file layout, but assemblers don't.
>
> The version output can be changed to say "TDB1" without too much disruption. Small tweak that might have helped show this up earlier.
>
>      Andy
>
> On 11/02/2022 23:06, Neubert, Joachim wrote:
> > Sorry, my fault: I've actually had jena-4.4.0 active, not 4.5.0-SNAPSHOT.
> >
> > Now the loading works smoothly:
> >
> > 22:50:10 INFO Load node table = 62 seconds
> > 22:50:10 INFO Load ingest data = 37 seconds
> > 22:50:10 INFO Build index SPO = 7 seconds
> > 22:50:10 INFO Build index POS = 12 seconds
> > 22:50:10 INFO Build index OSP = 9 seconds
> > 22:50:10 INFO Overall 127 seconds
> > 22:50:10 INFO Overall 00h 02m 07s
> > 22:50:10 INFO Triples loaded = 1000
> > 22:50:10 INFO Quads loaded = 0
> > 22:50:10 INFO Overall Rate 78740 tuples per second
>
> That's output from tdb2.xloader.
>
> At 10m up to 500m (laptop) or maybe 1B (server) triples, also try "tdb2.tdbloader --loader=parallel"
>
> > However, the text indexing crashes, when called like that:
> >
> > java -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --debug --desc=/tmp/temp.ttl
> >
> > org.apache.jena.assembler.exceptions.AssemblerException: caught: Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database. Please try loading this location as a TDB2 database. See https://jena.apache.org/documentation/tdb/faqs.html for more information.
> >   doing:
> >     root: file:///tmp/temp.ttl#dataset with type: http://jena.hpl.hp.com/2008/tdb#DatasetTDB assembler class: class org.apache.jena.tdb.assembler.DatasetAssemblerTDB1
>
> But that is TDB1
>
> >     root: http://localhost/jena_example/#text_dataset with type: http://jena.apache.org/text#TextDataset assembler class: class org.apache.jena.query.text.assembler.TextDatasetAssembler
>
> ...
>
> > Caused by: org.apache.jena.tdb.base.file.FileException: Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database. Please try loading this location as a TDB2 database. See https://jena.apache.org/documentation/tdb/faqs.html for more information.
> >     at org.apache.jena.tdb.base.file.LocationLock.getOwner(LocationLock.java:110)
>
> org.apache.jena.tdb == TDB1
>
> >     at org.apache.jena.tdb.base.file.LocationLock.canObtain(LocationLock.java:139)
> >     at org.apache.jena.tdb.StoreConnection._makeAndCache(StoreConnection.java:262)
> >     at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:226)
> >     at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:240)
> >     at org.apache.jena.tdb.transaction.DatasetGraphTransaction.(DatasetGraphTransaction.java:72)
> >     at org.apache.jena.tdb.sys.TDBMaker.createDirect(TDBMaker.java:114)
>
> ...
>
> >     ... 23 more
> > 2022-02-11 22:50:12 ABORTED
> >
> > cat /var/lib/fuseki/databases/temp/tdb.lock
> > 32907
> >
> > Cheers, Joachim
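Since assemblers do not inspect the directory, a quick look at the file layout before writing the assembler description can catch a TDB1/TDB2 mix-up. The sketch below is a guess at such a check, not Fuseki's actual detection code. It assumes that a TDB2 location contains a generation directory named like `Data-0001`, and that a TDB1 location holds `nodes.dat` directly at the top level. The function name `guess_tdb_type` is made up.

```python
import re
from pathlib import Path


def guess_tdb_type(location):
    """Guess whether a database directory looks like TDB1 or TDB2.

    Assumed layouts (not taken from the thread):
      TDB2: a subdirectory such as Data-0001 under the location
      TDB1: index and node files such as nodes.dat directly in the location
    """
    loc = Path(location)
    if not loc.is_dir():
        return "missing"
    entries = list(loc.iterdir())
    if any(e.is_dir() and re.fullmatch(r"Data-\d+", e.name) for e in entries):
        return "TDB2"
    if any(e.name == "nodes.dat" for e in entries):
        return "TDB1"
    return "unknown"


# Hypothetical usage with the location from the thread:
# print(guess_tdb_type("/var/lib/fuseki/databases/temp"))
```

If the guess is "TDB2", the dataset in the assembler file needs a TDB2 dataset description rather than the `http://jena.hpl.hp.com/2008/tdb#DatasetTDB` type shown in the error.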
Re: AW: AW: AW: xloader "Can't find gzip program"
Hi Joachim,

Aside: I've realised why the timestamps are fixed at "2022-01-30 15:03".

The build setup is for repeatable builds of releases. Any build from the X.Y.Z release source, with the same JDK, will generate the byte-wise same jar files.

Each release build fixes the timestamp and uses that, and it gets in the POM as a property. It only gets updated when a release happens; otherwise the POM file would be modified several times a week.

Thankfully, we have --version on most commands as well.

That's timestamps explained.

You seem to have run the TDB2 xloader, then given the text index builder an assembler description for TDB1.

Fuseki with --loc determines the database type by looking at the file layout, but assemblers don't.

The version output can be changed to say "TDB1" without too much disruption. Small tweak that might have helped show this up earlier.

    Andy

On 11/02/2022 23:06, Neubert, Joachim wrote:
> Sorry, my fault: I've actually had jena-4.4.0 active, not 4.5.0-SNAPSHOT.
>
> Now the loading works smoothly:
>
> 22:50:10 INFO Load node table = 62 seconds
> 22:50:10 INFO Load ingest data = 37 seconds
> 22:50:10 INFO Build index SPO = 7 seconds
> 22:50:10 INFO Build index POS = 12 seconds
> 22:50:10 INFO Build index OSP = 9 seconds
> 22:50:10 INFO Overall 127 seconds
> 22:50:10 INFO Overall 00h 02m 07s
> 22:50:10 INFO Triples loaded = 1000
> 22:50:10 INFO Quads loaded = 0
> 22:50:10 INFO Overall Rate 78740 tuples per second

That's output from tdb2.xloader.

At 10m up to 500m (laptop) or maybe 1B (server) triples, also try "tdb2.tdbloader --loader=parallel"

> However, the text indexing crashes, when called like that:
>
> java -cp $FUSEKI_HOME/fuseki-server.jar jena.textindexer --debug --desc=/tmp/temp.ttl
>
> org.apache.jena.assembler.exceptions.AssemblerException: caught: Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database. Please try loading this location as a TDB2 database. See https://jena.apache.org/documentation/tdb/faqs.html for more information.
>   doing:
>     root: file:///tmp/temp.ttl#dataset with type: http://jena.hpl.hp.com/2008/tdb#DatasetTDB assembler class: class org.apache.jena.tdb.assembler.DatasetAssemblerTDB1

But that is TDB1

>     root: http://localhost/jena_example/#text_dataset with type: http://jena.apache.org/text#TextDataset assembler class: class org.apache.jena.query.text.assembler.TextDatasetAssembler

...

> Caused by: org.apache.jena.tdb.base.file.FileException: Unable to check TDB lock owner, the lock file contents appear to be for a TDB2 database. Please try loading this location as a TDB2 database. See https://jena.apache.org/documentation/tdb/faqs.html for more information.
>     at org.apache.jena.tdb.base.file.LocationLock.getOwner(LocationLock.java:110)

org.apache.jena.tdb == TDB1

>     at org.apache.jena.tdb.base.file.LocationLock.canObtain(LocationLock.java:139)
>     at org.apache.jena.tdb.StoreConnection._makeAndCache(StoreConnection.java:262)
>     at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:226)
>     at org.apache.jena.tdb.StoreConnection.make(StoreConnection.java:240)
>     at org.apache.jena.tdb.transaction.DatasetGraphTransaction.(DatasetGraphTransaction.java:72)
>     at org.apache.jena.tdb.sys.TDBMaker.createDirect(TDBMaker.java:114)

...

>     ... 23 more
> 2022-02-11 22:50:12 ABORTED
>
> cat /var/lib/fuseki/databases/temp/tdb.lock
> 32907
>
> Cheers, Joachim
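Why a fixed timestamp matters for byte-identical jars can be shown with a small experiment. A jar is a zip archive, and every entry carries a modification time. The sketch below uses Python's `zipfile` module to mimic the idea. It is not the Jena or Maven build, and `build_archive` and the sample file names are made up for illustration.

```python
import io
import zipfile

# The timestamp observed in the thread, used here as the fixed build time.
FIXED_TIMESTAMP = (2022, 1, 30, 15, 3, 0)


def build_archive(files, timestamp=FIXED_TIMESTAMP):
    """Build a zip archive in memory with every entry stamped `timestamp`.

    `files` maps entry names to bytes. Entries are written in sorted order
    so that the input dictionary's ordering cannot change the output.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in sorted(files):
            info = zipfile.ZipInfo(name, date_time=timestamp)
            info.compress_type = zipfile.ZIP_DEFLATED
            archive.writestr(info, files[name])
    return buffer.getvalue()


files = {"META-INF/MANIFEST.MF": b"Manifest-Version: 1.0\n", "a.class": b"\xca\xfe"}
same = build_archive(files) == build_archive(files)
differs = build_archive(files) != build_archive(files, (2022, 2, 9, 18, 1, 44))
print(same, differs)
```

With the entry times pinned, two builds of the same input give identical bytes, and changing only the timestamp changes the archive. That is why the files carry the release timestamp rather than the snapshot build date, and why `--version` is the more reliable check.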
AW: AW: AW: xloader "Can't find gzip program"
)
    at org.apache.jena.tdb.assembler.DatasetAssemblerTDB1.createDataset(DatasetAssemblerTDB1.java:46)
    at org.apache.jena.sparql.core.assembler.DatasetAssembler.open(DatasetAssembler.java:40)
    at org.apache.jena.sparql.core.assembler.DatasetAssembler.open(DatasetAssembler.java:33)
    at org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup.openBySpecificType(AssemblerGroup.java:157)
    ... 23 more
2022-02-11 22:50:12 ABORTED

cat /var/lib/fuseki/databases/temp/tdb.lock
32907

Cheers, Joachim

> -----Original Message-----
> From: Andy Seaborne
> Sent: Friday, 11 February 2022 23:06
> To: users@jena.apache.org
> Subject: Re: AW: AW: xloader "Can't find gzip program"
>
> On 11/02/2022 21:38, Neubert, Joachim wrote:
> > Strange - I should have the same version:
> >
> > sudo tar xzvf /usr/local/src/apache-jena-fuseki-4.5.0-20220209.180144-12.tar.gz
>
> Different jar file: apache-jena-4.5.0-20220209.180144-12 (no Fuseki) but weird anyway.
>
> wget https://repository.apache.org/content/groups/snapshots/org/apache/jena/apache-jena/4.5.0-SNAPSHOT/apache-jena-4.5.0-20220209.180144-12.zip
>
> then the zip file is:
>
> 27372309 Feb 9 18:26 apache-jena-4.5.0-20220209.180144-12.zip
>
> apache-jena-4.5.0-SNAPSHOT/bin/tdb2.tdbloader --version
>
> Jena: VERSION: 4.5.0-SNAPSHOT
> Jena: BUILD_DATE: 2022-02-09T18:01:44Z
> ARQ: VERSION: 4.5.0-SNAPSHOT
> ARQ: BUILD_DATE: 2022-02-09T18:01:44Z
> TDB2: VERSION: 4.5.0-SNAPSHOT
> TDB2: BUILD_DATE: 2022-02-09T18:01:44Z
>
> yet the TDB2 jar is dated 30th Jan, as are the files inside it -- can't explain that.
>
> 294846 Jan 30 15:03 apache-jena-4.5.0-SNAPSHOT/lib/jena-tdb2-4.5.0-SNAPSHOT.jar
>
> The tdb2.xloader script is 10485 bytes and has
>
> SORT_THREADS="2"
>
> in it. Is that what your copy of the script has in it?
>
> I'll clear the Jenkins workspace and schedule a new build.
>
>      Andy
>
> > but the jarfile date is of Jan 30:
> >
> > ll apache-jena-fuseki-4.5.0-SNAPSHOT/
> > total 35868
> > -rw-r--r-- 1 root root    36975 Jan 30 15:02 LICENSE
> > -rw-r--r-- 1 root root     8914 Jan 30 15:02 NOTICE
> > -rw-r--r-- 1 root root     1151 Jan 30 15:02 README
> > drwxr-xr-x 2 root root      179 Feb 11 20:47 bin
> > -rwxr-xr-x 1 root root    12339 Jan 30 15:02 fuseki
> > -rwxr-xr-x 1 root root     1241 Jan 30 15:02 fuseki-backup
> > -rwxr-xr-x 1 root root     3370 Jan 30 15:02 fuseki-server
> > -rw-r--r-- 1 root root     1264 Jan 30 15:02 fuseki-server.bat
> > -rw-r--r-- 1 root root 36631864 Jan 30 15:02 fuseki-server.jar
> > -rw-r--r-- 1 root root     2217 Jan 30 15:02 fuseki.service
> > -rw-r--r-- 1 root root     2124 Jan 30 15:02 log4j2.properties
> > drwxr-xr-x 4 root root      121 Jan 30 15:02 webapp
> >
> > Cheers, Joachim
Re: AW: AW: xloader "Can't find gzip program"
On 11/02/2022 21:38, Neubert, Joachim wrote:
> Strange - I should have the same version:
>
> sudo tar xzvf /usr/local/src/apache-jena-fuseki-4.5.0-20220209.180144-12.tar.gz

Different jar file: apache-jena-4.5.0-20220209.180144-12 (no Fuseki) but weird anyway.

wget https://repository.apache.org/content/groups/snapshots/org/apache/jena/apache-jena/4.5.0-SNAPSHOT/apache-jena-4.5.0-20220209.180144-12.zip

then the zip file is:

27372309 Feb 9 18:26 apache-jena-4.5.0-20220209.180144-12.zip

apache-jena-4.5.0-SNAPSHOT/bin/tdb2.tdbloader --version

Jena: VERSION: 4.5.0-SNAPSHOT
Jena: BUILD_DATE: 2022-02-09T18:01:44Z
ARQ: VERSION: 4.5.0-SNAPSHOT
ARQ: BUILD_DATE: 2022-02-09T18:01:44Z
TDB2: VERSION: 4.5.0-SNAPSHOT
TDB2: BUILD_DATE: 2022-02-09T18:01:44Z

yet the TDB2 jar is dated 30th Jan, as are the files inside it -- can't explain that.

294846 Jan 30 15:03 apache-jena-4.5.0-SNAPSHOT/lib/jena-tdb2-4.5.0-SNAPSHOT.jar

The tdb2.xloader script is 10485 bytes and has

SORT_THREADS="2"

in it. Is that what your copy of the script has in it?

I'll clear the Jenkins workspace and schedule a new build.

    Andy

> but the jarfile date is of Jan 30:
>
> ll apache-jena-fuseki-4.5.0-SNAPSHOT/
> total 35868
> -rw-r--r-- 1 root root    36975 Jan 30 15:02 LICENSE
> -rw-r--r-- 1 root root     8914 Jan 30 15:02 NOTICE
> -rw-r--r-- 1 root root     1151 Jan 30 15:02 README
> drwxr-xr-x 2 root root      179 Feb 11 20:47 bin
> -rwxr-xr-x 1 root root    12339 Jan 30 15:02 fuseki
> -rwxr-xr-x 1 root root     1241 Jan 30 15:02 fuseki-backup
> -rwxr-xr-x 1 root root     3370 Jan 30 15:02 fuseki-server
> -rw-r--r-- 1 root root     1264 Jan 30 15:02 fuseki-server.bat
> -rw-r--r-- 1 root root 36631864 Jan 30 15:02 fuseki-server.jar
> -rw-r--r-- 1 root root     2217 Jan 30 15:02 fuseki.service
> -rw-r--r-- 1 root root     2124 Jan 30 15:02 log4j2.properties
> drwxr-xr-x 4 root root      121 Jan 30 15:02 webapp
>
> Cheers, Joachim
AW: AW: xloader "Can't find gzip program"
Strange - I should have the same version:

  sudo tar xzvf /usr/local/src/apache-jena-fuseki-4.5.0-20220209.180144-12.tar.gz

but the jar file is dated Jan 30:

  ll apache-jena-fuseki-4.5.0-SNAPSHOT/
  total 35868
  -rw-r--r-- 1 root root    36975 Jan 30 15:02 LICENSE
  -rw-r--r-- 1 root root     8914 Jan 30 15:02 NOTICE
  -rw-r--r-- 1 root root     1151 Jan 30 15:02 README
  drwxr-xr-x 2 root root      179 Feb 11 20:47 bin
  -rwxr-xr-x 1 root root    12339 Jan 30 15:02 fuseki
  -rwxr-xr-x 1 root root     1241 Jan 30 15:02 fuseki-backup
  -rwxr-xr-x 1 root root     3370 Jan 30 15:02 fuseki-server
  -rw-r--r-- 1 root root     1264 Jan 30 15:02 fuseki-server.bat
  -rw-r--r-- 1 root root 36631864 Jan 30 15:02 fuseki-server.jar
  -rw-r--r-- 1 root root     2217 Jan 30 15:02 fuseki.service
  -rw-r--r-- 1 root root     2124 Jan 30 15:02 log4j2.properties
  drwxr-xr-x 4 root root      121 Jan 30 15:02 webapp

Cheers, Joachim
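File dates alone may not settle which build is installed, because tar by default restores the modification times stored in the archive rather than the time of unpacking. A minimal sketch for ruling out a stale unpack is to compare a file inside the downloaded archive with the copy on disk. The helper name `same_as_archive` is hypothetical (it is not part of Jena), and the member path in the usage comment assumes the archive unpacks into apache-jena-fuseki-4.5.0-SNAPSHOT/ as the listing above suggests.

```shell
#!/bin/sh
# Hypothetical helper (not part of Jena): compare MEMBER inside a gzipped
# tar ARCHIVE with FILE on disk.
# Returns 0 if identical, 1 if they differ, 2 if the member cannot be read.
same_as_archive() {
    archive=$1
    member=$2
    file=$3
    tmp=$(mktemp) || return 2
    # -O writes the extracted member to stdout instead of to the file system.
    if ! tar -xzOf "$archive" "$member" > "$tmp" 2>/dev/null; then
        rm -f "$tmp"
        return 2
    fi
    rc=0
    cmp -s "$tmp" "$file" || rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Usage (paths taken from the message above; the member path is an assumption):
# same_as_archive /usr/local/src/apache-jena-fuseki-4.5.0-20220209.180144-12.tar.gz \
#     apache-jena-fuseki-4.5.0-SNAPSHOT/fuseki-server.jar \
#     apache-jena-fuseki-4.5.0-SNAPSHOT/fuseki-server.jar \
#     && echo "unpacked jar matches the archive"
```

If the two copies match, the installed jar is the one from the 20220209 snapshot regardless of the date shown by ll.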
Re: AW: xloader "Can't find gzip program"
Works for me - make sure it is the latest dev build (the one at the bottom).

I just grabbed apache-jena-4.5.0-20220209.180144-12.zip (2022-02-09) and loaded a few million triples with no problems.

  rm -rf DB2
  apache-jena-4.5.0-SNAPSHOT/bin/tdb2.xloader --loc DB2 ~/Datasets/BSBM/bsbm-5m.nt.gz

Andy

On 11/02/2022 21:20, Neubert, Joachim wrote:
> Hi Andy,
>
> Thanks! The code of 4.5.0-SNAPSHOT seems to run significantly faster -
> however, the same error occurs at SPO start.
>
> Please let me know if I can help with tracing/reproducing the error.
>
> Cheers, Joachim
>
> -Original Message-
> From: Andy Seaborne
> Sent: Friday, 11 February 2022 21:07
> To: users@jena.apache.org
> Subject: Re: xloader "Can't find gzip program"
>
>> Hi Joachim,
>>
>> https://issues.apache.org/jira/browse/JENA-2277
>> https://issues.apache.org/jira/browse/JENA-2279
>>
>> There are two fixes for tdb2.xloader which are now in the development builds:
>>
>> https://repository.apache.org/content/groups/snapshots/org/apache/jena/
>>
>> (these are not official releases and have not been voted on by the PMC)
>>
>> If you could test them and let us know whether they work or whether there
>> are further problems, that would be great.
>>
>> Andy
>>
>> On 11/02/2022 17:53, Neubert, Joachim wrote:
>>> I've just started tests with xloader. It aborts with
>>>
>>> 17:21:56 INFO Data:: Triples = 10,000,000 ; Quads = 0
>>> 17:21:57 INFO =-=-=-=-=-=-=-=
>>> 17:21:57 INFO
>>> 17:21:57 INFO Build SPO
>>> 17:21:57 INFO (Very long pause likely at this point)
>>> 17:21:58 INFO Index :: Build index SPO
>>> java.lang.RuntimeException: org.apache.jena.tdb2.TDBException: Can't find gzip program
>>>     at org.apache.jena.tdb2.xloader.ProcBuildIndexX.sort_build_index(ProcBuildIndexX.java:207)
>>>     at org.apache.jena.tdb2.xloader.ProcBuildIndexX.buildIndex(ProcBuildIndexX.java:121)
>>>     at org.apache.jena.tdb2.xloader.ProcBuildIndexX.exec2(ProcBuildIndexX.java:106)
>>>     at org.apache.jena.tdb2.xloader.ProcBuildIndexX.exec(ProcBuildIndexX.java:94)
>>>     at tdb2.xloader.CmdxBuildIndex.exec(CmdxBuildIndex.java:80)
>>>     at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:92)
>>>     at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:58)
>>>     at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:45)
>>>     at tdb2.xloader.CmdxBuildIndex.main(CmdxBuildIndex.java:28)
>>> Caused by: org.apache.jena.tdb2.TDBException: Can't find gzip program
>>>     at org.apache.jena.tdb2.xloader.BulkLoaderX.gzipProgram(BulkLoaderX.java:67)
>>>     at org.apache.jena.tdb2.xloader.ProcBuildIndexX.sort_build_index(ProcBuildIndexX.java:183)
>>>     ... 8 more
>>>
>>> Of course, /usr/bin/gzip is in the path. My configuration is below;
>>> tdb2.xloader was called with --threads=12.
>>>
>>> Any idea what could be wrong?
>>>
>>> Cheers, Joachim
>>>
>>> Configuration:
>>>
>>> openjdk version "11.0.13" 2021-10-19 LTS
>>> OpenJDK Runtime Environment 18.9 (build 11.0.13+8-LTS)
>>> OpenJDK 64-Bit Server VM 18.9 (build 11.0.13+8-LTS, mixed mode, sharing)
>>> JAVA_OPTS: -d64 -Xmx12G
>>> Loader: tdb2.xloader
>>> Jena: VERSION: 4.4.0
>>> Jena: BUILD_DATE: 2022-01-30T15:09:41Z
>>> ARQ:  VERSION: 4.4.0
>>> ARQ:  BUILD_DATE: 2022-01-30T15:09:41Z
>>> TDB:  VERSION: 4.4.0
>>> TDB:  BUILD_DATE: 2022-01-30T15:09:41Z
>>>
>>> Use fuseki tdb2.xloader on file /zbw/var/wikidata/2022-02-03/rdf/test.nt.gz
>>> 17:20:13 INFO Setup:
>>> 17:20:13 INFO   Database: /zbw/var/lib/fuseki/databases/temp
>>> 17:20:13 INFO   Data: /zbw/var/wikidata/2022-02-03/rdf/test.nt.gz
>>> 17:20:13 INFO   TMPDIR: /zbw/var/lib/fuseki/databases/temp
>>> 17:20:13 INFO
>>> 17:20:13 INFO Load node table
>>>
>>> --
>>> Joachim Neubert
>>>
>>> ZBW - Leibniz Information Centre for Economics
>>> Neuer Jungfernstieg 21
>>> 20354 Hamburg
>>> Phone +49-40-42834-462
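The report states that /usr/bin/gzip is on the path. A small sketch for confirming that from the same user and shell (or service environment) that starts the loader follows. The function name `check_prog` is made up for illustration, and the check only shows what the shell resolves; it makes no claim about how tdb2.xloader itself locates gzip.

```shell
#!/bin/sh
# Hypothetical diagnostic (not part of Jena): report where a program
# resolves on the current PATH and whether that file is executable.
check_prog() {
    prog=$1
    resolved=$(command -v "$prog") || {
        echo "$prog: not found on PATH" >&2
        return 1
    }
    # command -v prints a bare name or an alias definition for builtins,
    # functions, and aliases; only an absolute path is a real program file.
    case $resolved in
        /*) ;;
        *)  echo "$prog: resolves to a builtin, function, or alias" >&2
            return 1 ;;
    esac
    [ -x "$resolved" ] || {
        echo "$prog: $resolved is not executable" >&2
        return 1
    }
    echo "$prog -> $resolved"
}

# Example: check the program named in the exception.
check_prog gzip || echo "gzip is not visible from this environment" >&2
```

If this prints a path while the loader still fails, the shell environment is probably not the cause, which would be consistent with the problem being addressed in the loader itself.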
AW: xloader "Can't find gzip program"
Hi Andy,

Thanks! The code of 4.5.0-SNAPSHOT seems to run significantly faster - however, the same error at SPO start.

Please let me know if I can help with tracing/reproducing the error.

Cheers, Joachim