On Sun, Aug 28, 2022 at 11:00 AM Lorenz Buehmann
<buehm...@informatik.uni-leipzig.de> wrote:
>
> Hi Andy,
>
> thanks for the fast response.
>
> I see - the only drawback with wrapping the streams in TriG is when we
> have Turtle syntax files (or let's say any non-N-Triples format) - afaik,
> prefixes aren't allowed inside graphs, i.e. at that point you're lost.
> What I did now is to pipe those files through riot first, which
> generates N-Triples that can then be wrapped in TriG graphs. Indeed, we
> have the riot overhead here, i.e. the data is parsed twice. Still faster,
> though, than loading the graphs in separate TDB loader calls, so I guess
> I can live with this.
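Lorenz's two-pass pipeline can also be collapsed into one pass by appending the graph IRI to riot's N-Triples output, producing N-Quads directly. A sketch, with assumptions: the graph IRIs and file names are placeholders, the loader is assumed to read from stdin when given no file arguments (per Andy's remark about piping into the loader), and the sed rewrite relies on riot emitting one statement per line terminated by " .".

```shell
# Parse each Turtle file once with riot, append a graph IRI to every
# N-Triples line to turn it into an N-Quads line, and stream everything
# into a single loader run. Graph IRIs and file names are examples only.
(
  riot --output=ntriples file1.ttl | sed 's| \.$| <http://example.org/g1> .|'
  riot --output=ntriples file2.ttl | sed 's| \.$| <http://example.org/g2> .|'
) | tdb2.tdbloader --loc ./tdb2/dataset --syntax=nquads
```

The `$` anchor makes sed touch only the statement terminator at the end of the line, so literals containing " ." in the middle are left alone.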
I had a similar question a few years ago, and Claus responded:
https://stackoverflow.com/questions/63467067/converting-rdf-triples-to-quads-from-command-line/63716278

> Having a follow-up question:
>
> I could see a huge difference between reading a compressed (bzip2) vs.
> an uncompressed file.
>
> I put the output up to the point where the triples have been loaded, as
> the index creation shouldn't be affected by the compression:
>
> # uncompressed with tdb2.tdbloader
>
> 14:24:40 INFO  loader :: Add: 163,000,000
>   river_planet-latest.osm.pbf.ttl (Batch: 144,320 / Avg: 140,230)
> 14:24:42 INFO  loader :: Finished:
>   output/river_planet-latest.osm.pbf.ttl: 163,310,838 tuples in 1165.30s
>   (Avg: 140,145)
>
> # compressed with tdb2.tdbloader
>
> 17:37:37 INFO  loader :: Add: 163,000,000
>   river_planet-latest.osm.pbf.ttl.bz2 (Batch: 19,424 / Avg: 16,050)
> 17:37:40 INFO  loader :: Finished:
>   output/river_planet-latest.osm.pbf.ttl.bz2: 163,310,838 tuples in
>   10158.16s (Avg: 16,076)
>
> So loading the compressed file is ~9x slower than the uncompressed one.
> Can we consider this expected? Note that here we have a geospatial
> dataset with millions of geometry literals. Not sure if this is also
> something that makes things worse.
>
> What are your experiences with loading compressed vs uncompressed data?
>
> Cheers,
>
> Lorenz
>
> On 26.08.22 17:02, Andy Seaborne wrote:
> > Hi Lorenz,
> >
> > No - there isn't an option.
> >
> > The way to do it is to prepare the load as quads by, for example,
> > wrapping TriG syntax around the files or adding the G to N-Triples.
> >
> > This can be done streaming and piped into the loader (with --syntax=
> > if not N-Quads).
> >
> > > By the way, the tdb2.xloader has no option for named graphs at all?
> >
> > The input needs to be prepared as quads.
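One way to probe the compression slowdown is to take bzip2 decompression out of the loader process entirely and pipe the decompressed stream in. This is a sketch under assumptions: that `bzcat` is installed, and that tdb2.tdbloader reads stdin when no file argument is given (as Andy's note about piping into the loader suggests); `--syntax=` is needed because stdin carries no file extension to infer the format from.

```shell
# Decompress in a separate process so bzip2 work runs concurrently with
# parsing, instead of inline in the loader. File name is the one from
# the log above; --loc is a placeholder.
bzcat river_planet-latest.osm.pbf.ttl.bz2 \
  | tdb2.tdbloader --loc ./tdb2/dataset --syntax=turtle
```

If a parallel decompressor such as `lbzip2 -dc` or `pbzip2 -dc` happens to be available, substituting it for `bzcat` spreads the decompression over several cores; whether that closes the full ~9x gap would need measuring.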
> >
> > Andy
> >
> > On 26/08/2022 15:03, Lorenz Buehmann wrote:
> >> Hi all,
> >>
> >> is there any option to use the TDB2 bulk loader (tdb2.xloader or just
> >> tdb2.loader) to load multiple files into multiple different named
> >> graphs? Something like
> >>
> >> tdb2.loader --loc ./tdb2/dataset --graph <g1> file1 --graph <g2>
> >> file2 ...
> >>
> >> I'm asking because I thought the initial loading is way faster than
> >> iterating over multiple (graph, file) pairs and running the TDB2
> >> loader for each pair.
> >>
> >> By the way, the tdb2.xloader has no option for named graphs at all?
> >>
> >> Cheers,
> >>
> >> Lorenz
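For files that are already N-Triples, Andy's "wrap TriG syntax around the files" suggestion needs no re-parsing at all, since every N-Triples document is valid inside a TriG graph block. A minimal sketch, assuming placeholder graph IRIs and file names, and a loader that reads stdin when given no files:

```shell
# Wrap each N-Triples file in a TriG graph block "<g> { ... }" with
# plain shell, then load everything in one call. TriG allows the GRAPH
# keyword to be omitted before the graph IRI.
(
  printf '<http://example.org/g1> {\n'; cat file1.nt; printf '}\n'
  printf '<http://example.org/g2> {\n'; cat file2.nt; printf '}\n'
) | tdb2.tdbloader --loc ./tdb2/dataset --syntax=trig
```

This is also why the Turtle case in the follow-up is awkward: `@prefix` directives are only legal at the top level of a TriG document, not inside a graph block, hence the detour through riot for non-N-Triples input.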