On 07/07/14 14:49, Guido Zuccarelli wrote:
I did the following script:
for f in ~/workspace/extraidos/*; do
    riot "$f" >> data.nt
done
It was extremely slow, so I will have to do it using the first way.
It's faster than the loading ... if nothing else, loading does that and
also other work.
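One likely reason the loop above is slow is that it starts a new JVM for every one of the 200,000+ files. A possible workaround, sketched here, is to hand riot many files per invocation with xargs. This assumes riot accepts several file arguments in one run and keeps blank node labels from different files distinct, which is worth checking on your version. RIOT_CMD, convert_batched and the batch size of 500 are illustrative names and values, not part of Jena.

```shell
#!/bin/sh
# Sketch: convert all .ttl files under a directory into one N-Triples file,
# passing riot up to 500 files per invocation instead of one file at a time.
# RIOT_CMD is a hypothetical override so another command can be substituted.
convert_batched() {
    src_dir=$1
    out_file=$2
    : > "$out_file"    # start with an empty output file
    # -print0 / -0 keep file names containing spaces intact
    # (supported by GNU and BSD find/xargs, not strictly POSIX).
    find "$src_dir" -type f -name '*.ttl' -print0 |
        xargs -0 -n 500 "${RIOT_CMD:-riot}" >> "$out_file"
}

# Example call, using the paths from this thread:
# convert_batched ~/workspace/extraidos data.nt
```

The resulting data.nt can then be given to tdbloader or tdbloader2 as a single data file.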
What did you mean when you said "the input must be in nt"?
tdbloader can't know the syntax of stdin, so it assumes N-Triples
(commonly used for dumps).
A file with the name of all the ttl files written in triples?
for example:
_:uri1 <name> 3D2658086.ttl
_:uri2 <name> 3D1218343208681.ttl
...
That isn't legal TTL or NT : This is illegal --> 3D2658086.ttl
Andy
Thanks again,
Guido
Date: Fri, 4 Jul 2014 21:12:23 +0100
From: [email protected]
To: [email protected]
Subject: Re: Bulk load on several files
On 04/07/14 19:04, Guido Zuccarelli wrote:
Thank you! I think this would be the easier way.
Can you go from ttl files to nt that easily?
"riot" will output N-triples/N-Quads.
In fact, it's a good idea to parse your files before loading - it
catches syntax problems (inc warnings) that it's good to know about
before loading.
Andy
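The advice to parse before loading could be scripted roughly as follows. This is only a sketch: it assumes riot's --validate option parses with checking and emits no data, and that riot exits non-zero when a file has a syntax error. If your version behaves differently, inspect stderr instead. check_files and RIOT_CMD are hypothetical names used for illustration.

```shell
#!/bin/sh
# Sketch: run a parse-only check on each file and record the names that fail.
# Usage: check_files failed.txt file1.ttl file2.ttl ...
check_files() {
    failed_list=$1
    shift
    : > "$failed_list"    # start with an empty list of failures
    for f in "$@"; do
        # Assumption: a non-zero exit status means the file did not parse.
        if ! "${RIOT_CMD:-riot}" --validate "$f" > /dev/null; then
            printf '%s\n' "$f" >> "$failed_list"
        fi
    done
}
```

Warnings and errors still appear on stderr, so the run shows what is wrong as well as which files are affected.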
Best regards
Date: Fri, 4 Jul 2014 18:48:12 +0100
From: [email protected]
To: [email protected]
Subject: Re: Bulk load on several files
On 04/07/14 18:27, Andy Seaborne wrote:
On 04/07/14 17:20, Guido Zuccarelli wrote:
Hello,
I have a directory with 200,000+ ttl files that I want to
load into a TDB database. The command help only specifies the syntax
for one file load.
tdbloader2 --help
==>
Usage: tdbloader2 --loc location datafile ...
"..." indicates as many files as you like.
I tried with the following command:
cat ../listaExtraidos.txt | tdbloader2 --loc /home/guidoz/workspace/rdfMaven/database
If it's reading from stdin, then the input must be N-Quads (N-Triples).
where listaExtraidos.txt is a space-separated list of ttl files
obtained by the ls command.
It gives me this exception:
12:35:17 -- TDB Bulk Loader Start
12:35:17 Data phase
File does not exist: -
A minor bug - just now fixed.
Is there any way to do this, or will I need to join the files?
PS
Better to put all the files on the tdbloader2 command line, if you can get 200K files there, else ...
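Whether 200K file names fit on one command line depends on the system's limit on argument size. A rough way to estimate it is sketched here, assuming the POSIX getconf ARG_MAX value is the relevant limit. fits_on_command_line is a hypothetical helper, and the estimate ignores the space taken by the environment, so treat a close result as not fitting.

```shell
#!/bin/sh
# Sketch: estimate whether all .ttl paths under a directory fit on one
# command line. Prints "fits" or "too long".
# The second argument optionally overrides the limit (default: ARG_MAX).
fits_on_command_line() {
    dir=$1
    limit=${2:-$(getconf ARG_MAX)}
    # Total bytes of all paths, each followed by one separator byte.
    # The arithmetic expansion strips any padding that wc adds.
    needed=$(( $(find "$dir" -type f -name '*.ttl' | wc -c) ))
    if [ "$needed" -lt "$limit" ]; then
        echo "fits"
    else
        echo "too long"
    fi
}

# Example call:
# fits_on_command_line ~/workspace/extraidos
```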
Do not join files if they have any blank nodes.
_:a is the same blank node within a file.
If you use a labelled blank node, then after concatenation it will be the
same blank node across all files.
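To get a feel for whether this affects a set of files, a rough text search for labels that appear in more than one file may help. This is only a sketch: the pattern is a plain text match and not a Turtle parser, so it can also hit "_:x" inside string literals or comments, and it only covers labels made of letters, digits and underscores. shared_bnode_labels is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list blank node labels that occur in more than one of the given files.
# Usage: shared_bnode_labels file1.ttl file2.ttl ...
shared_bnode_labels() {
    for f in "$@"; do
        # Reduce each file to its unique labels, so a label counts once per file.
        grep -oE '_:[A-Za-z0-9_]+' "$f" | sort -u
    done | sort | uniq -d    # keep labels seen in at least two files
}
```

Any label printed would become a single shared blank node if those files were simply concatenated.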
for each file:
riotcmd.riot file.ttl >> data.nt
then tdbloader --loc whatever "data.nt" (or tdbloader2)
The parser command "riot" will generate stable identifiers that don't clash.
Andy
Guido.