Hi Andy,
Many thanks for these ideas; I'm going to try the curl and riot solutions.
> Modify the s-get script to handle --output and set the "Accept:"
> header then please submit a pull request for the changes
I tried to modify the s-get script in the same way as I did for
s-query, but it didn't work. If I have a moment, I'll try to understand
how the options are handled.
On 28/01/2019 at 14:19, Andy Seaborne wrote:
On 28/01/2019 11:04, Vincent Ventresque wrote:
Hello,
I want to export a named graph which is stored in a TDB dataset, and
I want to store the output in several files (because the named graph
contains roughly 9.5 M triples).
My idea is to use the "split" command to cut the output of the
export into pieces. However, this approach requires N-Triples or
N-Quads output (one triple per line), so that the files are not
cut in the middle of a statement. Having one triple per line is also
more practical if I want to transform the data with perl or sed.
I found a solution with s-query, but I had to edit the Ruby s-query
script to get N-Triples (see below).
There are other possible solutions for an export via command-line
utilities: "s-get" and "tdbdump". If I understand correctly, "tdbdump"
outputs N-Quads, but it can't export only part of the data;
everything is exported at once. The "s-get" solution lets me select
a named graph in the dataset, but I couldn't change the output format.
Are there better solutions to get an export in several files?
Ways I can think of:
1/ Modify the s-get script to handle --output and set the "Accept:"
header then please submit a pull request for the changes.
2/ Use curl
curl --header 'Accept: application/n-triples' \
'http://localhost:3030/ds?graph=http://bnf_titres'
3/ Parse the s-get output:
s-get ... | riot --syntax TTL
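Options 2/ and 3/ both produce a line-oriented stream that can be piped straight into split. A minimal sketch, assuming GNU split (needed for --additional-suffix) and assuming the dataset's /data endpoint used in the s-get command further down accepts Graph Store Protocol reads; the function names export_graph and split_nt are hypothetical helpers, not part of Fuseki or Jena:

```shell
#!/bin/sh
# Sketch only: export_graph and split_nt are illustrative names.

# export_graph ENDPOINT GRAPH_URI
# Fetch one named graph as N-Triples; curl URL-encodes the graph URI.
export_graph() {
    curl --silent --fail --get \
         --header 'Accept: application/n-triples' \
         --data-urlencode "graph=$2" \
         "$1"
}

# split_nt PREFIX LINES
# Cut stdin into PREFIXaa.nt, PREFIXab.nt, ... with LINES lines each.
split_nt() {
    split -l "$2" --additional-suffix=.nt - "$1"
}

# Usage (assumes a running Fuseki server at this address):
#   export_graph http://localhost:3030/BnF_text_v2/data http://bnf_titres \
#     | split_nt BnfTextTitres- 500000
#
# Option 3/ would feed the same helper:
#   s-get ... | riot --syntax TTL | split_nt BnfTextTitres- 500000
```

Because N-Triples puts one triple on each line, splitting on line boundaries should never cut a statement in half.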
Andy
Thanks in advance,
VV.
~~~~~~~~~~~ 1) SOLUTION WITH s-query ~~~~~~~~~~~~~~~~~~~~~
1.1) Edit s-query ruby script (add nt)
-- l. 572 : when "json","xml","text","csv","tsv","nt"
-- l. 574 : when :json,:xml,:text,:csv,:tsv,:nt
-- l. 515 : opts.on('--output=TYPE', [:json,:xml,:text,:csv,:tsv,:nt],
-- l. 519 : opts.on('--accept=TYPE', [:json,:xml,:text,:csv,:tsv,:nt],
1.2) Command
/my/path/to/fuseki/bin/s-query \
  --service=http://localhost:3030/BnF_text_v2/ \
  "construct { ?s ?p ?o } where { graph <http://bnf_titres> { ?s ?p ?o }}" \
  --output=nt | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
~~~~~~~~~~~ 2) SOLUTION WITH tdbdump (nquads but no named graph) ~~~~~~~~~~~~~~~~~~~~~
/my/path/to/jena/bin/tdbdump \
  --loc=/my/path/to/fuseki/run/databases/BnF_text_v2 \
  --graph=http://bnf_titres \
  | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
=> Unknown argument: graph
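Since tdbdump has no graph option, one possible workaround is to filter its N-Quads output afterwards. The sketch below assumes that the dump writes one quad per line with single spaces between terms, so each line in the wanted graph ends with " <graph> ."; filter_graph is a hypothetical helper name. The whole dataset is still dumped, and only the selected graph is kept.

```shell
#!/bin/sh
# Sketch only: filter_graph is an illustrative name, not a Jena tool.

# filter_graph GRAPH_URI
# Keep the quads of one graph and drop the graph term, giving N-Triples.
# Caveat: a default-graph triple whose object is <GRAPH_URI> also ends
# with the same suffix and would be mangled; this is not handled here.
filter_graph() {
    awk -v g="<$1>" '
        {
            suffix = " " g " ."
            n = length($0) - length(suffix)
            # Compare the end of the line with " <graph> ."
            if (n > 0 && substr($0, n + 1) == suffix)
                print substr($0, 1, n) " ."
        }'
}

# Usage (paths as in the command above, without --graph):
#   /my/path/to/jena/bin/tdbdump \
#     --loc=/my/path/to/fuseki/run/databases/BnF_text_v2 \
#     | filter_graph http://bnf_titres \
#     | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
```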
~~~~~~~~~~~ 3) SOLUTION WITH s-get (named graph ok, but turtle output) ~~~~~~~~~~~~~~~~~~~~~
/my/path/to/fuseki/bin/s-get http://localhost:3030/BnF_text_v2/data \
  http://bnf_titres --output=text \
  | split -l 500000 - --additional-suffix=.nt BnfTextTitres-
=> /my/path/to/fuseki/bin/s-get:364:in `cmd_soh': invalid option:
--output=text (OptionParser::InvalidOption)
from /my/path/to/fuseki/bin/fuseki/bin/s-get:715:in `<main>'