Re: riot not triggering ERROR on bad IRI

2017-04-18 Thread Laura Morales
> Did you use the --strict flag? Thank you, this seems to work.

Re: riot not triggering ERROR on bad IRI

2017-04-18 Thread Andy Seaborne
On 18/04/17 18:53, A. Soroka wrote: > Did you use the --strict flag? Won't help. It's an issue specific to RDF/XML. Strictly, spaces are legal. RDF/XML did not get revised at RDF 1.1, so RDF 1.0 rules apply. It emits "RDF URI References", which were a guess at where IRIs were going, but
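Because riot accepts these IRIs under the RDF 1.0 rules Andy describes, one workaround is to check the N-Triples output after conversion. The sketch below is a rough heuristic rather than a validator. The function name `find_space_iris` is hypothetical. It simply greps for any `<...>` term that contains a space, so a literal that happens to contain angle brackets around a space would also be reported.

```sh
#!/bin/sh
# Hypothetical helper: list the lines (with line numbers) of an N-Triples file
# that contain an <...> term with a space inside it. Heuristic only: it does
# not parse N-Triples, so a literal containing "<... ...>" would also match.
# Exit status follows grep: 0 if a suspicious line was found, 1 if none.
find_space_iris() {
  grep -n '<[^>]* [^>]*>' "$1"
}

# Usage (file names taken from the thread):
#   riot --quiet --output=nt xxx.rdf > xxx.nt
#   if find_space_iris xxx.nt; then echo "IRIs containing spaces found" >&2; fi
```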

Re: riot not triggering ERROR on bad IRI

2017-04-18 Thread A. Soroka
Did you use the --strict flag? --- A. Soroka The University of Virginia Library > On Apr 18, 2017, at 8:09 AM, Laura Morales wrote: > > This is the RDF/XML: > https://svn.apache.org/repos/asf/hivemind/hivemind2/trunk/doap_Hivemind.rdf > > The command `riot --quiet

Re: construct with jena jdbc driver

2017-04-18 Thread Rob Vesse
You can set the compatibility level on the connection, which will try to sniff the results and set an appropriate column type; however, if the results are very mixed, the sniffing can (and will) be inaccurate. http://jena.apache.org/documentation/jdbc/drivers.html#jdbc-compatibility-level You can also

Re: tdbloader skip bad file

2017-04-18 Thread A. Soroka
One of the several advantages of N-Triples (and this is not an accident) is how easy it is to use standard Posix tools with it, e.g. cut, sed, grep, etc. --- A. Soroka The University of Virginia Library > On Apr 18, 2017, at 11:46 AM, Laura Morales wrote: > >> In the
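As an illustration of the point about Posix tools, here is a hedged sketch that counts triples per predicate with `cut`, `sort` and `uniq`. It assumes one triple per line with terms separated by single spaces, as in riot's N-Triples output. The function name `count_predicates` is hypothetical. Because it splits on spaces, an IRI that itself contains a space (as in the bad-IRI thread) would throw the columns off.

```sh
#!/bin/sh
# Hypothetical helper: count triples per predicate in an N-Triples file,
# most frequent first. Assumes terms are separated by single spaces
# (subject, predicate, object, "."), so field 2 is the predicate.
count_predicates() {
  cut -d' ' -f2 "$1" | sort | uniq -c | sort -rn
}

# Usage:
#   count_predicates data.nt | head
```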

construct with jena jdbc driver

2017-04-18 Thread Claude Warren
Quick question: I have a CONSTRUCT query that returns various types for the object. Example: CONSTRUCT { ?p ?o . } WHERE { ?p ?o } Is there a method in the JDBC driver that will allow me to determine what that type is? Parsing string vs. URI is rather difficult. :( Thx,

Re: tdbloader skip bad file

2017-04-18 Thread Laura Morales
> In the meantime, you can use something like sed for this, something like: sed -e "s|\(.*\)|\1 |" Ah, right! This is a good suggestion. This seems to work: sed "s/\(.*\) \.$/\1 ./" (all triples have a period at the end). I think I'll use this until RIOT has a --graph option that would be
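The graph IRI is not shown in the sed expressions above, so here is a self-contained sketch of the same idea using a placeholder graph IRI. It rewrites the final ` .` of each N-Triples line into ` <graph> .`, which turns the stream into N-Quads. The function name `nt_to_nq` and the example graph IRIs are hypothetical. It also escapes `&` and `\`, which sed would otherwise treat specially in the replacement, so IRIs with query strings come through intact.

```sh
#!/bin/sh
# Hypothetical helper: read N-Triples on stdin and write N-Quads on stdout,
# putting every triple into the graph given as $1 (an IRI without <>).
# Relies on each statement ending in " ." as in riot's N-Triples output.
nt_to_nq() {
  # Escape the characters that are special in a sed replacement (& and \).
  graph=$(printf '%s' "$1" | sed 's/[&\\]/\\&/g')
  # "|" is used as the delimiter because it cannot appear in an IRI.
  sed "s| \.\$| <$graph> .|"
}

# Usage sketch (graph IRIs are placeholders):
#   for f in files/*.rdf; do
#     riot --quiet --output=nt "$f" | nt_to_nq "http://example.org/graph/$(basename "$f" .rdf)"
#   done > all.nq
```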

Re: tdbloader skip bad file

2017-04-18 Thread A. Soroka
In the meantime, you can use something like sed for this, something like: sed -e "s|\(.*\)|\1 |" --- A. Soroka The University of Virginia Library > On Apr 18, 2017, at 10:28 AM, Laura Morales wrote: > >> Convert to something cheaper (preferably stream-able, like N-triples,

Re: tdbloader skip bad file

2017-04-18 Thread A. Soroka
You can file a ticket for that functionality at the Jena JIRA instance: https://issues.apache.org/jira/browse/JENA --- A. Soroka The University of Virginia Library > On Apr 18, 2017, at 10:28 AM, Laura Morales wrote: > >> Convert to something cheaper (preferably

Re: tdbloader skip bad file

2017-04-18 Thread Laura Morales
> Convert to something cheaper (preferably stream-able, like N-triples, as Andy says) as early as possible. It would be very handy if riot had a "--graph=..." option as well, so that I could immediately output all XML files into N-Quads with a graph label (and `cat` all of them into a

tdbloader load all files together vs one at a time

2017-04-18 Thread Laura Morales
I have a folder with about 250 small-size RDF/XML files. It seems to make a huge difference whether I load all files with a single call to tdbloader like this "tdbloader --graph=... --loc=./db files/*" versus calling tdbloader on each single file. This is my database folder in the first case

Re: tdbloader skip bad file

2017-04-18 Thread A. Soroka
If you don't have a specific reason to use RDF/XML inside your workflow, you almost certainly shouldn't. It's one of the most expensive RDF serializations to process. Convert to something cheaper (preferably stream-able, like N-triples, as Andy says) as early as possible. As for the costs of

Participating in a research survey on graph data, processing, and technologies

2017-04-18 Thread Siddhartha Sahu
Hi, My name is Siddhartha Sahu and I am a Master's student at University of Waterloo working on graph processing under Prof. Semih Salihoglu. As part of my research, I am running a survey about the graph data sets, computations, and software used by the companies in the industry and research labs

riot not triggering ERROR on bad IRI

2017-04-18 Thread Laura Morales
This is the RDF/XML: https://svn.apache.org/repos/asf/hivemind/hivemind2/trunk/doap_Hivemind.rdf The command `riot --quiet --output=nt xxx.rdf > xxx.nt` creates the .nt file with the following 2 invalid triples (the objects' IRIs contain a space). No ERRORs are raised. _:B4f7ecd79X3A15b80f07bfbX3AX2D7ffe

Re: tdbloader skip bad file

2017-04-18 Thread Andy Seaborne
On 18/04/17 10:19, Laura Morales wrote: >> riot sets the Unix return code to 0 on success and 1 on failure in the usual Unix fashion. >> So build up a list of valid files by looping on the input files then load all the valid ones in one go with tdbloader. > Thank you. Unfortunately however,
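Andy's suggestion can be sketched in POSIX sh as follows. It assumes, as he describes, that riot exits non-zero when a file fails to parse; passing riot does not guarantee every IRI is clean (see the bad-IRI thread). The function name `validate_and_load` is hypothetical, and its first argument is whatever you would pass to `tdbloader --loc`. Extra tdbloader options, such as the `--graph=...` from Laura's command, could be added to the final call.

```sh
#!/bin/sh
# Hypothetical helper: validate each input file with riot and load only the
# files that pass, in a single tdbloader call.
#   $1      TDB database directory (passed to tdbloader --loc)
#   $2...   RDF files to check and load
validate_and_load() {
  loc=$1
  shift
  n=$#                       # number of input files
  for f in "$@"; do          # the list is expanded once, before the loop runs
    if riot --validate "$f" >/dev/null; then
      set -- "$@" "$f"       # append valid files after the original arguments
    else
      echo "skipping invalid file: $f" >&2
    fi
  done
  shift "$n"                 # drop the original arguments, keep the valid ones
  if [ "$#" -eq 0 ]; then
    echo "no valid files to load" >&2
    return 1
  fi
  tdbloader --loc="$loc" "$@"
}

# Usage:
#   validate_and_load ./db files/*
```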

Re: tdbloader skip bad file

2017-04-18 Thread Laura Morales
> riot sets the Unix return code to 0 on success and 1 on failure in the usual Unix fashion. > > So build up a list of valid files by looping on the input files then load all the valid ones in one go with tdbloader. Thank you. Unfortunately however, running "riot --validate" on each file doesn't

Re: A processing instruction is in RDF content. No processing was done.

2017-04-18 Thread Conal Tuohy
An XML processing instruction provides a hint to processing software on how to process an XML file. A processing instruction whose name is "xml-stylesheet" and which includes a type "pseudo-attribute" with value "text/xsl" (as in your example) is used to associate an XSLT stylesheet with the XML,

A processing instruction is in RDF content. No processing was done.

2017-04-18 Thread Laura Morales
What does this warning mean when I execute riot on a .rdf file? $ riot --validate file.rdf WARN  riot :: [line: 2, col: 35] {W119} A processing instruction is in RDF content. No processing was done. Here's the head of the RDF/XML file

Re: tdbloader skip bad file

2017-04-18 Thread Andy Seaborne
On 17/04/17 22:56, Laura Morales wrote: >> Check the data before loading. This is generally good practice. Call "riot --validate" before loading to check each file. > Let's say I've downloaded these RDF files [1]. Some of those files are broken. How can I check-and-load all those files with

Re: Very slow tdbloader2 insertion

2017-04-18 Thread Andy Seaborne
On 17/04/17 23:07, Laura Morales wrote: tdbloader2 builds b+trees from bottom to top, given sorted input. As such, blocks are streamed to disk, which is disk-efficient. It is a series of Java programs scripted together by a shell script. tdbloader is pure Java. It builds the b+trees by