Hi Osma,
Thanks for your report. I've added JENA-357 to track this issue. Please feel free to add comments.

https://issues.apache.org/jira/browse/JENA-357

Ian


On 27/11/12 12:00, Osma Suominen wrote:
I'm using the rdfcat utility (via *nix shell script) in Jena to convert
and merge RDF files. In some cases, the input files I use trigger
warnings within Jena, for example due to Unicode encoding issues or bad
language tags. I previously used Jena 2.6.3 and it was easy to redirect
these warnings from stderr to a separate file or /dev/null, like this:

bin/rdfcat -out TTL locations.rdf >locations.ttl 2>warnings.txt

Here I'm converting the NYT Locations [1] file from RDF/XML into Turtle.
This works fine: I get a proper Turtle file and 12965 warnings in the
warnings.txt file apparently due to bad language tags.

However, after I upgraded to Jena 2.7.4, the warnings no longer go to
stderr but to stdout. So the warnings end up at the top of the
locations.ttl file, and it can no longer be parsed as Turtle.
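
To illustrate the difference, here is a minimal shell sketch. The two functions are hypothetical stand-ins for a converter (they are not rdfcat, and their output is made up): one writes its warning to stderr, the other to stdout. The file names are likewise only for illustration.

```shell
#!/bin/sh
# Hypothetical stand-ins for a converter: NOT rdfcat, output is made up.
warn_on_stderr() {
    printf '%s\n' 'WARN bad language tag' >&2
    printf '%s\n' '<urn:a> <urn:b> <urn:c> .'
}
warn_on_stdout() {
    printf '%s\n' 'WARN bad language tag'
    printf '%s\n' '<urn:a> <urn:b> <urn:c> .'
}

workdir=$(mktemp -d)

# Warnings on stderr: "2>" captures them and the data file stays clean.
warn_on_stderr >"$workdir/clean.ttl" 2>"$workdir/warnings.txt"

# Warnings on stdout: "2>" captures nothing, so the warning lands in the data.
warn_on_stdout >"$workdir/polluted.ttl" 2>"$workdir/nothing.txt"

# Show the first line of each data file for comparison.
head -n 1 "$workdir/clean.ttl" "$workdir/polluted.ttl"
```

In the second case the first line of the output file is the warning text rather than a triple, which is the situation described above.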

I figured out that this is the way the stock jena-log4j.properties file
configures logging output. If I change the file directly and uncomment
this line:
## log4j.appender.stdlog.target=System.err

...I can restore the former behavior, i.e. get warnings back to stderr.

Then I tried defining a custom log4j properties file, because I don't
want to edit files in the Jena distribution every time I upgrade Jena. I
tried setting JVM_ARGS="-Dlog4j.configuration=mylog4j.properties".
But that didn't work, because the bin/rdfcat script overrides the log4j
configuration setting by defining and using its own $LOGGING variable
which cannot itself be overridden.

My questions:

1. Would it not make sense to print warnings on stderr by default
(instead of stdout), at least for rdfcat, if not for Jena as a whole?

2. If not, could it at least be possible to specify my own log4j
configuration for rdfcat without editing either the log4j properties
file or the rdfcat invocation script in the Jena distribution? For
example, the variable LOGGING could be defined like this:
LOGGING=${LOGGING:--Dlog4j.configuration=file:$JENA_HOME/jena-log4j.properties}

so it could be overridden in the environment, just like JVM_ARGS.
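
A minimal sketch of that default-with-override pattern follows. It assumes a POSIX shell; the /opt/jena fallback and the default_logging helper are hypothetical names used only for illustration and are not part of the Jena scripts. In ${VAR:-word} the ":-" is the operator itself, so the default value needs its own leading dash, which is why the expansion reads ":--D".

```shell
#!/bin/sh
# Hypothetical install location, used only when JENA_HOME is not already set.
JENA_HOME="${JENA_HOME:-/opt/jena}"

# Hypothetical helper: print the LOGGING value the script would use.
# If LOGGING is set and non-empty in the environment it is kept as-is;
# otherwise the stock configuration file under JENA_HOME is used.
default_logging() {
    printf '%s\n' "${LOGGING:--Dlog4j.configuration=file:$JENA_HOME/jena-log4j.properties}"
}

default_logging
```

With such a definition, a caller could export LOGGING="-Dlog4j.configuration=file:/path/to/mylog4j.properties" before invoking the script, and the distribution files would stay untouched.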

-Osma

PS. I noticed an undocumented "feature" in rdfcat: if you say "-out ttl"
as suggested by the documentation [2] you actually get N3 output, where
e.g. owl:sameAs is expressed with the shorthand notation "=", which is
not valid Turtle. But if you say "-out TTL" in upper case, you get real
Turtle output without the owl:sameAs shorthand.


[1] http://data.nytimes.com/locations.rdf

[2] http://jena.apache.org/documentation/javadoc/jena/jena/rdfcat.html



--
____________________________________________________________
Ian Dickinson                   Epimorphics Ltd, Bristol, UK
mailto:[email protected]        http://www.epimorphics.com
cell: +44-7786-850536              landline: +44-1275-399069
------------------------------------------------------------
Epimorphics Ltd.  is a limited company registered in England
(no. 7016688). Registered address: Court Lodge, 105 High St,
              Portishead, Bristol BS20 6PT, UK
