Re: encoding problem iterating result set using a fuseki endpoint

Paton, Diego Wed, 19 Feb 2014 03:47:28 -0800

Re-sending so that I receive all replies sent to the list (I was not
subscribed when I first posted).



On 19 Feb 2014, at 11:26, Diego Paton wrote:

Hi,


In my department we have loaded one of the latest dumps of the Freebase ontology
into TDB and serve it through Fuseki.

TDB: 2.11
Fuseki: 1.0

We have a dataset defined (/freebase/data), and executing SPARQL queries
through the Fuseki server works fine.


  *   We execute a query, for example:

prefix fb: <http://rdf.freebase.com/ns/>
select ?mid ?e ?nf ?desc
where {
     ?mid fb:type.object.name ?e .
     ?mid fb:common.topic.notable_for ?notab_for .
     ?mid fb:common.topic.description ?desc .
     ?notab_for fb:common.notable_for.display_name ?nf .
     FILTER (langMatches(lang(?e), "en") && langMatches(lang(?nf), "en") &&
             langMatches(lang(?desc), "en"))
}
We are able to output the results as a TSV file that contains 6158452 lines
without problems.

But when we try to execute the same query from a Java application using ARQ
(2.8.7), we run into problems.


  *   This is the code that I execute:


String ontology_service =
    "http://freebase-m01.ihost.aol.com:3030/freebase/data/query";
String query =
"prefix fb: <http://rdf.freebase.com/ns/>\n"+
" select ?mid ?e ?nf ?desc\n"+
" where {\n"+
"     ?mid fb:type.object.name ?e .\n"+
"     ?mid fb:common.topic.notable_for ?notab_for .\n"+
"     ?mid fb:common.topic.description ?desc .\n"+
"     ?notab_for fb:common.notable_for.display_name ?nf .\n"+
"     FILTER (langMatches(lang(?e), \"en\") && langMatches(lang(?nf), \"en\")"+
" && langMatches(lang(?desc), \"en\"))\n"+
" }\n";

QueryExecution queryExecution = 
QueryExecutionFactory.sparqlService(ontology_service, query);

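ResultSet resultSet = queryExecution.execSelect();

// Tried separately (see further down). It drains the result set, so it
// cannot run before the loop; it is left commented out here.
// ResultSetFormatter.outputAsTSV(System.out, resultSet);

while (resultSet.hasNext()) {
    QuerySolution querySolution = resultSet.next();

    // Read each projected variable of the current row.
    querySolution.get("mid");
    querySolution.get("e");
    querySolution.get("nf");
    querySolution.get("desc");
}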

  *   After processing around 400k results, we get this exception:

com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 2))
 at [row,col {unknown-source}]: [5859511,312]
    at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
    at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
    at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
    at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
    at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:679)
    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:460)
    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:216)
    at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:42)
Exception in thread "main" com.hp.hpl.jena.sparql.resultset.ResultSetException: XMLStreamException: Illegal character ((CTRL-CHAR, code 2))
 at [row,col {unknown-source}]: [5859511,312]
    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.staxError(XMLInputStAX.java:510)
    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:220)
    at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:42)
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 2))
 at [row,col {unknown-source}]: [5859511,312]
    at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
    at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
    at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
    at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
    at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:679)
    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:460)
    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:216)
    ... 1 more


  *   I have also tried printing the results with a ResultSetFormatter, but I
get the same exception:

 ResultSetFormatter.outputAsTSV(System.out, resultSet);


I would say that the ontology is well formed and stored with the correct
encoding in TDB, because through Fuseki we are able to obtain correct results.
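
One caveat on that reasoning: "code 2" in the exception is U+0002, a control
character that a plain-text format such as TSV passes through but that the
XML 1.0 Char production excludes. The data may therefore be stored faithfully
and still not be representable in XML results. A minimal sketch, using only
the JDK, of how a literal value could be checked for such characters; the
class and method names are hypothetical and not part of ARQ:

```java
final class XmlCharCheck {

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Char index of the first code point XML 1.0 forbids, or -1 if none. */
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegalXml10(cp)) {
                return i;
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        // U+0002 sits at index 3 of this sample string.
        System.out.println(firstIllegalIndex("abc\u0002def")); // prints 3
    }
}
```

Running a check like this over the ?desc values in the TSV export would show
whether the offending character is really in the stored data.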

I have read the related documentation, but I can't find a way to set the
correct encoding, if that is what is required. It is also possible that I am
not executing the query in the correct way.
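
Since the TSV output completes without problems, one possible workaround is
to request TSV from the endpoint over plain HTTP and bypass the XML results
parser. The sketch below uses only the JDK. It assumes that the endpoint
honours an Accept header of text/tab-separated-values, which I have not
verified for this Fuseki version; the class and method names are hypothetical:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class TsvQueryClient {

    /** Builds a GET URL carrying the query in the "query" parameter. */
    static String buildQueryUrl(String endpoint, String query) {
        try {
            return endpoint + "?query=" + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Splits one TSV row into its fields, keeping trailing empty fields. */
    static String[] splitRow(String line) {
        return line.split("\t", -1);
    }

    /** Streams the response and returns the number of data rows read. */
    static long countRows(String endpoint, String query) throws IOException {
        URL url = new URL(buildQueryUrl(endpoint, query));
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Assumption: the server selects TSV output based on this header.
        conn.setRequestProperty("Accept", "text/tab-separated-values");
        long rows = 0;
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            in.readLine(); // skip the header row with the variable names
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = splitRow(line); // mid, e, nf, desc
                if (fields.length > 0) {
                    rows++;
                }
            }
        } finally {
            conn.disconnect();
        }
        return rows;
    }
}
```

With the names from the code above, this would be called as
TsvQueryClient.countRows(ontology_service, query). The values arrive in
their TSV-serialised form, so each field still needs to be decoded before use.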

I would be grateful if you could help me.

Thanks in advance.


Diego Paton.
