Re-sending so that I receive all replies sent to the list (I wasn't subscribed to it).
On 19 Feb 2014, at 11:26, Diego Paton <[email protected]> wrote:

Hi,

In my department we have stored one of the latest dumps of the Freebase ontology using TDB and Fuseki.

TDB: 2.11
Fuseki: 1.0

We have a dataset defined (/freebase/data), and executing SPARQL queries through the Fuseki server works fine.

* We execute a query, for example:

    prefix fb: <http://rdf.freebase.com/ns/>
    select ?mid ?e ?nf ?desc
    where {
      ?mid fb:type.object.name ?e .
      ?mid fb:common.topic.notable_for ?notab_for .
      ?mid fb:common.topic.description ?desc .
      ?notab_for fb:common.notable_for.display_name ?nf .
      FILTER (langMatches(lang(?e), "en") && langMatches(lang(?nf), "en") && langMatches(lang(?desc), "en"))
    }

We are able to output the results as a TSV file containing 6158452 lines without problems. But when we try to execute the same query from a Java application using ARQ (2.8.7), we run into problems.

* This is the code that I execute:

    String ontology_service = "http://freebase-m01.ihost.aol.com:3030/freebase/data/query";
    String query =
        "prefix fb: <http://rdf.freebase.com/ns/>\n" +
        " select ?mid ?e ?nf ?desc\n" +
        " where {\n" +
        " ?mid fb:type.object.name ?e .\n" +
        " ?mid fb:common.topic.notable_for ?notab_for .\n" +
        " ?mid fb:common.topic.description ?desc .\n" +
        " ?notab_for fb:common.notable_for.display_name ?nf .\n" +
        " FILTER (langMatches(lang(?e), \"en\") && langMatches(lang(?nf), \"en\") && langMatches(lang(?desc), \"en\"))\n" +
        " }\n";

    QueryExecution queryExecution = QueryExecutionFactory.sparqlService(ontology_service, query);
    ResultSet resultSet = queryExecution.execSelect();
    ResultSetFormatter.outputAsTSV(System.out, resultSet);
    while (resultSet.hasNext()) {
        QuerySolution querySolution = resultSet.next();
        querySolution.get("mid");
        querySolution.get("e");
        querySolution.get("nf");
        querySolution.get("desc");
    }

* After processing around 400k results, we get this exception:

    com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 2))
     at [row,col {unknown-source}]: [5859511,312]
        at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
        at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
        at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
        at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
        at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
        at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
        at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:679)
        at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:460)
        at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:216)
        at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:42)
    Exception in thread "main" com.hp.hpl.jena.sparql.resultset.ResultSetException: XMLStreamException: Illegal character ((CTRL-CHAR, code 2))
     at [row,col {unknown-source}]: [5859511,312]
        at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.staxError(XMLInputStAX.java:510)
        at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:220)
        at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:42)
    Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 2))
     at [row,col {unknown-source}]: [5859511,312]
        at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
        at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
        at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
        at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
        at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
        at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
        at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:679)
        at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:460)
        at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:216)
        ... 1 more

* I have also tried displaying the results with a ResultSetFormatter, but I get the same exception:

    ResultSetFormatter.outputAsTSV(System.out, resultSet);

I would say that the ontology is well formed and stored with the correct encoding in TDB, because through Fuseki we are able to obtain correct results. I have read the related documentation, but I can't find a way to set the correct encoding, if that is required. It is also possible that I am not executing the query the right way.

I would be grateful if you could help me.

Thanks in advance.

Diego Paton.
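
A note on the exception above: "CTRL-CHAR, code 2" is the character U+0002, which XML 1.0 does not allow anywhere in a document, so an XML result stream containing it will be rejected by the parser regardless of encoding. One possible workaround is to drop such characters before the text reaches the XML parser. The following is only a minimal sketch using the JDK alone; the class name Xml10CharFilter and its methods are hypothetical, and how to place such a filter in front of ARQ's result parser depends on the ARQ version and is not shown here.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper (not part of Jena or Woodstox): drops characters
// that the XML 1.0 "Char" production does not permit.
public class Xml10CharFilter extends FilterReader {

    public Xml10CharFilter(Reader in) {
        super(in);
    }

    /** True if this UTF-16 code unit may appear in an XML 1.0 document. */
    public static boolean isAllowed(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            // Assumption: surrogates are correctly paired, so together they
            // encode a code point in the allowed range U+10000..U+10FFFF.
            || Character.isSurrogate(c);
    }

    /** Convenience method for in-memory strings. */
    public static String strip(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isAllowed(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    @Override
    public int read() throws IOException {
        int c;
        // Skip disallowed characters until a valid one or end of stream.
        while ((c = super.read()) != -1) {
            if (isAllowed((char) c)) {
                return c;
            }
        }
        return -1;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int kept;
        do {
            int n = super.read(buf, off, len);
            if (n == -1) {
                return -1;
            }
            // Compact the allowed characters to the front of the range.
            kept = 0;
            for (int i = off; i < off + n; i++) {
                if (isAllowed(buf[i])) {
                    buf[off + kept++] = buf[i];
                }
            }
        } while (kept == 0); // avoid returning 0 for a non-empty request
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(strip("Free\u0002base")); // prints "Freebase"
    }
}
```

Silently dropping characters changes the literal values that come back, so this only makes sense if the control characters are known to be noise in the data. Requesting a non-XML result format from the endpoint (the TSV output already works on the Fuseki side) would avoid the XML restriction without altering the data.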
