Why are you using such an outdated version of ARQ?

2.8.7 is from December 2010, so it is more than three years out of date.
Please upgrade to the latest version, which is 2.11.1, and let us know if
you still have problems.
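
The service-query API you are using has not changed, so the upgrade should
mostly be a matter of swapping the jars (if you build with Maven, the
artifact should be org.apache.jena:jena-arq:2.11.1). As a rough, untested
sketch of your loop against 2.11.1 - the endpoint URL, query and variable
names are the ones from your mail - with the QueryExecution closed once the
results have been consumed:

import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.QuerySolution;
import com.hp.hpl.jena.query.ResultSet;

public class FreebaseQueryExample {
    public static void main(String[] args) {
        // Fuseki endpoint and query taken from the original mail
        String service = "http://freebase-m01.ihost.aol.com:3030/freebase/data/query";
        String query = "prefix fb: <http://rdf.freebase.com/ns/> "
                + "select ?mid ?e ?nf ?desc where { "
                + "?mid fb:type.object.name ?e . "
                + "?mid fb:common.topic.notable_for ?notab_for . "
                + "?mid fb:common.topic.description ?desc . "
                + "?notab_for fb:common.notable_for.display_name ?nf . "
                + "FILTER (langMatches(lang(?e), \"en\") && langMatches(lang(?nf), \"en\") && langMatches(lang(?desc), \"en\")) }";

        QueryExecution qe = QueryExecutionFactory.sparqlService(service, query);
        try {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                QuerySolution qs = rs.next();
                // process qs.get("mid"), qs.get("e"), qs.get("nf"), qs.get("desc") here
            }
        } finally {
            qe.close();  // always release the underlying HTTP connection
        }
    }
}

Closing the execution will not by itself fix the parse error you are
seeing, but it does avoid leaking the HTTP connection on long-running runs.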

Rob

On 19/02/2014 11:46, "Paton, Diego" <[email protected]> wrote:

>
>Re-sending so that I receive all replies sent to the list (I wasn't
>subscribed to the list).
>
>
>On 19 Feb 2014, at 11:26, Diego Paton <[email protected]> wrote:
>
>Hi,
>
>
>In my department we have stored one of the latest dumps of the Freebase
>ontology using TDB and Fuseki.
>
>TDB: 2.11
>Fuseki: 1.0
>
>We have a dataset defined (/freebase/data), and we execute SPARQL
>queries against it through the Fuseki server; this works fine.
>
>
>  *   We execute a query, for example:
>
>prefix fb: <http://rdf.freebase.com/ns/>
>select ?mid ?e ?nf ?desc
>where {
>      ?mid fb:type.object.name ?e .
>      ?mid fb:common.topic.notable_for ?notab_for .
>      ?mid fb:common.topic.description ?desc .
>      ?notab_for fb:common.notable_for.display_name ?nf .
>      FILTER (langMatches(lang(?e), "en") && langMatches(lang(?nf), "en") && langMatches(lang(?desc), "en"))
>}
>We are able to output the results as a TSV file that contains 6158452
>lines without problems.
>
>But when we try to execute the same query from a Java application using
>ARQ (2.8.7), we run into problems.
>
>
>  *   This is the code that I execute:
>
>
>import com.hp.hpl.jena.query.QueryExecution;
>import com.hp.hpl.jena.query.QueryExecutionFactory;
>import com.hp.hpl.jena.query.QuerySolution;
>import com.hp.hpl.jena.query.ResultSet;
>
>// Remote SPARQL endpoint exposed by Fuseki
>String ontology_service =
>    "http://freebase-m01.ihost.aol.com:3030/freebase/data/query";
>
>String query =
>    "prefix fb: <http://rdf.freebase.com/ns/>\n" +
>    " select ?mid ?e ?nf ?desc\n" +
>    " where {\n" +
>    "     ?mid fb:type.object.name ?e .\n" +
>    "     ?mid fb:common.topic.notable_for ?notab_for .\n" +
>    "     ?mid fb:common.topic.description ?desc .\n" +
>    "     ?notab_for fb:common.notable_for.display_name ?nf .\n" +
>    "     FILTER (langMatches(lang(?e), \"en\") && langMatches(lang(?nf), \"en\") && langMatches(lang(?desc), \"en\"))\n" +
>    " }\n";
>
>QueryExecution queryExecution =
>    QueryExecutionFactory.sparqlService(ontology_service, query);
>
>ResultSet resultSet = queryExecution.execSelect();
>
>// Also tried formatting the results directly as TSV (see below) - same exception:
>// ResultSetFormatter.outputAsTSV(System.out, resultSet);
>
>// Iterate over the streamed results and read the four bound variables
>while (resultSet.hasNext()) {
>
>    QuerySolution querySolution = resultSet.next();
>
>    querySolution.get("mid");
>    querySolution.get("e");
>    querySolution.get("nf");
>    querySolution.get("desc");
>
>}
>
>  *   After processing around 400k results, we get this exception:
>
>com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 2))
> at [row,col {unknown-source}]: [5859511,312]
>    at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
>    at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
>    at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
>    at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
>    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
>    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>    at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:679)
>    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:460)
>    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:216)
>    at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:42)
>Exception in thread "main" com.hp.hpl.jena.sparql.resultset.ResultSetException: XMLStreamException: Illegal character ((CTRL-CHAR, code 2))
> at [row,col {unknown-source}]: [5859511,312]
>    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.staxError(XMLInputStAX.java:510)
>    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:220)
>    at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:42)
>Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 2))
> at [row,col {unknown-source}]: [5859511,312]
>    at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
>    at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
>    at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
>    at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
>    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
>    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>    at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:679)
>    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:460)
>    at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:216)
>    ... 1 more
>
>
>  *   I have also tried displaying the results with a ResultSetFormatter,
>but I get the same exception:
>
> ResultSetFormatter.outputAsTSV(System.out, resultSet);
>
>
>I would say that the ontology is well formed and correctly encoded in
>TDB, because through Fuseki we are able to obtain correct results.
>
>I have read the related documentation, but I can't find a way to set the
>correct encoding if one is required. It is also possible that I am not
>executing the query in the correct way.
>
>I would be grateful if you could help me.
>
>Thanks in advance.
>
>
>Diego Paton.
>



