Re: encoding problem iterating result set using a fuseki endpoint

Paton, Diego Thu, 20 Feb 2014 02:51:03 -0800

Hi,

I isolated the project and now it is running with the new version.


I executed the query through an HTTP request, using the XML output format, to see the 
line that causes the error:

<literal xml:lang="en">Next 2008 by COMPATIBLES2&#x0A;Release Date: 
19-Oct-2008&#x0A;UPC: 859701078081&#x0A;Genre: Electronic (primary), New Age 
(secondary)&#x0A;&#x0A;Electronica; Ambient; Downtempo; Chillout; Electronic; 
IDM; Breakbeat; Indie; Alternative; Music; Soundtrack; Mellow; 
Slow.&#x0A;&#x0AAll music Composed, Performed and Recorded by Compatibles2, 
using the KORG M3 + Radius EXB - Music Workstation.Sampler, between Q2 2007 and 
Q4 2008, Den Hoorn, the Netherlands. No other instruments used, 100% Digital 
production.</literal>


After that, I tried to execute the query using a QueryEngineHTTP object instead 
of QueryExecution:

QueryEngineHTTP queryEngine = QueryExecutionFactory.createServiceRequest(ontology_service, query);
queryEngine.setSelectContentType("text/tab-separated-values");
ResultSet resultSet = queryEngine.execSelect();
queryExecution.execSelect();



I can iterate over the whole result set without problems if I set the 
SelectContentType to text/tab-separated-values, but if I don't, I get the same 
problem.

This is the value of the QuerySolution that corresponds to the line that 
generates the error.

Next 2008 by COMPATIBLES2
Release Date: 19-Oct-2008
UPC: 859701078081
Genre: Electronic (primary), New Age (secondary)

Electronica; Ambient; Downtempo; Chillout; Electronic; IDM; Breakbeat; Indie; 
Alternative; Music; Soundtrack; Mellow; Slow.

All music Composed, Performed and Recorded by Compatibles2, using the KORG M3 + 
Radius EXB - Music Workstation.Sampler, between Q2 2007 and Q4 2008, Den Hoorn, 
the Netherlands. No other instruments used, 100% Digital production.@en

The content looks fine, so I don't understand where the problem is when I 
use QueryExecution, or QueryEngineHTTP without text/tab-separated-values selected.
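
A minimal sketch of what may be going on, using only the JDK's StAX API rather than Jena or Woodstox. XML 1.0 forbids most control characters in element content, so a conforming parser has to reject a literal that contains U+0002, while a tab-separated response is never run through an XML parser. The class name, the helper names, and the sample strings are hypothetical; the thread does not confirm where in the description the character sits.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class ControlCharInXml {

    /** Wraps the text in a <literal> element and reads it back with a StAX parser. */
    static String readLiteral(String text) throws XMLStreamException {
        // Assumption: the text contains no '<' or '&', so no escaping is needed here.
        String xml = "<literal>" + text + "</literal>";
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            reader.nextTag();                // advance to the <literal> start tag
            return reader.getElementText();  // read character data up to </literal>
        } finally {
            reader.close();
        }
    }

    /** True if the parser accepts the text as element content. */
    static boolean parsesAsXml(String text) {
        try {
            readLiteral(text);
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String clean = "Slow.\n\nAll music Composed";
        // Hypothetical: a U+0002 control character inserted by hand.
        String dirty = "Slow.\n\n\u0002All music Composed";
        System.out.println("clean parses: " + parsesAsXml(clean));
        System.out.println("dirty parses: " + parsesAsXml(dirty));
    }
}
```

If the stored literal really does contain such a character, requesting a non-XML result format, as with setSelectContentType in the snippet earlier in this message, would sidestep the XML parser altogether.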

Thanks for your help,

Diego.


On 19 Feb 2014, at 13:56, Rob Vesse wrote:

This now looks like an Apache HttpClient version clash on your class path.

Jena is using HttpClient 4.2.3, and presumably something on your class path
uses a different version of HttpClient, because it's picking up the wrong
classes, leading to the NoSuchFieldError.

Rob
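
One way to check for such a clash is to ask the JVM where it loaded the HttpClient classes from. The sketch below is only an illustration: the class `ClasspathProbe` and its helpers are hypothetical, and the assumption that `DEF_CONTENT_CHARSET` is a public field of `org.apache.http.protocol.HTTP` (HttpCore) is an inference from the stack trace, not something stated in this thread.

```java
import java.security.CodeSource;

final class ClasspathProbe {

    /** Describes where a class was loaded from, or why it could not be loaded. */
    static String locate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK itself typically have no code source.
            return src == null
                    ? "(no code source, likely a JDK class)"
                    : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e + ")";
        }
    }

    /** True if the class can be loaded and exposes a public field with this name. */
    static boolean hasPublicField(String className, String fieldName) {
        try {
            Class.forName(className).getField(fieldName);
            return true;
        } catch (ClassNotFoundException | NoSuchFieldException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names from the stack trace; the owner of the field is an assumption.
        System.out.println(locate("org.apache.http.impl.client.DefaultHttpClient"));
        System.out.println(locate("org.apache.http.protocol.HTTP"));
        System.out.println(hasPublicField("org.apache.http.protocol.HTTP", "DEF_CONTENT_CHARSET"));
    }
}
```

Run with the application's own class path, the two printed locations show which jars supply the classes. If they come from jars of different HttpComponents versions, that would be consistent with the NoSuchFieldError.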

On 19/02/2014 12:45, "Paton, Diego" wrote:

Hi,

After updating the dependency to 2.11.1, I get the following error:

Exception in thread "main" java.lang.NoSuchFieldError: DEF_CONTENT_CHARSET
at org.apache.http.impl.client.DefaultHttpClient.setDefaultHttpParams(DefaultHttpClient.java:175)
at org.apache.http.impl.client.DefaultHttpClient.createHttpParams(DefaultHttpClient.java:158)
at org.apache.http.impl.client.AbstractHttpClient.getParams(AbstractHttpClient.java:448)
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.execGet(HttpQuery.java:308)
at com.hp.hpl.jena.sparql.engine.http.HttpQuery.exec(HttpQuery.java:276)
at com.hp.hpl.jena.sparql.engine.http.QueryEngineHTTP.execSelect(QueryEngineHTTP.java:345)
at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:36)

It occurs when executing this line:

ResultSet resultSet = queryExecution.execSelect();

It seems I have to set the charset somewhere.

Thanks,

Diego.


On 19 Feb 2014, at 12:25, Rob Vesse wrote:

Why are you using such an outdated version of ARQ?

2.8.7 is from December 2010, so it is 3 years out of date. Please upgrade
to the latest version, which is 2.11.1, and let us know if you still have
problems.

Rob

On 19/02/2014 11:46, "Paton, Diego" wrote:


Re-sending so that I receive all replies sent to the list (I
wasn't subscribed to the list).


On 19 Feb 2014, at 11:26, Diego Paton wrote:

Hi,


In my department we have stored one of the latest dumps of the Freebase
ontology using TDB and Fuseki.

TDB: 2.11
Fuseki: 1.0

We have a dataset defined (/freebase/data), and executing SPARQL
queries through the Fuseki server works fine.


*   We execute a query, for example:

prefix fb: <http://rdf.freebase.com/ns/>
select ?mid ?e ?nf ?desc
where {
  ?mid fb:type.object.name ?e .
  ?mid fb:common.topic.notable_for ?notab_for .
  ?mid fb:common.topic.description ?desc .
  ?notab_for fb:common.notable_for.display_name ?nf .
  FILTER (langMatches(lang(?e), "en") && langMatches(lang(?nf), "en")
          && langMatches(lang(?desc), "en"))
}

We are able to output the results as a TSV file that contains 6158452 lines
without problems.

But when we try to execute the same query from a Java application using
ARQ (2.8.7), we have problems.


*   This is the code that I execute:


String ontology_service = "http://freebase-m01.ihost.aol.com:3030/freebase/data/query";
String query =
"prefix fb: <http://rdf.freebase.com/ns/>\n"+
" select ?mid ?e ?nf ?desc\n"+
" where {\n"+
"     ?mid fb:type.object.name ?e .\n"+
"     ?mid fb:common.topic.notable_for ?notab_for .\n"+
"     ?mid fb:common.topic.description ?desc .\n"+
"     ?notab_for fb:common.notable_for.display_name ?nf .\n"+
"     FILTER (langMatches(lang(?e), \"en\") && langMatches(lang(?nf), \"en\") && langMatches(lang(?desc), \"en\"))\n"+
" }\n";

QueryExecution queryExecution = QueryExecutionFactory.sparqlService(ontology_service, query);

ResultSet resultSet = queryExecution.execSelect();

// Prints every row. This also consumes the result set, so use either this call or the loop below.
ResultSetFormatter.outputAsTSV(System.out, resultSet);

while (resultSet.hasNext()) {
    QuerySolution querySolution = resultSet.next();

    querySolution.get("mid");
    querySolution.get("e");
    querySolution.get("nf");
    querySolution.get("desc");
}

*   After processing around 400k results, we get this exception:

com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 2))
 at [row,col {unknown-source}]: [5859511,312]
at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:679)
at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:460)
at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:216)
at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:42)
Exception in thread "main" com.hp.hpl.jena.sparql.resultset.ResultSetException: XMLStreamException: Illegal character ((CTRL-CHAR, code 2))
 at [row,col {unknown-source}]: [5859511,312]
at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.staxError(XMLInputStAX.java:510)
at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:220)
at com.aol.search.swap.stuffer.complete_articles.query.JENAManager.main(JENAManager.java:42)
Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 2))
 at [row,col {unknown-source}]: [5859511,312]
at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4668)
at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:679)
at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.getOneSolution(XMLInputStAX.java:460)
at com.hp.hpl.jena.sparql.resultset.XMLInputStAX$ResultSetStAX.hasNext(XMLInputStAX.java:216)
... 1 more


*   I have also tried printing the results with a ResultSetFormatter,
but I get the same exception:

ResultSetFormatter.outputAsTSV(System.out, resultSet);


I would say that the ontology is well formed and correctly encoded in
TDB, because through Fuseki we are able to obtain correct results.

I have read the documentation related to this, but I can't find a way to set
the correct encoding, if that is required. It is also possible that I am not
executing the query the correct way.
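
The exception names a control character (code 2), which suggests the XML 1.0 character rules are involved rather than a charset setting: a string can be valid UTF-8 and still contain code points that XML 1.0 does not allow. The sketch below checks text against the XML 1.0 `Char` production and optionally replaces offenders. The class `XmlCharCheck`, its method names, and the sample value are hypothetical and not part of Jena.

```java
import java.util.ArrayList;
import java.util.List;

final class XmlCharCheck {

    /** XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the char offsets of every code point that XML 1.0 forbids. */
    static List<Integer> illegalOffsets(String s) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isLegalXml10(cp)) {
                offsets.add(i);
            }
            i += Character.charCount(cp);
        }
        return offsets;
    }

    /** Replaces forbidden code points with U+FFFD so the text can be written as XML. */
    static String sanitize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> sb.appendCodePoint(isLegalXml10(cp) ? cp : 0xFFFD));
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical description value with a U+0002 inserted by hand.
        String desc = "Mellow; Slow.\n\n\u0002All music Composed";
        System.out.println("illegal offsets: " + illegalOffsets(desc));
        System.out.println("unchanged by sanitize: " + sanitize(desc).equals(desc));
    }
}
```

A check like this could be run over the `?desc` values fetched in TSV format to find which literals carry such characters, before deciding whether to clean the data at load time or to avoid the XML result format.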

I would be grateful if you could help me.

Thanks in advance.


Diego Paton.
