"""
How can I get the Fuseki API via SPARQLWrapper to properly report a
detailed error message, e.g. with something like "error in line #
cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany". is
not a valid triple"?
"""
This is a question about SPARQLWrapper, not Fuseki.
Look in the response body: for Fuseki, it contains the details of the
error in plain text.
You can also print the query out in Python and parse it with Jena
locally, or send it with curl, which prints the body.
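A minimal sketch of the first suggestion, using only the Python standard
library instead of SPARQLWrapper: it POSTs the update as described in the
SPARQL 1.1 Protocol (Content-Type application/sparql-update) and returns
the response body even when the server answers with an HTTP error. The
name post_update, the opener parameter and the endpoint URL in the usage
comment are assumptions for illustration, not part of Fuseki or
SPARQLWrapper.

```python
import urllib.error
import urllib.request


def post_update(endpoint, update, opener=urllib.request.urlopen):
    """POST a SPARQL update; return (status, body_text) even on HTTP errors.

    `opener` is a hypothetical hook (defaults to urllib's urlopen) so the
    function can be exercised without a running server.
    """
    request = urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update; charset=utf-8"},
        method="POST",
    )
    try:
        with opener(request) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # HTTPError is also a response object: its body carries whatever
        # detail the server chose to send along with the error status.
        return err.code, err.read().decode("utf-8", "replace")


# Usage (the endpoint URL is an assumption; adjust it to your dataset):
# status, body = post_update("http://localhost:3030/ds/update", insert_data)
# print(status, body)
```

Printing the returned body next to the status code shows exactly what the
server sent, independent of how a client library summarises the error.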
Andy
On 19/08/2020 13:18, Wolfgang Fahl wrote:
Dear Apache Jena Users,
You'll also find this mail at
https://stackoverflow.com/questions/63486767/how-can-i-get-the-fuseki-api-via-sparqlwrapper-to-properly-report-a-detailed-err
In the last few weeks I tried out some graph databases in the Python
environment, namely:
- weaviate see http://wiki.bitplan.com/index.php/Weaviate
- dgraph http://wiki.bitplan.com/index.php/Dgraph
- ruruki https://pypi.org/project/ruruki/
and created a test project documented at
http://wiki.bitplan.com/index.php/DgraphAndWeaviateTest and open source at:
https://github.com/WolfgangFahl/DgraphAndWeaviateTest
After some ups and downs in the evaluation process I decided to try out
Apache Jena / Fuseki / SPARQL as an alternative and added:
https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/storage/sparql.py
and
https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/tests/testSPARQL.py
to allow for a "round trip" operation between a Python list of dicts and
Jena/SPARQL-based storage.
The approach performs very well for my use case. After trying it out
for a while, I am now running into details that need to be addressed.
The Stack Overflow question
https://stackoverflow.com/questions/63435157/listofdict-to-rdf-conversion-in-python-targeting-apache-jena-fuseki/63440396#63440396
addresses the initial issues and
https://github.com/WolfgangFahl/DgraphAndWeaviateTest/issues?q=is%3Aissue+is%3Aclosed
issues 2-5 show some detail problems that have already been fixed.
Now I am working with some 180000 records I'd like to import from 6
different data sources, and each data source seems to have new exotic
records that make the approach fail.
E.g. one batch of records gives me the following log:
read 45601 events in 0.6 s
storing 45601 events to sparql
batch for 1 - 2000 of 45601 cr:Event in 0.6 s
-> 0.6 s
batch for 2001 - 4000 of 45601 cr:Event in 0.5 s
-> 1.1 s
batch for 4001 - 6000 of 45601 cr:Event in 0.5 s
-> 1.6 s
batch for 6001 - 8000 of 45601 cr:Event in 0.5 s
-> 2.1 s
batch for 8001 - 10000 of 45601 cr:Event in 0.5 s
-> 2.6 s
batch for 10001 - 12000 of 45601 cr:Event in 0.7 s
-> 3.2 s
======================================================================
ERROR: testCrossref (tests.test_Crossref.TestCrossref)
test loading crossref data
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/wf/Library/Python/3.8/lib/python/site-packages/SPARQLWrapper/Wrapper.py", line 1073, in _query
    response = urlopener(request)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
SPARQLWrapper.SPARQLExceptions.QueryBadFormed: QueryBadFormed: a bad
request has been sent to the endpoint, probably the sparql query is bad
formed.
Response:
b'Error 400: Bad Request\n'
Since I don't get any details on what the problem is, I am working
with a binary search. With the error above I only know that the problem
is with a record with a batchIndex between 12000 and 14000, so I am
setting the limit to 14000 and batchSize to 100 to get closer.
batch for 13301 - 13400 of 14000 cr:Event in 0.0 s
-> 4.3 s
is now the last successful batch. So I am using a binary search: 13450
fail, 13425 fail, 13412 ok, 13418 ok, 13422 fail, 13420 ok, 13421 ok.
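The manual bisection above could be automated along these lines. This is
only a sketch: find_bad_record and the batch_ok callback are hypothetical
names, not functions from the test project, and it assumes that a batch
fails exactly when it contains a bad record.

```python
def find_bad_record(records, batch_ok):
    """Return the index of the first record that makes `batch_ok` fail.

    `batch_ok(batch)` is a hypothetical callback returning True when the
    batch inserts cleanly. Assumes the full list fails and that a batch
    fails exactly when it contains a bad record.
    """
    lo, hi = 0, len(records)  # invariant: records[lo:hi] contains a bad record
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if batch_ok(records[lo:mid]):
            lo = mid  # first half is clean, so the culprit is in the second
        else:
            hi = mid  # first half already fails; narrow down inside it
    return lo
```

Since every successful probe really inserts its batch, it is probably best
to point batch_ok at a scratch dataset.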
So record 13422 is the culprit, and I switch on debug mode to see the
INSERT DATA created for the record:
cr:Event__102140gtm20003 cr:Event_name "Higher local fields".
cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany".
cr:Event__102140gtm20003 cr:Event_source "crossref".
cr:Event__102140gtm20003 cr:Event_eventId "10.2140/gtm.2000.3".
cr:Event__102140gtm20003 cr:Event_title "Invitation to higher local fields".
cr:Event__102140gtm20003 cr:Event_startDate "1999-08-29"^^<http://www.w3.org/2001/XMLSchema#date>.
cr:Event__102140gtm20003 cr:Event_year 1999.
cr:Event__102140gtm20003 cr:Event_month 9.
cr:Event__102140gtm20003 cr:Event_endDate "1999-09-05"^^<http://www.w3.org/2001/XMLSchema#date>.
So the umlaut encoding \"u in the location "Münster" is the culprit
here: its double quote is not escaped and ends the string literal early.
I will work around this issue. The real question is:
*How can I get the Fuseki API via SPARQLWrapper to properly report a
detailed error message, e.g. with something like "error in line #
cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany". is
not a valid triple"?*
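For the workaround mentioned above, one possible approach is to escape
string values before placing them in a double-quoted SPARQL literal. The
helper below is a sketch with a hypothetical name (escape_literal). It
covers only backslash, double quote, newline, carriage return and tab.

```python
def escape_literal(text):
    """Escape a Python string for use inside a double-quoted SPARQL literal."""
    replacements = [
        ("\\", "\\\\"),  # backslash first, so later escapes are not doubled
        ('"', '\\"'),
        ("\n", "\\n"),
        ("\r", "\\r"),
        ("\t", "\\t"),
    ]
    for raw, escaped in replacements:
        text = text.replace(raw, escaped)
    return text


location = 'M\\"unster, Germany'  # the raw value: M\"unster, Germany
print('cr:Event_location "%s".' % escape_literal(location))
```

Escaping the backslash before the quote matters: doing it the other way
round would double the backslash that was just added for the quote.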
Yours
Wolfgang