Andy - thx for the response.

I tried:

  curl -X POST -H "Content-Type:application/sparql-update" -d @error.data localhost:3030/cr/update
  Error 400: Bad Request

and

  curl -X POST -H "Content-Type:application/sparql-update" -d @insert.data localhost:3030/cr/update

  cat insert.data
  PREFIX cr: <http://cr.bitplan.com/>
  INSERT DATA {
  cr:version cr:author "Wolfgang Fahl".
  }

  cat error.data
  PREFIX cr: <http://cr.bitplan.com/Event/0.1/>
  INSERT DATA {
  cr:Event__BioinformaticsofGenomeRegulationandStructureSystemsBiologyBGRSSB2018 cr:Event_title "“Bioinformatics of Genome Regulation and Structure\Systems Biology” – BGRS\SB-2018".
  cr:Event__BioinformaticsofGenomeRegulationandStructureSystemsBiologyBGRSSB2018 cr:Event_url "https://thenode.biologists.com/event/11th-international-multiconference-bioinformatics-genome-regulation-structuresystems-biology-bgrssb-2018/".
  }

and followed the hint of Stanislav Kralin at
https://stackoverflow.com/questions/63486767/how-can-i-get-the-fuseki-api-via-sparqlwrapper-to-properly-report-a-detailed-err
to add a new test:

  def testSPARQLErrorMessage(self):
      '''
      test error handling
      see https://stackoverflow.com/questions/63486767/how-can-i-get-the-fuseki-api-via-sparqlwrapper-to-properly-report-a-detailed-err
      '''
      listOfDicts=[{
          'title': '“Bioinformatics of Genome Regulation and Structure\Systems Biology” – BGRS\SB-2018',
          'url': 'https://thenode.biologists.com/event/11th-international-multiconference-bioinformatics-genome-regulation-structuresystems-biology-bgrssb-2018/'}]
      entityType="cr:Event"
      primaryKey='title'
      prefixes="PREFIX cr: <http://cr.bitplan.com/Event/0.1/>"
      jena=self.getJena(mode='update',typedLiterals=False,debug=True)
      errors=jena.insertListOfDicts(listOfDicts,entityType,primaryKey,prefixes)
      self.checkErrors(errors,1)
      error=errors[0]
      self.assertTrue("probably the sparql query is bad formed" in error)

which gives:

  Response:
  b'Error 400: Bad Request\n'
  ERRORS:
  QueryBadFormed: a bad request has been sent to the endpoint, probably the sparql query is bad formed.

  Response:
  b'Error 400: Bad Request\n' for record 0

The response body of the 400 HTTPError doesn't seem to contain any more
data, and I would not know how to get extra information via the curl
request.

The question is IMHO still unsolved, and I am not sure whether
SPARQLWrapper could do better, or how...

Cheers
  Wolfgang

On 19.08.20 at 16:15, Andy Seaborne wrote:
> """
> How can i get the Fuseki API via SPARQLWrapper to properly report a
> detailed error message e.g. with something like "error in line #
> cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany". is
> not a valid triple?
> """
>
> This is a Q about SPARQLWrapper, not Fuseki.
>
> Look in the response body because, for Fuseki, it has the details of
> the error in plain text.
>
> You can also print the query out in Python and parse it with Jena
> locally. Or send it with curl which prints the body.
>
>
>     Andy
>
> On 19/08/2020 13:18, Wolfgang Fahl wrote:
>> Dear Apache Jena Users,
>>
>> you'll find this mail also as
>> https://stackoverflow.com/questions/63486767/how-can-i-get-the-fuseki-api-via-sparqlwrapper-to-properly-report-a-detailed-err
>>
>> in the last few weeks i tried out some graph databases in the python
>> environment.
>> Namely:
>>
>> - weaviate see http://wiki.bitplan.com/index.php/Weaviate
>>
>> - dgraph http://wiki.bitplan.com/index.php/Dgraph
>>
>> - ruruki https://pypi.org/project/ruruki/
>>
>> and created a test project documented at
>> http://wiki.bitplan.com/index.php/DgraphAndWeaviateTest and open
>> source at:
>> https://github.com/WolfgangFahl/DgraphAndWeaviateTest
>>
>> After some ups and downs in the evaluation process i decided to try
>> out Apache Jena / Fuseki /SPARQL as an alternative and added:
>>
>> https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/storage/sparql.py
>>
>> and
>> https://github.com/WolfgangFahl/DgraphAndWeaviateTest/blob/master/tests/testSPARQL.py
>>
>> to allow for a "round trip" operation between python list of dicts
>> and Jena/SPARQL based storage.
>>
>> The approach performs very well for my usecase and after trying it
>> out for a while i get into more details that need to be addressed.
>>
>> The stackoverflow question
>> https://stackoverflow.com/questions/63435157/listofdict-to-rdf-conversion-in-python-targeting-apache-jena-fuseki/63440396#63440396
>> addresses the initial issues and
>> https://github.com/WolfgangFahl/DgraphAndWeaviateTest/issues?q=is%3Aissue+is%3Aclosed
>> issues 2-5 show some detail problems that were already fixed.
>>
>> Now I am working with some 180000 records i'd like to import from 6
>> different data sources and each data source seems to have new exotic
>> records that make the approach fail.
>>
>> E.g. one batch of records gives me the following log:
>>
>> read 45601 events in 0.6 s
>> storing 45601 events to sparql
>> batch for 1 - 2000 of 45601 cr:Event in 0.6 s -> 0.6 s
>> batch for 2001 - 4000 of 45601 cr:Event in 0.5 s -> 1.1 s
>> batch for 4001 - 6000 of 45601 cr:Event in 0.5 s -> 1.6 s
>> batch for 6001 - 8000 of 45601 cr:Event in 0.5 s -> 2.1 s
>> batch for 8001 - 10000 of 45601 cr:Event in 0.5 s -> 2.6 s
>> batch for 10001 - 12000 of 45601 cr:Event in 0.7 s -> 3.2 s
>> ======================================================================
>> ERROR: testCrossref (tests.test_Crossref.TestCrossref)
>> test loading crossref data
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>>   File "/Users/wf/Library/Python/3.8/lib/python/site-packages/SPARQLWrapper/Wrapper.py", line 1073, in _query
>>     response = urlopener(request)
>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 222, in urlopen
>>     return opener.open(url, data, timeout)
>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 531, in open
>>     response = meth(req, response)
>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 640, in http_response
>>     response = self.parent.error(
>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 569, in error
>>     return self._call_chain(*args)
>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 502, in _call_chain
>>     result = func(*args)
>>   File "/opt/local/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/urllib/request.py", line 649, in http_error_default
>>     raise HTTPError(req.full_url, code, msg, hdrs, fp)
>> urllib.error.HTTPError: HTTP Error 400: Bad Request
>>
>> SPARQLWrapper.SPARQLExceptions.QueryBadFormed: QueryBadFormed: a bad
>> request has been sent to the endpoint, probably the sparql query is
>> bad formed.
>>
>> Response:
>> b'Error 400: Bad Request\n'
>>
>> Now since I don't get any details on what the problem is i am working
>> with a binary search. With the error above i only know the problem
>> is with a record with a batchIndex between 12000 and 14000 so I am
>> setting the limit to 14000 and batchSize to 100 to get closer.
>>
>> batch for 13301 - 13400 of 14000 cr:Event in 0.0 s -> 4.3 s
>>
>> is now the last successful batch. So i am using a binary search:
>> 13450 fail, 13425 fail, 13412 ok, 13418 ok, 13422 fail, 13420 ok,
>> 13421 ok
>> So record 13422 is the culprit and I switch on debug mode to see the
>> INSERT Data created for the record:
>>
>> cr:Event__102140gtm20003 cr:Event_name "Higher local fields".
>> cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany".
>> cr:Event__102140gtm20003 cr:Event_source "crossref".
>> cr:Event__102140gtm20003 cr:Event_eventId "10.2140/gtm.2000.3".
>> cr:Event__102140gtm20003 cr:Event_title "Invitation to higher local fields".
>> cr:Event__102140gtm20003 cr:Event_startDate "1999-08-29"^^<http://www.w3.org/2001/XMLSchema#date>.
>> cr:Event__102140gtm20003 cr:Event_year 1999.
>> cr:Event__102140gtm20003 cr:Event_month 9.
>> cr:Event__102140gtm20003 cr:Event_endDate "1999-09-05"^^<http://www.w3.org/2001/XMLSchema#date>.
>>
>> So the Umlaut-encoding "\\u" in the location "Münster" is the culprit
>> here. I will work around this issue. The real question is:
>>
>> *How can i get the Fuseki API via SPARQLWrapper to properly report a
>> detailed error message e.g. with something like "error in line #
>> cr:Event__102140gtm20003 cr:Event_location "M\\"unster, Germany". is
>> not a valid triple?*
>>
>>
>> Yours
>>
>> Wolfgang
>>
>> --
>>
>> BITPlan - smart solutions
>> Wolfgang Fahl
>> Pater-Delp-Str. 1, D-47877 Willich Schiefbahn
>> Tel. +49 2154 811-480, Fax +49 2154 811-481
>> Web: http://www.bitplan.de
>> BITPlan GmbH, Willich - HRB 6820 Krefeld, Tax No.: 10258040548,
>> Managing Director: Wolfgang Fahl
>>
>
--

BITPlan - smart solutions
Wolfgang Fahl
Pater-Delp-Str. 1, D-47877 Willich Schiefbahn
Tel. +49 2154 811-480, Fax +49 2154 811-481
Web: http://www.bitplan.de
BITPlan GmbH, Willich - HRB 6820 Krefeld, Tax No.: 10258040548,
Managing Director: Wolfgang Fahl
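
Andy's suggestion to look in the response body can also be followed from Python without SPARQLWrapper. The following is only a minimal sketch using the standard library: the function name `post_sparql_update` is made up for illustration, and the endpoint URL in the usage note is the one from the curl calls in this thread. How much detail the body carries depends on the server; in the run reported above it was only `Error 400: Bad Request`.

```python
import urllib.error
import urllib.request


def post_sparql_update(endpoint, update, timeout=10.0):
    """POST a SPARQL update and return (status_code, body_text).

    Hypothetical helper for illustration; not part of SPARQLWrapper or Fuseki.
    """
    request = urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update; charset=utf-8"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # An HTTPError is also a file-like response: read() returns whatever
        # body the server sent along with the 4xx/5xx status.
        return err.code, err.read().decode("utf-8", errors="replace")
```

A call such as `post_sparql_update("http://localhost:3030/cr/update", open("error.data", encoding="utf-8").read())` returns the status and the complete body, so any explanation the server does include is not lost in the exception message.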
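
Both failing records in this thread contain a backslash inside a string literal (`\S` in the event title, `\\"` in the location). SPARQL string literals accept only a fixed set of escape sequences (`\t`, `\b`, `\n`, `\r`, `\f`, `\"`, `\'`, `\\`), so escaping values before building the INSERT DATA text is one possible workaround. This is a sketch under that assumption; the helper names are hypothetical and it is not the code in storage/sparql.py.

```python
# Characters that must be escaped inside a double-quoted SPARQL string literal.
_SPARQL_ESCAPES = {
    "\\": "\\\\",  # backslash first in meaning: a lone backslash starts an escape
    '"': '\\"',    # the delimiter of the literal itself
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}


def escape_sparql_literal(text):
    """Escape one Python string for use inside a "..." SPARQL literal."""
    # Character-by-character mapping, so no replacement is applied twice.
    return "".join(_SPARQL_ESCAPES.get(ch, ch) for ch in text)


def quote_sparql_literal(text):
    """Return the escaped text wrapped in double quotes."""
    return '"' + escape_sparql_literal(text) + '"'
```

For example, `quote_sparql_literal("BGRS\\SB-2018")` yields a literal in which the single backslash has been doubled, which the SPARQL grammar accepts.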
