Hi Andy and Lorenz, thanks for your quick replies. I am not trying to parse
full SPARQL, but actually only the Basic Graph Pattern part of a query. Is the
org.apache.jena.sparql.lang.arq.ARQParser class not parsing SPARQL 1.1?
Based on Andy's directions, the following additional check after parsing the
string seems to work for detecting graph patterns that have a dash in a
variable name at the object position:
if (parser1.token.next.kind == ARQParser.EOF) {
    // found valid graph pattern
    System.out.println("Graph pattern parse successful!");
} else {
    // stream not empty, so not a valid graph pattern
    System.out.println("Graph pattern parse failed!");
}
Thanks for your help!
Best regards,
Barry
-----Original Message-----
From: Andy Seaborne <[email protected]>
Sent: Tuesday 5 April 2022 14:07
To: [email protected]
Subject: Re: ARQ variables with dashes
Inline.
Summary: it didn't consume the whole input, only up to the end of the legal
part.
On 05/04/2022 12:43, Lorenz Buehmann wrote:
> Hi Barry,
>
>
> Did you try SPARQL1.1 parser instead? Afaik, ARQ was always beyond
> SPARQL 1.1 or better said, already before SPARQL 1.1 with some extensions.
>
> Indeed, Andy will correct me soon :D
>
> The grammar files for JavaCC are here:
>
> https://github.com/apache/jena/tree/main/jena-arq/Grammar
>
> You can check arq.jj and sparql_11.jj
>
>
> Or just wait for Andy's response ...
>
>
> Cheers,
>
> Lorenz
>
>
>
> On 05.04.22 13:21, Nouwt, B. (Barry) wrote:
>> Hi everyone,
>>
>> We are using ARQ's SPARQL parser to parse graph patterns and noticed
>> that it allows dashes in variable names if these variables occur as
>> the *object* location of a triple pattern. If the variable names at
>> the *subject* location of a triple pattern contains dashes, it fails
>> with a ParseException. As far as we could tell the SPARQL
>> specification does not allow dashes in variable names at all
>> (https://www.w3.org/TR/sparql11-query/#rVARNAME). The pattern1 and
>> pattern2 below should both fail, but the first one does not fail and
>> the second does fail.
>>
>> String pattern1 = "<test> https://www.tno.nl/example/b ?community-ID .";
>> ARQParser parser1 = new ARQParser(new StringReader(pattern1));
>> parser1.GroupGraphPatternSub();
Calling into the middle of the grammar doesn't work so easily.
It has parsed up to the end of the legal triple pattern:
  "<test> https://www.tno.nl/example/b ?community"
When it sees the "-", the variable name has ended and (because the "." is not
required) what it has read so far is a legal GroupGraphPatternSub.
The "-ID ." is left in the token input stream.
You have to test whether end-of-input has been reached.
Try:
  qparse 'SELECT * { <test> <p> ?o-1 }'
This is a parse error because "-1", the next token, is not legal at that point
(tokenizing runs one token ahead of the parser; that is the 1 in LL(1)).
This is also illegal, because there is a check for end of input:
  qparse 'SELECT * { <test> <p> ?o } XXX'
The top level entry point is
  void QueryUnit(): { }
  {
    ByteOrderMark()
    Query()
    <EOF>
  }
so the parser must see <EOF> to be valid and exit without error.
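To make the difference between "parsed a legal prefix" and "parsed the whole
input" concrete, here is a toy sketch in plain Java. It is not Jena's
tokenizer or grammar: the class PrefixParseDemo, its two methods and the
simplified token rules (IRIs in angle brackets, variable names made of
letters, digits and underscores) are assumptions made up for illustration.
parseTriple plays the role of calling a sub-rule directly, and
parseTripleToEnd adds the end-of-input check that a top-level rule would
enforce.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: this is NOT Jena's tokenizer or grammar.
class PrefixParseDemo {
    // Hypothetical token shapes: <iri>, ?var (letters, digits, underscore), or "."
    private static final Pattern TOKEN =
        Pattern.compile("\\s*(<[^<>\\s]*>|\\?[A-Za-z0-9_]+|\\.)");

    /**
     * Parses one "subject predicate object [.]" triple from the start of the input.
     * Returns the offset just after the legal part, or -1 if no triple could be
     * parsed. Like a sub-rule, it does not care what follows the legal part.
     */
    static int parseTriple(String input) {
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        for (int i = 0; i < 3; i++) {
            m.region(pos, input.length());
            // Each of the three positions needs an IRI or a variable, not a dot.
            if (!m.lookingAt() || m.group(1).equals(".")) {
                return -1;
            }
            pos = m.end();
        }
        // The trailing "." is optional, so its absence is not an error.
        m.region(pos, input.length());
        if (m.lookingAt() && m.group(1).equals(".")) {
            pos = m.end();
        }
        return pos;
    }

    /** Same parse, plus the end-of-input check: nothing but whitespace may remain. */
    static boolean parseTripleToEnd(String input) {
        int end = parseTriple(input);
        return end >= 0 && input.substring(end).trim().isEmpty();
    }

    public static void main(String[] args) {
        String objectDash = "<test> <p> ?community-ID .";
        String subjectDash = "?community-ID <p> <test> .";
        System.out.println(parseTriple(objectDash));      // offset where "-ID ." starts
        System.out.println(parseTripleToEnd(objectDash)); // leftover input, so rejected
        System.out.println(parseTriple(subjectDash));     // fails inside the triple
    }
}
```

With the dash in the object position the prefix parse stops quietly in front
of "-ID ." and only the end-of-input check reveals the problem; with the dash
in the subject position the leftover text is hit while the triple is still
incomplete, so the parse fails straight away.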
Andy
>>
>> String pattern2 = "?community-ID https://www.tno.nl/example/b <test> .";
>> ARQParser parser2 = new ARQParser(new StringReader(pattern2));
>> parser2.GroupGraphPatternSub();
>>
>> Is this a bug?
>>
>> Best regards,
>>
>> Barry
>>