On 08/04/2022 14:00, Nouwt, B. (Barry) wrote:
Hi Andy and Lorenz, thanks for your quick replies. I am not trying to parse full SPARQL, but actually only the Basic Graph Pattern part of a query.

The most robust way is to wrap the pattern in enough query text, "SELECT * {"+ + "}", and pull out the WHERE clause with getQueryPattern().
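A minimal sketch of the wrap-and-extract approach, assuming Apache Jena's `QueryFactory.create(String, Syntax)`, `Syntax.syntaxARQ` and `Query.getQueryPattern()`. The class and method names (`GraphPatternParser`, `parseGraphPattern`) are hypothetical. Because the whole query goes through the top-level rule, which ends in `<EOF>`, leftover input such as "-ID ." becomes a parse error instead of being silently ignored.

```java
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QueryParseException;
import org.apache.jena.query.Syntax;
import org.apache.jena.sparql.syntax.Element;

public class GraphPatternParser {

    /**
     * Parses a graph pattern by wrapping it in a full SELECT query.
     * Returns the WHERE clause, or null if the pattern is not legal ARQ syntax.
     */
    static Element parseGraphPattern(String pattern) {
        // The newlines (an assumption, not from the thread) stop a trailing
        // '#' comment in the pattern from swallowing the closing '}'.
        String queryString = "SELECT * {\n" + pattern + "\n}";
        try {
            Query query = QueryFactory.create(queryString, Syntax.syntaxARQ);
            return query.getQueryPattern();
        } catch (QueryParseException e) {
            return null;
        }
    }
}
```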

Is the org.apache.jena.sparql.lang.arq.ARQParser class not parsing SPARQL 1.1?

Yes, it is. ARQ is a superset of SPARQL 1.1.

Based on Andy's directions, it seems that the following additional check
after parsing the string works for detecting graph patterns with dashes in
their variable names at the object location.

if (parser1.token.next.kind == ARQParser.EOF) {
        // found valid graph pattern
        System.out.println("Graph pattern parse successful!");
} else {
        // stream not empty, so not a valid graph pattern
        System.out.println("Graph pattern parse failed!");
}

That assumes the parser did not peek ahead.

token.next may be null, in which case you need to call getNextToken() (judging from the generated parser code). Or maybe check parser.token.endColumn / parser.token.endLine.
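A sketch of the end-of-input check along these lines, assuming `ARQParser`, `ARQParserConstants`, `ParseException` and `Token` from `org.apache.jena.sparql.lang.arq` (the JavaCC-generated classes), and that calling `GroupGraphPatternSub()` on a fresh parser works as in the code quoted in this thread. `BgpEofCheck` and `parsesCompletely` are hypothetical names. In JavaCC-generated parsers, `getNextToken()` returns `token.next` when the parser has already looked ahead and otherwise asks the token manager, so it sidesteps the null case. Lexical errors raised by the generated token manager are not caught here.

```java
import java.io.StringReader;

import org.apache.jena.sparql.lang.arq.ARQParser;
import org.apache.jena.sparql.lang.arq.ARQParserConstants;
import org.apache.jena.sparql.lang.arq.ParseException;
import org.apache.jena.sparql.lang.arq.Token;

public class BgpEofCheck {

    /** True only if GroupGraphPatternSub() succeeds AND the next token is EOF. */
    static boolean parsesCompletely(String pattern) {
        ARQParser parser = new ARQParser(new StringReader(pattern));
        try {
            parser.GroupGraphPatternSub();
        } catch (ParseException e) {
            // Not even a legal prefix, e.g. a dashed variable in subject position.
            return false;
        }
        // Returns the look-ahead token if one was already read, else reads one.
        Token next = parser.getNextToken();
        return next.kind == ARQParserConstants.EOF;
    }
}
```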

It is best to use a parser rule that ends with <EOF>: either parse a full query, or modify the parser.


Thanks for your help!

Best regards,

Barry

-----Original Message-----
From: Andy Seaborne
Sent: Tuesday 5 April 2022 14:07
Subject: Re: ARQ variables with dashes

Inline.

Summary: it didn't consume the whole input, only up to the end of the legal
part.

On 05/04/2022 12:43, Lorenz Buehmann wrote:
Hi Barry,


Did you try the SPARQL 1.1 parser instead? AFAIK, ARQ has always gone beyond
SPARQL 1.1, or better said, it already had some extensions before SPARQL 1.1.

Indeed, Andy will correct me soon :D

The grammar files for JavaCC are here:

https://github.com/apache/jena/tree/main/jena-arq/Grammar

You can check arq.jj and sparql_11.jj


Or just wait for Andy's response ...


Cheers,

Lorenz



On 05.04.22 13:21, Nouwt, B. (Barry) wrote:
Hi everyone,

We are using ARQ's SPARQL parser to parse graph patterns and noticed
that it allows dashes in variable names if these variables occur at
the *object* location of a triple pattern. If a variable name at the
*subject* location of a triple pattern contains dashes, it fails with
a ParseException. As far as we could tell, the SPARQL specification
does not allow dashes in variable names at all
(https://www.w3.org/TR/sparql11-query/#rVARNAME). pattern1 and
pattern2 below should therefore both fail, but the first one does not
and the second one does.

String pattern1 = "<test> <https://www.tno.nl/example/b> ?community-ID .";
ARQParser parser1 = new ARQParser(new StringReader(pattern1));
parser1.GroupGraphPatternSub();

Calling into the middle of the parse doesn't work so easily.

It has parsed up to the end of the legal triple pattern:

"<test> <https://www.tno.nl/example/b> ?community"

When it sees the "-", the variable name has ended, and (because the "." is not
required) what it has read so far is a legal GroupGraphPatternSub.

The "-ID ." is left in the token input stream.

You have to test whether end-of-input has been reached.
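To make the mechanism concrete, here is a toy illustration in plain Java. It is not Jena's parser, and the class and method names are hypothetical: a tokenizer plus a one-token-lookahead parser for a single triple pattern "term term term [.]", where a term is an IRI `<...>` or a variable `?name` (letters, digits, `_`). Like `GroupGraphPatternSub()`, `parseTriple` stops at the end of the legal prefix, so a dashed variable in object position "succeeds" and only the end-of-input check rejects it, while a dashed variable in subject position fails immediately.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyTriplePatternParser {

    /** Splits input into IRI, variable and single-character tokens. */
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (c == '<') {                       // IRI: up to and including '>'
                int end = s.indexOf('>', i);
                i = (end < 0) ? s.length() : end + 1;
            } else if (c == '?') {                // variable: '?' + name chars
                i++;
                while (i < s.length()
                        && (Character.isLetterOrDigit(s.charAt(i)) || s.charAt(i) == '_')) {
                    i++;
                }
            } else {                              // anything else, e.g. '-' or '.'
                i++;
            }
            tokens.add(s.substring(start, i));
        }
        return tokens;
    }

    static boolean isTerm(String t) {
        boolean iri = t.length() >= 2 && t.charAt(0) == '<' && t.charAt(t.length() - 1) == '>';
        boolean var = t.length() > 1 && t.charAt(0) == '?';
        return iri || var;
    }

    /** Returns the number of tokens consumed, or -1 on a parse error. */
    static int parseTriple(List<String> tokens) {
        int pos = 0;
        for (int k = 0; k < 3; k++) {             // subject, predicate, object
            if (pos >= tokens.size() || !isTerm(tokens.get(pos))) return -1;
            pos++;
        }
        // Peek one token ahead (the 1 in LL(1)); the final '.' is optional.
        if (pos < tokens.size() && tokens.get(pos).equals(".")) pos++;
        return pos;
    }

    /** Valid only if the parse succeeds AND every token was consumed (EOF). */
    static boolean isValidPattern(String input) {
        List<String> tokens = tokenize(input);
        int consumed = parseTriple(tokens);
        return consumed >= 0 && consumed == tokens.size();
    }
}
```

For "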


Try:

qparse 'SELECT * { <test> <p> ?o-1 }'

This gives a parse error because "-1", the next token, is not legal there.
(Tokenizing runs ahead of where the parser is in the grammar; that one token
of lookahead is the 1 in LL(1).)

This is illegal because there is a check for end of input:

qparse 'SELECT * { <test> <p> ?o } XXX'

The top-level entry point is:

void QueryUnit(): { }
{
    ByteOrderMark()
    Query()
    <EOF>
}

so the parser must see <EOF> to be valid and exit without error.

      Andy


String pattern2 = "?community-ID <https://www.tno.nl/example/b> <test> .";
ARQParser parser2 = new ARQParser(new StringReader(pattern2));
parser2.GroupGraphPatternSub();

Is this a bug?

Best regards,

Barry
