Re: How to remove consistently a triple pattern given a SPARQL query?

Andy Seaborne Tue, 02 Feb 2016 14:34:28 -0800

On 02/02/16 19:29, Paul Houle wrote:

Carlo,  Andy,


    I like the Iterator<> interfaces in the Jena framework for getting data
out,  but I make a habit of always putting results in a List  or Queue or
something before putting them back into the same Jena model because I get
less BS per mile that way in terms of Exceptions and other exceptional
events.

    Does Jena have an official policy on being reenterable in that way?
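
A minimal sketch of the habit described above, using a plain java.util.Set as a stand-in for a Jena model (SnapshotThenWrite and addUppercased are made-up names, not Jena API). The results are copied into a list first, and only that copy is iterated while the store is written to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Stand-in for "read from a store, then write back to the same store".
// Hypothetical names throughout; no Jena classes are involved.
final class SnapshotThenWrite {
    static void addUppercased(Set<String> store) {
        // 1. Drain the current contents into a private list (the snapshot).
        List<String> snapshot = new ArrayList<>(store);
        // 2. Iterate the snapshot only, so adding to the store is safe.
        for (String s : snapshot) {
            store.add(s.toUpperCase());
        }
    }
}
```

Iterating the store directly while adding to it would, for the standard java.util collections, typically fail with a ConcurrentModificationException. Whether a given Jena iterator tolerates this is exactly the open question above.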


Carlo's issues are nothing to do with iterator policy.

Carlo - use arg1, otherwise you will not see the changes made so far.

In

   public Element transform(ElementGroup arg0, List<Element> arg1)

arg0 is the element from the AST before modification and

arg1 is the new elements to go in after modification by lower levels of the bottom-up rewrite.

So if you rewrite an ElementFilter by making a new one, it will appear in arg1, not in arg0.


Do not modify the Element* arguments in place.
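
To make the arg0/arg1 distinction concrete, here is a toy bottom-up rewrite over a made-up tree type. ToyNode and BottomUpRewrite are illustrative stand-ins, not Jena's Element classes, and dropping emptied groups is just one possible policy. The parent step only sees the removals made lower down if it builds its result from the rewritten children.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Toy syntax tree, NOT Jena's Element classes: a node is a leaf (label set)
// or a group (label == null) holding child nodes.
final class ToyNode {
    final String label;
    final List<ToyNode> children;

    private ToyNode(String label, List<ToyNode> children) {
        this.label = label;
        this.children = List.copyOf(children); // immutable, so never edited in place
    }
    static ToyNode leaf(String label) { return new ToyNode(label, List.of()); }
    static ToyNode group(List<ToyNode> children) { return new ToyNode(null, children); }
    boolean isGroup() { return label == null; }

    @Override public String toString() {
        if (!isGroup()) return label;
        return children.stream().map(ToyNode::toString)
                .collect(Collectors.joining(" ", "{", "}"));
    }
}

final class BottomUpRewrite {
    // Children are rewritten first; the parent then receives both the
    // untouched original (the role of arg0) and the rewritten children (arg1).
    static ToyNode rewrite(ToyNode original, String dropLabel) {
        if (!original.isGroup()) return original;
        List<ToyNode> rewritten = new ArrayList<>();
        for (ToyNode child : original.children) {
            boolean dropped = !child.isGroup() && child.label.equals(dropLabel);
            if (!dropped) rewritten.add(rewrite(child, dropLabel));
        }
        return transformGroup(original, rewritten);
    }

    // Builds a new group from 'rewritten'. 'original' still lists the old
    // children, so inspecting it would miss the removals made lower down.
    static ToyNode transformGroup(ToyNode original, List<ToyNode> rewritten) {
        List<ToyNode> kept = new ArrayList<>();
        for (ToyNode n : rewritten) {
            boolean emptiedGroup = n.isGroup() && n.children.isEmpty();
            if (!emptiedGroup) kept.add(n);
        }
        return ToyNode.group(kept);
    }
}
```

Rewriting the toy tree {a {b}} with "b" as the label to drop gives {a}, while the original tree still prints as {a {b}}. Note that this toy policy would also drop groups that were empty in the original input, which may not be what a real query rewrite wants.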


See ElementTransformCopyBase

The default implementation is:

    @Override
    public Element transform(ElementGroup el, List<Element> elts) {
        if ( el.getElements() == elts )
            return el ;
        ElementGroup el2 = new ElementGroup() ;
        el2.getElements().addAll(elts) ;
        return el2 ;
    }

i.e. the structure is copied only if there is a change, detected by elts not being the very same list object. This saves object churn.
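
The identity check can be tried out in isolation with plain lists. This is a sketch with made-up names (ToyGroup, CopyOnChange), not Jena code: a lower-level step hands back the same list instance when it changed nothing, and the group step uses == rather than equals() to decide whether a new object is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a group element; ToyGroup and CopyOnChange are made-up names.
final class ToyGroup {
    final List<String> elements;
    ToyGroup(List<String> elements) { this.elements = elements; }
}

final class CopyOnChange {
    // Lower-level step: returns the same list instance when nothing changed.
    static List<String> without(List<String> elts, String unwanted) {
        if (!elts.contains(unwanted)) return elts;
        List<String> copy = new ArrayList<>(elts);
        copy.removeIf(unwanted::equals);
        return copy;
    }

    // Same shape as the default above: identity (==), not equals(), decides
    // whether a new group object has to be built.
    static ToyGroup rebuild(ToyGroup old, List<String> elts) {
        if (old.elements == elts) return old;
        return new ToyGroup(new ArrayList<>(elts));
    }
}
```

With a group holding ["a", "b"], removing "z" returns the original group object untouched, while removing "a" returns a new group holding ["b"] and leaves the original as it was.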


        Andy


On Tue, Feb 2, 2016 at 2:13 PM, Carlo.Allocca wrote:


Dear Andy and All,

while I was extending and testing the code that I have written so far for
removing a triple from a given SPARQL query,
I realised that I get different outputs depending on how I start the
implementation of public Element transform(ElementGroup arg0,
List<Element> arg1).
In particular, if I start with (1) I obtain some results; if I start with
(2) I obtain something different (you can see the details below).

I have also used ElementTransformCleanGroupsOfOne when ElementGroup is
empty
         ElementTransform transform = new ElementTransformCleanGroupsOfOne();
         Element el2 = ElementTransformer.transform(eg, transform);
         return el2;

but it made no difference to the results. I am sure I am doing something wrong.
Moreover, my questions are: what is the main difference between the two
approaches? And when should I use ElementGroup arg0 and when List<Element>
arg1?


(1) public Element transform(ElementGroup arg0, List<Element> arg1) {
        List<Element> elemList = arg0.getElements();
        Iterator<Element> itr = elemList.iterator();
        while (itr.hasNext()) {

        }
        …
        …
    }


(2) public Element transform(ElementGroup arg0, List<Element> arg1) {
        Iterator<Element> itr = arg1.iterator();
        while (itr.hasNext()) {

        }
        …
        …
    }

I know that it may be related to my limited knowledge of Jena.
Many thanks in advance for your clarification on the above.

Best Regards,
Carlo


=======

Below I report the two scenarios used, with test cases and results, and
the code used (at the very bottom). In practice, you can notice that:



==== TESTING:

Scenario A:

     public Element transform(ElementGroup arg0, List<Element> arg1) {
         List<Element> elemList = arg0.getElements();
         Iterator<Element> itr = elemList.iterator();
         while (itr.hasNext()) {

         }
         …
         …
     }


Test 1:

The triple to remove is (?x  foaf:mbox  ?mbox ) using the below query Q1:

=========== BEFORE Q1

PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT  ?name ?mbox
WHERE
   { ?x  foaf:name  ?name
     OPTIONAL
       { ?x  foaf:mbox  ?mbox }
   }


============= AFTER Q1

PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT  ?name ?mbox
WHERE
   { ?x  foaf:name  ?name }


Test 2:

The triple to remove is (?boss1  ex:isBossOf1  ?ind ) using the below
query Q2:



=========== BEFORE Q2

PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  ex:   <http://www.semanticweb.org/dataset1/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT  ?ind ?boss ?g
WHERE
   {   { ?ind  rdf:type  ?z
         OPTIONAL
           { ?boss1  ex:isBossOf1  ?ind }
       }
     UNION
       {   { ?boss  ex:isBossOf1  ?ind }
         UNION
           { ?boss  ex:isBossOf  ?ind
             FILTER ( ?boss = "mathieu" )
           }
       }
   }

============= AFTER Q2: it does not remove the triple.

PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  ex:   <http://www.semanticweb.org/dataset1/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT  ?ind ?boss ?g
WHERE
   {   { ?ind  rdf:type  ?z }
     UNION
       {   { ?boss  ex:isBossOf1  ?ind }
         UNION
           { ?boss  ex:isBossOf  ?ind
             FILTER ( ?boss = "mathieu" )
           }
       }
   }


Test 3: The triple to remove is (?ind  rdf:type  ?z) using the below query
Q3:

=========== BEFORE Q3:

PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  ex:   <http://www.semanticweb.org/dataset1/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT  ?ind ?boss ?g
WHERE
   { ?ind  rdf:type  ?z
     FILTER ( ?ind = "mathieu" )
   }

============= AFTER Q3: There is still an empty BGP present.

PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  ex:   <http://www.semanticweb.org/dataset1/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT  ?ind ?boss ?g
WHERE
   { # Empty BGP



   }








Scenario B:

     public Element transform(ElementGroup arg0, List<Element> arg1) {
         Iterator<Element> itr = arg1.iterator();
         while (itr.hasNext()) {

         }
         …
         …
     }


Test 1:

The triple to remove is (?x  foaf:mbox  ?mbox ) using the below query Q1:

=========== BEFORE Q1

PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT  ?name ?mbox
WHERE
   { ?x  foaf:name  ?name
     OPTIONAL
       { ?x  foaf:mbox  ?mbox }
   }


============= AFTER Q1: the OPTIONAL clause is still there (with an empty
ElementGroup).

PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT  ?name ?mbox
WHERE
   { ?x  foaf:name  ?name
     OPTIONAL
       { # Empty BGP



       }
   }




Test 2:

The triple to remove is (?boss1  ex:isBossOf1  ?ind ) using the below
query Q2:



=========== BEFORE Q2

PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  ex:   <http://www.semanticweb.org/dataset1/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT  ?ind ?boss ?g
WHERE
   {   { ?ind  rdf:type  ?z
         OPTIONAL
           { ?boss1  ex:isBossOf1  ?ind }
       }
     UNION
       {   { ?boss  ex:isBossOf1  ?ind }
         UNION
           { ?boss  ex:isBossOf  ?ind
             FILTER ( ?boss = "mathieu" )
           }
       }
   }

============= AFTER Q2: it does not remove the OPTIONAL and it leaves an
empty BGP.

PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  ex:   <http://www.semanticweb.org/dataset1/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT  ?ind ?boss ?g
WHERE
   {   { ?ind  rdf:type  ?z
         OPTIONAL
           { # Empty BGP



           }
       }
     UNION
       {   { ?boss  ex:isBossOf1  ?ind }
         UNION
           { ?boss  ex:isBossOf  ?ind
             FILTER ( ?boss = "mathieu" )
           }
       }
   }

Test 3: The triple to remove is (?ind  rdf:type  ?z) using the below query
Q3:

=========== BEFORE Q3

PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  ex:   <http://www.semanticweb.org/dataset1/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT  ?ind ?boss ?g
WHERE
   { ?ind  rdf:type  ?z
     FILTER ( ?ind = "mathieu" )
   }

============= AFTER Q3: It does not remove the FILTER, but just the triple.

PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  ex:   <http://www.semanticweb.org/dataset1/>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT  ?ind ?boss ?g
WHERE
   {   { ?ind  rdf:type  ?z }
     UNION
       { # Empty BGP



         FILTER ( ?boss = "mathieu" )
       }
   }







=== FULL CODE used with public Element transform(ElementPathBlock eltPB)

@Override
     public Element transform(ElementPathBlock eltPB) {
         if (eltPB.isEmpty()) {

             //System.out.println("[RemoveOpTransform::transform(ElementPathBlock arg0)] ElementPathBlock IS EMPTY:: " + eltPB.toString());
             return eltPB;
         }
         System.out.println("[RemoveOpTransform::transform(ElementPathBlock arg0)] ElementPathBlock:: " + eltPB.toString());
         Iterator<TriplePath> l = eltPB.patternElts();
         while (l.hasNext()) {
             TriplePath tp = l.next();
             if (tp.asTriple().matches(this.triple)) {
                 l.remove();

                 System.out.println("[RemoveOpTransform::transform(ElementPathBlock arg0)] ElementPathBlock:: " + tp.toString() + " TRIPLE JUST REMOVED!!!");
                 //System.out.println("[RemoveOpTransform::transform(ElementPathBlock arg0)] TRIPLE JUST REMOVED!!! ");
                 System.out.println("");
                 return this.transform(eltPB);//eltPB;
             }
         }
         return eltPB;
     }


=== FULL CODE used with public Element transform(ElementGroup arg0, List<Element> arg1)

@Override
     public Element transform(ElementGroup arg0, List<Element> arg1) {



         List<Element> elemList = arg0.getElements();
         Iterator<Element> itr = elemList.iterator();
         //Iterator<Element> itr = arg1.iterator();
         while (itr.hasNext()) {
             Element elem = itr.next();
             if (elem instanceof ElementOptional) {
                 boolean isElementOptionalEmpty = isElementOptionalEmpty((ElementOptional) elem);
                 if (isElementOptionalEmpty) {
                     itr.remove();
                 }
             }

             else if (elem instanceof ElementGroup) {
                 boolean isElementGroupEmpty = isElementGroupEmpty((ElementGroup) elem);
                 if (isElementGroupEmpty) {
                     itr.remove();
                 }
             }
             else if (elem instanceof ElementFilter) {
                 //... check if this filter is the one that we should remove
                 //...get the variables of the triple pattern that we want to delete
                 Set<Var> tpVars = new HashSet<>();
                 Node subj = this.triple.getSubject();
                 if (subj.isVariable()) {
                     tpVars.add((Var) subj);
                 }
                 Node pred = this.triple.getPredicate();
                 if (pred.isVariable()) {
                     tpVars.add((Var) pred);
                 }
                 Node obj = this.triple.getObject();
                 if (obj.isVariable()) {
                     tpVars.add((Var) obj);
                 }
                 //...get the variables of the FILTER expression
                 Set<Var> expVars = ((ElementFilter) elem).getExpr().getVarsMentioned();
                 //...check whether the FILTER expression contains any of the triple pattern variables
                 for (Var var : expVars) {
                     //..if it does then we have to delete the entire FILTER expression
                     if (tpVars.contains(var)) {
                         itr.remove();
                     }
                 }
             }
             else if (elem instanceof ElementUnion) {
                 boolean isUnionBothSidesEmpty = isUnionBothSidesEmpty1((ElementUnion) elem);
                 if (isUnionBothSidesEmpty) {
                     itr.remove();
                 }
             }

         }
         return arg0;
     }
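
A side note on the FILTER branch above: the loop calls itr.remove() once per shared variable, and for the standard java.util iterators remove() may be called only once per next(), so a filter sharing two variables with the triple would be expected to throw an IllegalStateException. A minimal sketch, with plain strings standing in for Var and hypothetical names (FilterCheck, keepUnrelated), that decides once per filter with Collections.disjoint and builds a new list rather than editing the input:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Each filter is represented only by the set of variable names it mentions.
// FilterCheck and keepUnrelated are illustrative names, not Jena API.
final class FilterCheck {
    // Keeps the filters that share no variable with the removed triple.
    // Returns a new list; the input list is left untouched.
    static List<Set<String>> keepUnrelated(List<Set<String>> filterVars,
                                           Set<String> tripleVars) {
        List<Set<String>> kept = new ArrayList<>();
        for (Set<String> vars : filterVars) {
            // One decision per filter, however many variables overlap.
            if (Collections.disjoint(vars, tripleVars)) {
                kept.add(vars);
            }
        }
        return kept;
    }
}
```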








On 2 Feb 2016, at 10:54, Carlo.Allocca wrote:

Dear Andy,

Thank you for your time. Much appreciated.
Some comments follow inline.

On 2 Feb 2016, at 09:36, Andy Seaborne wrote:


When removing the triple (?boss ex:isBossOf ?ind .), I get

SELECT DISTINCT  ?ind ?boss ?g
WHERE
  {   { ?ind  rdf:type  ?z }
    UNION
      {   { ?boss  ex:isBossOf1  ?ind }
        UNION
          { # Empty BGP

          }
      }
  }

which is OK.
I just need to find out how to remove an ElementGroup whose only element
is the empty one.
Of course, I need to do the same for the other cases, e.g. OPTIONAL,
subquery, etc.

Do note that evaluating {} (empty syntax group) yields one row of zero
columns - it contributes to the overall results (it's the join identity).
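
A toy illustration of the "join identity" remark, with a naive join written for this note (JoinIdentity is a made-up class, not how Jena evaluates queries). Rows are maps from variable name to value: joining with a table of one empty row gives the input back, whereas joining with a table of no rows gives nothing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class JoinIdentity {
    // Naive join: merge every pair of rows that agree on shared variables.
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            for (Map<String, String> r : right) {
                if (compatible(l, r)) {
                    Map<String, String> merged = new HashMap<>(l);
                    merged.putAll(r);
                    out.add(merged);
                }
            }
        }
        return out;
    }

    // Two rows are compatible when no shared variable has different values.
    static boolean compatible(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) return false;
        }
        return true;
    }
}
```

So an empty group left behind in a join position does not change the rows, but an empty group on one side of a UNION adds an extra empty row to the results.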

I see. To avoid this I am going to apply an
ElementTransformCleanGroupsOfOne as you suggested.


Now you have to look at all the elements that contain a group:
ElementUnion, ElementOptional, ElementMinus, …
Yes, I need to cover all the SPARQL language from the “public Element
transform(ElementGroup arg0, List<Element> arg1)” call.
At least this is my understanding so far.



That is what ElementTransformCleanGroupsOfOne does, except it looks for
"groups of one"

..  UNION { { stuff } }

and isn't too fussy about finding them all (it's an optimization, more a
tidying of the tree, not a change in the effect of a query, which is what
removing triple patterns is).

And of course changes from the bottom could potentially cause changes all
the way up to the top of the syntax tree.

Also: there may be original, legal empty groups in the tree.
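
The "groups of one" tidy-up can be pictured with nested lists, where a List is a group and a String is any other element. This is a sketch of the idea only, with a made-up class name; it is not the logic of ElementTransformCleanGroupsOfOne itself.

```java
import java.util.ArrayList;
import java.util.List;

final class GroupsOfOne {
    // A group is a List; a leaf is a String. Works bottom-up.
    static Object tidy(Object node) {
        if (!(node instanceof List)) return node;
        List<Object> out = new ArrayList<>();
        for (Object child : (List<?>) node) {
            out.add(tidy(child));
        }
        // { { stuff } } becomes { stuff }: a group whose only member is a group.
        if (out.size() == 1 && out.get(0) instanceof List) {
            return out.get(0);
        }
        return out;
    }
}
```

An empty group that sits next to other elements is left alone here, which matches the point that empty groups can be legitimate parts of the original query.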

Thanks for the detailed clarifications. Indeed, I will consider them.

Many Thanks,
Best Regards,
Carlo



   Andy






