[jira] [Commented] (JENA-1534) Variables in EXISTS must be considered for the join strategy

2018-04-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458042#comment-16458042
 ] 

ASF GitHub Bot commented on JENA-1534:
--

Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/409


> Variables in EXISTS must be considered for the join strategy
> 
>
> Key: JENA-1534
> URL: https://issues.apache.org/jira/browse/JENA-1534
> Project: Apache Jena
>  Issue Type: Bug
>  Components: ARQ
>Affects Versions: Jena 3.7.0
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 3.8.0
>
>
> This query has a join between the GRAPH and the pattern before it.
> {noformat}
> SELECT  *
> WHERE
>   { ?s  ?p  ?V0
>     GRAPH ?g
>       { ?sx  ?p  ?ox
>         FILTER EXISTS { _:b0  ?p  ?V0 }
>       }
>   }
> {noformat}
> The fact that {{?V0}} occurs in the LHS of the join (in {{?s  ?p  ?V0}}) and in the
> FILTER, but not in the rest of the RHS, means the "sequence" transform cannot
> be used.
> Contrast this with:
> {noformat}
> SELECT  *
> WHERE
>   { ?s  ?p  ?V1
>     GRAPH ?g
>       { ?sx  ?p  ?ox
>         FILTER EXISTS { _:b0  ?p  ?V2 }
>       }
>   }
> {noformat}
> Now {{?V2}} is only in the FILTER, so it is safe to transform the join.
> Note that {{?p}} appears in the LHS, which makes it defined in the EXISTS, so the
> sequence transform is possible.
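The condition described above can be expressed as a set check. This is a hypothetical illustration, not Jena's actual join classifier, which checks several other conditions as well. Variables are plain strings, and the caller supplies the variable sets for each side. `SequenceSafetySketch` and `safeForSequence` are made-up names.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not Jena's classifier) of the single rule from JENA-1534:
 * a join may become a "sequence" only if no variable that the RHS mentions solely
 * inside a FILTER EXISTS is also bound by the LHS.
 */
public class SequenceSafetySketch {

    /**
     * @param lhsVars        variables bound by the left-hand side of the join
     * @param rhsPatternVars variables bound by the RHS pattern itself (outside the filter)
     * @param rhsExistsVars  variables mentioned inside FILTER EXISTS on the RHS
     */
    static boolean safeForSequence(Set<String> lhsVars,
                                   Set<String> rhsPatternVars,
                                   Set<String> rhsExistsVars) {
        // Filter-only variables: used in the EXISTS but not bound by the rest of the RHS.
        Set<String> filterOnly = new HashSet<>(rhsExistsVars);
        filterOnly.removeAll(rhsPatternVars);
        // If the LHS binds any of them, substituting LHS rows would change the EXISTS.
        filterOnly.retainAll(lhsVars);
        return filterOnly.isEmpty();
    }

    public static void main(String[] args) {
        // First query: ?V0 is bound by the LHS and appears only in the EXISTS on the RHS.
        System.out.println(safeForSequence(
                Set.of("s", "p", "V0"), Set.of("g", "sx", "p", "ox"), Set.of("p", "V0")));
        // Second query: ?V2 appears only in the EXISTS; ?p is bound on both sides.
        System.out.println(safeForSequence(
                Set.of("s", "p", "V1"), Set.of("g", "sx", "p", "ox"), Set.of("p", "V2")));
    }
}
```

With the two queries above, the first call returns `false` (because of `?V0`) and the second returns `true`. The blank node `_:b0` is left out of the sets because the LHS can never bind it.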



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (JENA-1534) Variables in EXISTS must be considered for the join strategy

2018-04-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458036#comment-16458036
 ] 

ASF subversion and git services commented on JENA-1534:
---

Commit 1bd5fdcf3ac0694e0bbabd2f8e63deb92d3c2bff in jena's branch 
refs/heads/master from [~andy.seaborne]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=1bd5fdc ]

JENA-1534: Test for filter-only variables in EXISTS.

Tighten the classifier, assumes proper use of (sequence)
Relates to JENA-1167, JENA-1280.




[jira] [Commented] (JENA-1534) Variables in EXISTS must be considered for the join strategy

2018-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16457267#comment-16457267
 ] 

ASF GitHub Bot commented on JENA-1534:
--

Github user kinow commented on the issue:

https://github.com/apache/jena/pull/409
  
I've been learning a bit more about Jena internals (the weather is really bad
here in the antipodes :-) ), so I picked this PR to review.

I had a look at the related tickets, and from JENA-1534 it looks like the
variables inside the EXISTS were not being considered when choosing the join
strategy, if I understand correctly.

So I created an empty persistent TDB dataset, loaded `books.ttl` from the Jena
code, and loaded `yso.ttl` from SKOSMOS into the yso graph (I suspected it would
have some blank nodes, etc.).

I tried the query

```sql
SELECT * WHERE { ?s ?p ?V0 GRAPH ?g { ?sx ?p ?ox FILTER EXISTS { _:b0 ?p ?V0 } } }
```

It returned a few results, quite quickly. Then I tried `tdbquery` from TDB (not
TDB2) with the following command line (actually run in Eclipse, but equivalent
to):

```shell
tdbquery --explain \
  --loc=/home/kinow/Development/java/jena/jena/jena-fuseki2/jena-fuseki-core/run/databases/p1/ \
  "SELECT * WHERE { ?s ?p ?V0 GRAPH ?g { ?sx ?p ?ox FILTER EXISTS { _:b0 ?p ?V0 } } }"
```

I also added `-Dlog4j.configuration=file:///tmp/log4j.properties
-Dlog4j.debug=true` to see the explain output.

With the code from master, the query took less than 2 seconds and produced the
following algebra:

```sql
12:36:03 INFO  exec  :: QUERY
  SELECT  *
  WHERE
    { ?s  ?p  ?V0
      GRAPH ?g
        { ?sx  ?p  ?ox
          FILTER EXISTS { _:b0  ?p  ?V0 }
        }
    }
12:36:03 INFO  exec  :: ALGEBRA
  (sequence
    (quadpattern (quad  ?s ?p ?V0))
    (filter (exists
               (quadpattern (quad ?g ??0 ?p ?V0)))
      (quadpattern (quad ?g ?sx ?p ?ox))))
```

So that was a sequence (it corresponds to `OpSequence` in Jena, which I'm now
searching for in Eclipse to see how it's used). Then I checked out the PR branch:

```shell
$ git fetch --all
$ git fetch github refs/pull/409/head:pr-409
$ git checkout pr-409
$ mvn clean install -Pdev -DskipTests
```

I ran the same `tdbquery` command again, and now the algebra became:

```sql
12:41:48 INFO  exec  :: QUERY
  SELECT  *
  WHERE
    { ?s  ?p  ?V0
      GRAPH ?g
        { ?sx  ?p  ?ox
          FILTER EXISTS { _:b0  ?p  ?V0 }
        }
    }
12:41:48 INFO  exec  :: ALGEBRA
  (join
    (quadpattern (quad  ?s ?p ?V0))
    (filter (exists
               (quadpattern (quad ?g ??0 ?p ?V0)))
      (quadpattern (quad ?g ?sx ?p ?ox))))
12:41:48 INFO  exec  :: TDB
  (join
    (quadpattern (quad  ?s ?p ?V0))
    (filter (exists
               (quadpattern (quad ?g ??0 ?p ?V0)))
      (quadpattern (quad ?g ?sx ?p ?ox))))
```

A join! If I understood the tickets correctly, that's exactly the intended
behaviour: before, the variable in the EXISTS was not being taken into
consideration when deciding whether to produce a `JOIN` (`OpJoin`). The query
also took much longer, over 10 seconds (I guess SKOSMOS' YSO vocabulary has
around 220K triples? The Harry Potter books dataset should have about 5? Anyway).
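The following toy model shows why the join is required for the issue's first query. It is not Jena code, and the three-triple dataset is made up for illustration. It assumes the default graph holds `a p x`, and graph `g` holds `s1 p o1` and `b p y`.

- **As a join**, the RHS is evaluated on its own, so `?V0` is unbound inside the EXISTS and every RHS row survives.
- **As a sequence**, the LHS binding `?V0 = x` is substituted into the EXISTS, which then fails.

`ExistsJoinVsSequence` and its methods are hypothetical names.

```java
import java.util.*;

/**
 * Hypothetical toy model (not Jena code) of the JENA-1534 query
 *   { ?s ?p ?V0  GRAPH ?g { ?sx ?p ?ox  FILTER EXISTS { _:b0 ?p ?V0 } } }
 * comparing a proper join with a "sequence" that substitutes LHS bindings into the RHS.
 */
public class ExistsJoinVsSequence {
    record Triple(String s, String p, String o) {}

    // Default graph: one triple. Named graph g: two triples.
    static final List<Triple> DEFAULT = List.of(new Triple("a", "p", "x"));
    static final List<Triple> GRAPH_G = List.of(
            new Triple("s1", "p", "o1"),
            new Triple("b", "p", "y"));

    // EXISTS { _:b0 ?p ?V0 } evaluated against graph g.
    // A null v0 means ?V0 is unbound, so any object matches.
    static boolean exists(String p, String v0) {
        return GRAPH_G.stream().anyMatch(t ->
                t.p().equals(p) && (v0 == null || t.o().equals(v0)));
    }

    // Join: the RHS is evaluated independently (?V0 unbound in EXISTS), then joined on ?p.
    static int joinCount() {
        int count = 0;
        for (Triple left : DEFAULT) {
            for (Triple right : GRAPH_G) {
                boolean rhsKeeps = exists(right.p(), null);
                if (rhsKeeps && right.p().equals(left.p())) {
                    count++;
                }
            }
        }
        return count;
    }

    // Sequence: each LHS row is substituted into the RHS, so ?V0 = left.o() leaks into EXISTS.
    static int sequenceCount() {
        int count = 0;
        for (Triple left : DEFAULT) {
            for (Triple right : GRAPH_G) {
                if (right.p().equals(left.p()) && exists(left.p(), left.o())) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("join:     " + joinCount());
        System.out.println("sequence: " + sequenceCount());
    }
}
```

On this data the join keeps both rows (count 2), while the sequence keeps none (count 0). That difference in results is what the tightened classifier guards against.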

So +1! LGTM :tada:

I'll probably spend some more time reading the code base to learn a bit more. I
also found two posts
([1](https://gregheartsfield.com/2012/08/26/jena-arq-query-performance.html),
[2](https://www.slideshare.net/olafhartig/the-semantics-of-sparql)) with some
interesting content. If you have any other pointers for learning more about it,
or if I said anything silly, please feel free to correct me or share :-)

*PS: while reading the javadocs of the `Op*` classes, I noticed some typos in
`OpSequence`. Should I open a pull request for that, or just commit to master?*

```java
-/** A "sequence" is a join-like operation where it is know that the 
- * the output of one step can be fed into the input of the next 
+/** A "sequence" is a join-like operation where it is known that
+ * the output of one step can be fed into the input of the next
  * (that is, no scoping issues arise). */
 
 public class OpSequence extends OpN
```



[jira] [Commented] (JENA-1534) Variables in EXISTS must be considered for the join strategy

2018-04-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456867#comment-16456867
 ] 

ASF GitHub Bot commented on JENA-1534:
--

GitHub user afs opened a pull request:

https://github.com/apache/jena/pull/409

JENA-1534: Test for filter-only variables in EXISTS.

Tighten the classifier, assumes proper use of (sequence)
Relates to JENA-1167, JENA-1280.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/afs/jena vars-not-exists

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/409.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #409


commit 1bd5fdcf3ac0694e0bbabd2f8e63deb92d3c2bff
Author: Andy Seaborne 
Date:   2018-04-27T18:12:37Z

JENA-1534: Test for filter-only variables in EXISTS.

Tighten the classifier, assumes proper use of (sequence)
Relates to JENA-1167, JENA-1280.



