Sigh. One more time:
PREFIX example: <http://example.org/>

SELECT ?subj1 ?subj2
WHERE
{
  ?subj1 example:pred ?obj1 .
  ?subj2 example:pred ?obj1 .
  FILTER (?subj1 != ?subj2)

  MINUS
  {
    {
      SELECT ?obj1 (COUNT(?obj1) AS ?objOccurrences)
      WHERE
      {
        ?s example:pred ?obj1 .
      }
      GROUP BY ?obj1
    }
    FILTER (?objOccurrences > 100)
  }
}
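
A more compact variant, in case it is useful: the count can go into a
HAVING clause, so the subquery projects only ?obj1 and no extra FILTER is
needed. This is a sketch I have not run against your store; it assumes the
same example:pred predicate and the threshold of 100 from your message.

PREFIX example: <http://example.org/>

SELECT ?subj1 ?subj2
WHERE
{
  ?subj1 example:pred ?obj1 .
  ?subj2 example:pred ?obj1 .
  FILTER (?subj1 != ?subj2)

  MINUS
  {
    # The subquery projects only ?obj1, the variable shared with the
    # outer pattern, and keeps objects used more than 100 times.
    SELECT ?obj1
    WHERE
    {
      ?s example:pred ?obj1 .
    }
    GROUP BY ?obj1
    HAVING (COUNT(?s) > 100)
  }
}
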
On Thu, Sep 6, 2012 at 6:03 PM, Stephen Allen wrote:
> Oops, typo in the query I gave you. The MINUS pattern has to share a
> variable (?obj1) with the outer pattern; otherwise it removes nothing.
> Corrected query:
>
> PREFIX example: <http://example.org/>
>
> SELECT ?subj1 ?subj2
> WHERE
> {
> ?subj1 example:pred ?obj1 .
> ?subj2 example:pred ?obj1 .
> FILTER (?subj1 != ?subj2)
>
> MINUS
> {
> {
> SELECT ?obj1 (COUNT(?obj1) as ?objOccurrences)
> WHERE
> {
> ?s example:pred ?obj1 .
> }
> GROUP BY ?obj1
> }
> FILTER (?objOccurrences > 100)
> }
> }
>
>
>
> On Thu, Sep 6, 2012 at 5:58 PM, Stephen Allen wrote:
>> On Thu, Sep 6, 2012 at 3:21 PM, Rob Stewart wrote:
>>> Hi,
>>>
>>> Firstly, I'm having trouble finding any *full* examples of SPARQL 1.1
>>> queries that FILTER on "NOT IN". I also cannot find any documentation
>>> on ARQ's support for "NOT IN", or indeed Fuseki's. Could someone
>>> point me to canonical examples of "NOT IN" queries that Fuseki
>>> supports?
>>>
>>> I've come up with my own for now. Would people mind commenting on
>>> whether they believe Fuseki would support this query? It doesn't
>>> seem to exclude the commonly occurring objects. I'm using Fuseki
>>> 0.2.4 and the tdbloader from "apache-jena-2.7.4-SNAPSHOT". The
>>> intention is to find pairs of distinct subjects that share the same
>>> object for a given predicate, excluding the most common objects. I
>>> deem an object "common" if it occurs more than 100 times in the TDB
>>> store.
>>>
>>> -----
>>>
>>> SELECT ?subj1 ?subj2
>>> WHERE
>>> {
>>>
>>> ?subj1 example:pred ?obj1 .
>>> ?subj2 example:pred ?obj1 .
>>> FILTER (?subj1 != ?subj2)
>>>
>>> {
>>> SELECT ?veryPopularObj
>>> WHERE
>>> {
>>> {
>>> SELECT ?veryPopularObj (COUNT(?veryPopularObj) as ?objOccurrences)
>>> WHERE
>>> {
>>> ?s example:pred ?veryPopularObj .
>>> }
>>> GROUP BY ?veryPopularObj
>>> }
>>> FILTER (?objOccurrences > 100)
>>> }
>>> }
>>>
>>> FILTER ( ?obj1 NOT IN (?veryPopularObj) )
>>>
>>> }
>>
>>
>> Rob,
>>
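>> IN and NOT IN test a value against a list of expressions; they do not
>> match against the results of another pattern. In your query, you are
>> performing a cross product between the bindings (?subj1, ?subj2, ?obj1)
>> and the bindings (?veryPopularObj), because the two patterns share no
>> variables. Every outer row is paired with every popular object, and the
>> NOT IN filter only drops the pairings where ?obj1 happens to equal
>> ?veryPopularObj, so it passes for most rows.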
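>>
>> For reference, NOT IN is meant for a fixed list of terms. A minimal
>> sketch, assuming two made-up IRIs (example:objA and example:objB are
>> placeholders, not from your data) that you want to exclude by hand:
>>
>> PREFIX example: <http://example.org/>
>>
>> SELECT ?subj1 ?subj2
>> WHERE
>> {
>>   ?subj1 example:pred ?obj1 .
>>   ?subj2 example:pred ?obj1 .
>>   FILTER (?subj1 != ?subj2)
>>   # example:objA and example:objB are hypothetical IRIs
>>   FILTER (?obj1 NOT IN (example:objA, example:objB))
>> }
>>
>> That only works when the list is known up front, which is not your case.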
>>
>> Instead, you should use SPARQL's negation feature [1]. Here is your
>> query rewritten to use MINUS:
>>
>> PREFIX example: <http://example.org/>
>>
>> SELECT ?subj1 ?subj2
>> WHERE
>> {
>> ?subj1 example:pred ?obj1 .
>> ?subj2 example:pred ?obj1 .
>> FILTER (?subj1 != ?subj2)
>>
>> MINUS
>> {
>> {
>> SELECT ?veryPopularObj (COUNT(?veryPopularObj) as ?objOccurrences)
>> WHERE
>> {
>> ?s example:pred ?veryPopularObj .
>> }
>> GROUP BY ?veryPopularObj
>> }
>> FILTER (?objOccurrences > 100)
>> }
>> }
>>
>> -Stephen
>>
>> [1] http://www.w3.org/TR/sparql11-query/#negation