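Oops, there was a typo in the query I gave you.  MINUS only removes
solutions that share a variable with its right-hand side, so the
subquery has to bind ?obj1 rather than ?veryPopularObj.
Corrected query: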

PREFIX example: <http://example.org/>

SELECT ?subj1 ?subj2
WHERE
{
  ?subj1 example:pred ?obj1 .
  ?subj2 example:pred ?obj1 .
  FILTER (?subj1 != ?subj2)

  MINUS
  {
    {
      SELECT ?obj1 (COUNT(?obj1) as ?objOccurrences)
      WHERE
      {
        ?s example:pred ?obj1 .
      }
      GROUP BY ?obj1
    }
    FILTER (?objOccurrences > 100)
  }
}
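
To see why the shared variable matters, here is a rough Python sketch of
how MINUS treats solutions, going by the SPARQL 1.1 definition. The sample
bindings, the count of 250, and the helper names are made up for
illustration; this is not how ARQ implements it.

```python
def compatible(mu, nu):
    """Two solutions are compatible if they agree on every variable they share."""
    return all(mu[v] == nu[v] for v in mu.keys() & nu.keys())


def minus(left, right):
    """Drop a left solution only if some right solution is compatible with it
    AND the two have at least one variable in common."""
    return [
        mu for mu in left
        if not any(compatible(mu, nu) and mu.keys() & nu.keys() for nu in right)
    ]


# Hypothetical solutions of the outer pattern (?subj1, ?subj2, ?obj1).
left = [
    {"subj1": "a", "subj2": "b", "obj1": "popular"},
    {"subj1": "c", "subj2": "d", "obj1": "rare"},
]

# Subquery binding ?veryPopularObj: nothing in common with the left side.
disjoint = [{"veryPopularObj": "popular", "objOccurrences": 250}]

# Subquery binding ?obj1: shares a variable with the left side.
shared = [{"obj1": "popular", "objOccurrences": 250}]

unchanged = minus(left, disjoint)  # no shared variable, so nothing is removed
filtered = minus(left, shared)     # the row with the popular object is removed
```

With the disjoint variable name, MINUS leaves every row in place, which is
what the first version of the query did. With ?obj1 shared, only the rows
whose object is not in the popular set survive.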



On Thu, Sep 6, 2012 at 5:58 PM, Stephen Allen <[email protected]> wrote:
> On Thu, Sep 6, 2012 at 3:21 PM, Rob Stewart <[email protected]> wrote:
>> Hi,
>>
>> Firstly, I'm having trouble finding any *full* examples of SPARQL 1.1
>> queries that FILTER on "NOT IN". I also cannot find any documentation
>> on the ARQ engine support for "NOT IN", or indeed the fuseki support
>> for "NOT IN". Could someone point me to various canonical examples of
>> such "NOT IN" queries that fuseki supports?
>>
>> I've come up with my own for now. Would people mind commenting on
>> whether they believe that fuseki would support the query? It doesn't
>> seem to be negating the commonly occurring objects. I'm using Fuseki
>> 0.2.4 and the tdbloader from "apache-jena-2.7.4-SNAPSHOT". The
>> intention is to find two distinct subjects that share the same objects
>> for a given predicate, negating the most common objects. I deem
>> "common" to be more than 100 occurrences in the TDB store.
>>
>> -----
>>
>> SELECT ?subj1 ?subj2
>> WHERE
>>  {
>>
>>  ?subj1 example:pred ?obj1 .
>>  ?subj2 example:pred ?obj1 .
>>  FILTER (?subj1 != ?subj2)
>>
>>  {
>>   SELECT ?veryPopularObj
>>   WHERE
>>    {
>>      {
>>      SELECT ?veryPopularObj (COUNT(?veryPopularObj) as ?objOccurrences)
>>      WHERE
>>       {
>>       ?s example:pred ?veryPopularObj .
>>       }
>>       GROUP BY ?veryPopularObj
>>      }
>>    FILTER (?objOccurrences > 100)
>>   }
>>  }
>>
>>  FILTER ( ?obj1 NOT IN (?veryPopularObj) )
>>
>> }
>
>
> Rob,
>
> IN and NOT IN evaluate expressions.  In your query, you are performing
> a cross product between the binding (?subj1, ?subj2, ?obj1) and the
> binding (?veryPopularObj).  This occurs because there are no shared
> variables.  Your NOT IN filter will then pass for most rows.
>
> Instead, you should use SPARQL's negation feature [1].  Here is your
> query rewritten to use MINUS:
>
> PREFIX example: <http://example.org/>
>
> SELECT ?subj1 ?subj2
> WHERE
> {
>   ?subj1 example:pred ?obj1 .
>   ?subj2 example:pred ?obj1 .
>   FILTER (?subj1 != ?subj2)
>
>   MINUS
>   {
>     {
>       SELECT ?veryPopularObj (COUNT(?veryPopularObj) as ?objOccurrences)
>       WHERE
>       {
>         ?s example:pred ?veryPopularObj .
>       }
>       GROUP BY ?veryPopularObj
>     }
>     FILTER (?objOccurrences > 100)
>   }
> }
>
> -Stephen
>
> [1] http://www.w3.org/TR/sparql11-query/#negation
