Re: group by is very slow

Yuhan Zhang Wed, 19 Sep 2012 13:04:03 -0700

Oh! The query went through after a few minutes when I tested with a smaller
dataset (7M triples).
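
Andy's point quoted below, that LIMIT 100 applies only after grouping, can be
sketched in plain Python. This is a toy model under stated assumptions, not
how TDB or ARQ actually evaluates the query: the triples, the video IDs, and
the function names (join_on_category, similar_videos, first_rows) are all
made up for illustration.

```python
from collections import Counter
from itertools import islice

# Toy triples in the thread's layout: (video_id, category, score).
# Every identifier and score here is invented for illustration.
TRIPLES = [
    ("v1", "sports", 0.9), ("v1", "news", 0.4),
    ("v2", "sports", 0.8), ("v2", "news", 0.7),
    ("v3", "sports", 0.5),
    ("v4", "music", 0.6),
]


def join_on_category(triples, seed):
    """Yield one ?video_2 per pair of triples that share a category with seed.

    Mirrors the two triple patterns joined on ?c. Like the query in the
    thread, it does not exclude the seed video itself.
    """
    seed_categories = [c for (v, c, _) in triples if v == seed]
    for c in seed_categories:
        for (v2, c2, _) in triples:
            if c2 == c:
                yield v2


def similar_videos(triples, seed, limit):
    """GROUP BY ?video_2 with COUNT(*), then LIMIT.

    Counter has to consume every joined row before any count is final,
    so the limit only trims the finished list of groups.
    """
    counts = Counter(join_on_category(triples, seed))
    return list(counts.items())[:limit]


def first_rows(triples, seed, limit):
    """Without grouping, LIMIT can stop the join after `limit` rows."""
    return list(islice(join_on_category(triples, seed), limit))
```

In this model, similar_videos does work proportional to the full join no
matter how small the limit is, while first_rows stops early. That would be
consistent with the ungrouped query returning quickly and with the grouped
query finishing on the smaller dataset. Separately, standard SPARQL 1.1
requires an aggregate in the SELECT clause to be written as
(COUNT(*) AS ?n), which is probably what the "Illegal syntax?" remark
refers to.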


Yuhan

On Wed, Sep 19, 2012 at 1:00 PM, Yuhan Zhang <[email protected]> wrote:

> Hi Andy,
>
> It looks like the Fuseki server accepted that query without a syntax error
> in my case. I'm running Fuseki 0.2.4.
>
>
> The other two queries return quickly, within a few seconds:
>
>
> select (count(distinct ?p) AS ?pCount) { ?s ?p ?o }
>
> ----------
> | pCount |
> ==========
> | 10401  |
> ----------
>
>
> select distinct ?p { ?s ?p ?o } limit 10
>
> -----------------------------------------------------
> | p                                                 |
> =====================================================
> | </award>                                          |
> | </award/award_nominated_work>                     |
> | </award/ranked_item>                              |
> | </base>                                           |
> | </base/animemanga>                                |
> | </base/animemanga/anime_manga_character>          |
> | </base/animemanga/topic>                          |
> | </base/argumentmaps>                              |
> | </base/argumentmaps/possibly_correlated_thing>    |
> -----------------------------------------------------
>
>
> The query might be too big: it is trying to pair the given item with every
> other item and count the number of categories they have in common.
>
> Thanks.
>
> Yuhan
>
>
> On Wed, Sep 19, 2012 at 12:37 PM, Andy Seaborne <[email protected]> wrote:
>
>> On 19/09/12 18:51, Yuhan Zhang wrote:
>>
>>> Hi all,
>>>
>>> I keep the categories of videos as triples in a TDB store, in the format
>>> (?video_id ?category ?score).
>>> I'd like to find videos with similar categories, given one video ID.
>>>
>>> select ?video_2 COUNT(*)
>>> where {
>>>   <http://onescreen.com/video/2901760> ?c ?score_1 .
>>>   ?video_2 ?c ?score_2 .
>>> }
>>> group by ?video_2
>>>   limit 100
>>>
>>
>> Illegal syntax?
>>
>>
>>> However, this query with a group by was really slow and never completed.
>>> There are about 21M triples in the same TDB store.
>>> The response was pretty fast when querying without a group by.
>>>
>>> How could I make this query faster? Is SPARQL the right tool for this?
>>>
>>
>> Your data modelling looks somewhat unusual.  A join across the predicate
>> (?c) is likely to cause an explosion in possibilities.
>>
>> The LIMIT 100 applies after grouping - and the groups are likely huge.
>>
>> What do these two queries return?
>>
>> select (count(distinct ?p) AS ?pCount) { ?s ?p ?o }
>>
>> select distinct ?p { ?s ?p ?o } limit 10
>>
>>         Andy
>>
>>>
>>>
>>> Thank you.
>>>
>>> Yuhan
>>>
>>>
>>
>
>
> --
> Yuhan Zhang
> Senior Software Engineer
> OneScreen Inc.
> [email protected] <[email protected]>
> www.onescreen.com
> (949) 525-4825 Ext: 177
>
>
> The information contained in this e-mail is for the exclusive use of the
> intended recipient(s) and may be confidential, proprietary, and/or legally
> privileged. Inadvertent disclosure of this message does not constitute a
> waiver of any privilege.  If you receive this message in error, please do
> not directly or indirectly print, copy, retransmit, disseminate, or
> otherwise use the information. In addition, please delete this e-mail and
> all copies and notify the sender.
>



