Re: [sqlite] Speed up count(distinct col)

Yuzem Sat, 12 Feb 2011 06:05:10 -0800


BareFeetWare-2 wrote:
> 
> What is your source for the data? If I have that, I can test my schema
> properly, rather than sitting here entering a pile of dummy data. I looked
> at IMDB, but they only seem to have a paid license download.
> 
I am grabbing the data from the each movie imdb webpage. The script is
written in bash and I can give you the code but I think it would be very
complicated to adapt it to your schema.



BareFeetWare-2 wrote:
> 
> No. Sorry, I should have included that. You count directors like this:
> 
> select sum(Count) from "Capacity People Statistics"
> where Capacity_ID = (select ID from Capacity where Name = 'director')
> ;
> 
Are you sure that this count how many directors there are?
If I understand it correctly (probably I don't) you have for example table
"Movie People":
1|director|1
2|director|1
3|director|2

There are 2 directors and then in "Capacity People Statistics" you should
have:
director|1|2
director|2|1

If you use the previous code:
select sum(Count) from "Capacity People Statistics"
where Capacity_ID = (select ID from Capacity where Name = 'director')
;

I think it will  return 3 but there are only 2 directors.


BareFeetWare-2 wrote:
> 
>> Another thing: I don't understand the purpose of the Count column in
>> table
>> "Capacity People Statistics"
> 
> It hopefully now makes sense with my correction and example selects above.
> 
Sorry, but I still don't understands it because I don't understand that you
can count directors that way.


BareFeetWare-2 wrote:
> 
> I expect the above to be about the same speed or faster (since part of the
> counting is already done) than separate tables, but far more flexible (eg
> no need to add a table to accommodate a new capacity), and better
> normalized.
> 
But using different tables provides an instant result, you can try it with
any table:
SELECT count(*) from table;

In the other hand you are right, it is less flexible.
A more flexible and faster approach would be a table with total directors,
writers, etc... but that is very complicated because I would have to count
distinct directors, writers, etc... in every insert.

Oh wait... I think I got it. I need two more tables:
Tables:
"Movie People" (movie_id capacity people_id)
"Capacity People" (capacity_id people_id)
"Capacity Count" (capacity count)

On every insert into "Movie People" I trigger an insert into "Capacity
People" and on every insert/delete from "Capacity People" I add/substract 1
from the corresponding capacity in "Capacity Count"

Then I can count directors with:
SELECT count FROM "Capacity Count" WHERE capacity = 'directors';

What do you think? (I didn't test it)


BareFeetWare-2 wrote:
> 
>> Another thing: I don't understand the purpose of the Count column in
>> table
>> "Capacity People Statistics"
> 
> It hopefully now makes sense with my correction and example selects above.
> 
Sorry, but I still don't understands it, of-course, this is because I don't
understand that you can count directors that way.
-- 
View this message in context: 
http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30908962.html
Sent from the SQLite mailing list archive at Nabble.com.

_______________________________________________
sqlite-users mailing list
[email protected]
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: [sqlite] Speed up count(distinct col)

Reply via email to