BareFeetWare-2 wrote: > > What is your source for the data? If I have that, I can test my schema > properly, rather than sitting here entering a pile of dummy data. I looked > at IMDB, but they only seem to have a paid license download. > I am grabbing the data from the each movie imdb webpage. The script is written in bash and I can give you the code but I think it would be very complicated to adapt it to your schema.
BareFeetWare-2 wrote: > > No. Sorry, I should have included that. You count directors like this: > > select sum(Count) from "Capacity People Statistics" > where Capacity_ID = (select ID from Capacity where Name = 'director') > ; > Are you sure that this count how many directors there are? If I understand it correctly (probably I don't) you have for example table "Movie People": 1|director|1 2|director|1 3|director|2 There are 2 directors and then in "Capacity People Statistics" you should have: director|1|2 director|2|1 If you use the previous code: select sum(Count) from "Capacity People Statistics" where Capacity_ID = (select ID from Capacity where Name = 'director') ; I think it will return 3 but there are only 2 directors. BareFeetWare-2 wrote: > >> Another thing: I don't understand the purpose of the Count column in >> table >> "Capacity People Statistics" > > It hopefully now makes sense with my correction and example selects above. > Sorry, but I still don't understands it because I don't understand that you can count directors that way. BareFeetWare-2 wrote: > > I expect the above to be about the same speed or faster (since part of the > counting is already done) than separate tables, but far more flexible (eg > no need to add a table to accommodate a new capacity), and better > normalized. > But using different tables provides an instant result, you can try it with any table: SELECT count(*) from table; In the other hand you are right, it is less flexible. A more flexible and faster approach would be a table with total directors, writers, etc... but that is very complicated because I would have to count distinct directors, writers, etc... in every insert. Oh wait... I think I got it. I need two more tables: Tables: "Movie People" (movie_id capacity people_id) "Capacity People" (capacity_id people_id) "Capacity Count" (capacity count) On every insert into "Movie People" I trigger an insert into "Capacity People" and on every insert/delete from "Capacity People" I add/substract 1 from the corresponding capacity in "Capacity Count" Then I can count directors with: SELECT count FROM "Capacity Count" WHERE capacity = 'directors'; What do you think? (I didn't test it) BareFeetWare-2 wrote: > >> Another thing: I don't understand the purpose of the Count column in >> table >> "Capacity People Statistics" > > It hopefully now makes sense with my correction and example selects above. > Sorry, but I still don't understands it, of-course, this is because I don't understand that you can count directors that way. -- View this message in context: http://old.nabble.com/Speed-up-count%28distinct-col%29-tp30864622p30908962.html Sent from the SQLite mailing list archive at Nabble.com. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users