Re: [sqlite] Speed up count(distinct col)

2011-02-17 Thread Yuzem
BareFeetWare-2 wrote: > > Oh, I see, so the "complication" is that you would have to change the > columns used in your bash script. That shouldn't be a big problem, but > I'll leave the bash script to you. > It isn't a big problem, the "complication" was to adapt all the tables and inserts and

Re: [sqlite] Speed up count(distinct col)

2011-02-15 Thread BareFeetWare
> IMDb Pro offers additional information and services, all the information I > am grabbing is already available to the public and I am not publishing this > information, there are a lot of programs that do this. OK, good to know, thanks. >>> The script is written in bash and I can give you the

Re: [sqlite] Speed up count(distinct col)

2011-02-15 Thread Yuzem
BareFeetWare-2 wrote: > > Does IMDB allow use of their data this way? After my brief reading of > their site, I thought they charge a $15k minimum per year for data. > IMDb Pro offers additional information and services, all the information I am grabbing is already available to the public and

Re: [sqlite] Speed up count(distinct col)

2011-02-13 Thread Random Coder
On Feb 13, 2011, at 3:38 PM, BareFeetWare wrote: > On 13/02/2011, at 1:04 AM, Yuzem wrote: > >> I am grabbing the data from the each movie imdb webpage. > > Does IMDB allow use of their data this way? After my brief reading of their > site, I thought they charge a

Re: [sqlite] Speed up count(distinct col)

2011-02-13 Thread BareFeetWare
On 13/02/2011, at 1:04 AM, Yuzem wrote: > I am grabbing the data from the each movie imdb webpage. Does IMDB allow use of their data this way? After my brief reading of their site, I thought they charge a $15k minimum per year for data. > The script is written in bash and I can give you the

Re: [sqlite] Speed up count(distinct col)

2011-02-12 Thread Yuzem
BareFeetWare-2 wrote: > > What is your source for the data? If I have that, I can test my schema > properly, rather than sitting here entering a pile of dummy data. I looked > at IMDB, but they only seem to have a paid license download. > I am grabbing the data from the each movie imdb

Re: [sqlite] Speed up count(distinct col)

2011-02-11 Thread BareFeetWare
On 11/02/2011, at 11:51 PM, Yuzem wrote: >> you want the number of each unique (capacity, person). No problem. We'll >> just add a People column to the statistics, and change the uniqueness >> constraint. Hmm, seems I didn't quite change the uniqueness constraint. I meant to allow multiple

Re: [sqlite] Speed up count(distinct col)

2011-02-11 Thread Yuzem
BareFeetWare-2 wrote: > > Oh, I see. I was counting the total number of each unique capacity > (including director, writers), but you want the number of each unique > (capacity, person). No problem. We'll just add a People column to the > statistics, and change the uniqueness constraint. > >

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread Simon Slavin
On 11 Feb 2011, at 2:37am, Yuzem wrote: > Simon Slavin-3 wrote: >> > >> By looking at the file on disk ? Are you taking into account the journal >> file ? >> > > Yes, I do all the counts So if I deleted one record and created another you wouldn't spot it ? > and save the data to a file

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread BareFeetWare
On 11/02/2011, at 1:37 PM, Yuzem wrote: > For example lets say I have the following data: > 1|director|1 > 2|director|2 > 3|director|1 > > In this example the total count for directors is 2, I have two distinct > directors. > In the table "Capacity Statistics" I will have: > director|2 > > The

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread Yuzem
Simon Slavin-3 wrote: > > By looking at the file on disk ? Are you taking into account the journal > file ? > Yes, I do all the counts and save the data to a file and then if the file is newer then the database I use the file else I count again. No, I am not taking the journal file into

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread BareFeetWare
On 11/02/2011, at 11:30 AM, Simon Slavin wrote: > Come to think of it, what's really happening here is a FOREIGN KEY situation. Yes, that's why I have the foreign keys (ie "references") in the schema. So, for instance, if you delete a movie, all of the actors, directors etc associated with

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread Simon Slavin
On 11 Feb 2011, at 12:26am, BareFeetWare wrote: > On 11/02/2011, at 11:11 AM, Simon Slavin wrote: > >> Erm ... there are also movies which have more than one person directing. >> You need to copy across the rowid from the MoviePeople table, and delete >> just based on that particular record.

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread BareFeetWare
On 11/02/2011, at 11:11 AM, Simon Slavin wrote: > Erm ... there are also movies which have more than one person directing. You > need to copy across the rowid from the MoviePeople table, and delete just > based on that particular record. Adding to what Simon said: The schema I posted allows

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread Simon Slavin
On 10 Feb 2011, at 11:55pm, Yuzem wrote: > That's the beauty of using a additional table. I make the column unique in > the the extra table and then on any insert in "movies people" I insert in > for example the directors table and any duplicate will be automatically > rejected. > > The same

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread BareFeetWare
> BareFeetWare-2 wrote: >> >> In that case, you should cache the counts in a separate table or two. That >> has a negligible overhead when you add a movie (which is infrequent), and >> basically no overhead when viewing (which is frequent). > I am doing that but in the application level, the down

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread Simon Slavin
On 10 Feb 2011, at 11:55pm, Yuzem wrote: > I check the modified time of the database. By looking at the file on disk ? Are you taking into account the journal file ? Simon. ___ sqlite-users mailing list sqlite-users@sqlite.org

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread Yuzem
BareFeetWare-2 wrote: > > In that case, you should cache the counts in a separate table or two. That > has a negligible overhead when you add a movie (which is infrequent), and > basically no overhead when viewing (which is frequent). I am doing that but in the application level, the down side

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread BareFeetWare
Oops, I should have said old instead of new in a couple of places: > begin immediate > ; > create table "Capacity Statistics" > ( ID integer primary key unique references "Capacity" (ID) on delete > cascade > , Count integer not null > ) > ; > insert into "Capacity Statistics" (ID,

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread BareFeetWare
On 11/02/2011, at 9:40 AM, Yuzem wrote: > Yes, in my application I have in the sidebar all those sections (movies, > years, tags, keywords, actors, directors, writers, etc...) and I count each > one, how many movies, how many years, etc... > It isn't very slow if I update only one item but the

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread Yuzem
Simon Slavin-3 wrote: > > How much slower. Did you make an index SQLite could use for that query ? > Using distinct isn't slow, it is what I would expect but count(*) is incredibly fast, it is instantaneous no matter how large is the table. Yes, I tried with an index. BareFeetWare-2 wrote:

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread Petite Abeille
On Feb 10, 2011, at 5:00 PM, Yuzem wrote: > The only thing I can think of is to have additional tables for the ids of > all directors, all writers, etc... > Tables "movies", "people", "capacity" and then tables "directors", > "writers", etc.. with only the IDs so I can count using count(*) which

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread BareFeetWare
> Thanks you but what I want to do is to count without using count(distinct > col) because it is much slower than count(*). I think you'll find the schema I posted very fast, since it's running everything, including distinct, on primary key columns. Or you can try this: select count(*) from

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread Simon Slavin
On 10 Feb 2011, at 4:00pm, Yuzem wrote: > Thanks you but what I want to do is to count without using count(distinct > col) because it is much slower than count(*). How much slower. Did you make an index SQLite could use for that query ? A good index for that might be People_ID,Capacity_ID

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread Yuzem
BareFeetWare-2 wrote: > > Then you can count the directors like this: > > select count(distinct People_ID) from "Movie People" join Capacity on > "Movie People".Capacity_ID = Capacity.ID where Capacity.Name = 'director'; > > or: > > select count(distinct People_ID) from "Movie People" where

Re: [sqlite] Speed up count(distinct col)

2011-02-10 Thread BareFeetWare
On 08/02/2011, at 10:19 AM, Yuzem wrote: > I have the tables "movies" and "people" > Those two tables are related by tables "directors", "writers", etc... > movies: id, title > people: id, name > > directors: movieId, peopleId > writers: movieId, peopleId > etc... > > How can I normalize that

Re: [sqlite] Speed up count(distinct col)

2011-02-08 Thread BareFeetWare
You have: CREATE TABLE movies(movie_id INTEGER, title TEXT, unique(movie_id)); CREATE TABLE tags(movie_id INTEGER, tag TEXT, unique(movie_id,tag)); You can solve your problem, using pure SQL. No need to resort to the application layer. Just execute the SQL transaction below. It takes care of

Re: [sqlite] Speed up count(distinct col)

2011-02-07 Thread Petite Abeille
On Feb 7, 2011, at 9:32 PM, Yuzem wrote: >> Hey... sounds like IMdb :P >> > Yes, I'm coding a http://yuzem.blogspot.com/p/figuritas-screenshots.html > movie manager that grabs the info from imdb. A fine hobby :) > I thought sqlite didn't handle foreign keys correctly, I thought wrong :)

Re: [sqlite] Speed up count(distinct col)

2011-02-07 Thread Josh Marell
I would create a tagList table (integer tagID, string tagName, unique(tagName)) Before performing your batch of inserts, query the tagList table (integer tagID, string tagName), and generate a map (key on tagName, value of tagID). For each tag you want to insert, see if it exists in the map. If

Re: [sqlite] Speed up count(distinct col)

2011-02-07 Thread Eric Smith
Yuzem wrote: > I thought sqlite didn't handle foreign keys correctly, I am using > triggers to automatically delete tags when a movie gets deleted. There's a new thing, 'PRAGMA foreign_keys=1', that causes SQLite to enforce them. Check out the docs on that. > Anyway, to use integers in

Re: [sqlite] Speed up count(distinct col)

2011-02-07 Thread Yuzem
Petite Abeille-2 wrote: > > Hey... sounds like IMdb :P > Yes, I'm coding a http://yuzem.blogspot.com/p/figuritas-screenshots.html movie manager that grabs the info from imdb. Petite Abeille-2 wrote: > > In any case, as mentioned by Michael Black, you might benefit greatly by > normalizing

Re: [sqlite] Speed up count(distinct col)

2011-02-07 Thread Petite Abeille
On Feb 7, 2011, at 4:26 PM, Yuzem wrote: > > I have many movies by tag and many tables like "tags" (keywords, countries, > languages, genres, etc..) Hey... sounds like IMdb :P > This solution implies one additional table and two triggers by each table. To paraphrase: Some people, when

[sqlite] Speed up count(distinct col)

2011-02-07 Thread Yuzem
Hi. I have this tables: CREATE TABLE movies(movie_id INTEGER, title TEXT, unique(movie_id)); CREATE TABLE tags(movie_id INTEGER, tag TEXT, unique(movie_id,tag)); I have many movies by tag and many tables like "tags" (keywords, countries, languages, genres, etc..) I can count different movies