BareFeetWare-2 wrote:
>
> Oh, I see, so the "complication" is that you would have to change the
> columns used in your bash script. That shouldn't be a big problem, but
> I'll leave the bash script to you.
>
It isn't a big problem, the "complication" was to adapt all the tables and
inserts and
> IMDb Pro offers additional information and services, all the information I
> am grabbing is already available to the public and I am not publishing this
> information, there are a lot of programs that do this.
OK, good to know, thanks.
>>> The script is written in bash and I can give you the
BareFeetWare-2 wrote:
>
> Does IMDB allow use of their data this way? After my brief reading of
> their site, I thought they charge a $15k minimum per year for data.
>
IMDb Pro offers additional information and services, all the information I
am grabbing is already available to the public and
On Feb 13, 2011, at 3:38 PM, BareFeetWare wrote:
> On 13/02/2011, at 1:04 AM, Yuzem wrote:
>
>> I am grabbing the data from the each movie imdb webpage.
>
> Does IMDB allow use of their data this way? After my brief reading of their
> site, I thought they charge a
On 13/02/2011, at 1:04 AM, Yuzem wrote:
> I am grabbing the data from the each movie imdb webpage.
Does IMDB allow use of their data this way? After my brief reading of their
site, I thought they charge a $15k minimum per year for data.
> The script is written in bash and I can give you the
BareFeetWare-2 wrote:
>
> What is your source for the data? If I have that, I can test my schema
> properly, rather than sitting here entering a pile of dummy data. I looked
> at IMDB, but they only seem to have a paid license download.
>
I am grabbing the data from the each movie imdb
On 11/02/2011, at 11:51 PM, Yuzem wrote:
>> you want the number of each unique (capacity, person). No problem. We'll
>> just add a People column to the statistics, and change the uniqueness
>> constraint.
Hmm, seems I didn't quite change the uniqueness constraint. I meant to allow
multiple
BareFeetWare-2 wrote:
>
> Oh, I see. I was counting the total number of each unique capacity
> (including director, writers), but you want the number of each unique
> (capacity, person). No problem. We'll just add a People column to the
> statistics, and change the uniqueness constraint.
>
>
On 11 Feb 2011, at 2:37am, Yuzem wrote:
> Simon Slavin-3 wrote:
>>
>
>> By looking at the file on disk ? Are you taking into account the journal
>> file ?
>>
>
> Yes, I do all the counts
So if I deleted one record and created another you wouldn't spot it ?
> and save the data to a file
On 11/02/2011, at 1:37 PM, Yuzem wrote:
> For example lets say I have the following data:
> 1|director|1
> 2|director|2
> 3|director|1
>
> In this example the total count for directors is 2, I have two distinct
> directors.
> In the table "Capacity Statistics" I will have:
> director|2
>
> The
Simon Slavin-3 wrote:
>
> By looking at the file on disk ? Are you taking into account the journal
> file ?
>
Yes, I do all the counts and save the data to a file and then if the file is
newer then the database I use the file else I count again.
No, I am not taking the journal file into
On 11/02/2011, at 11:30 AM, Simon Slavin wrote:
> Come to think of it, what's really happening here is a FOREIGN KEY situation.
Yes, that's why I have the foreign keys (ie "references") in the schema. So,
for instance, if you delete a movie, all of the actors, directors etc
associated with
On 11 Feb 2011, at 12:26am, BareFeetWare wrote:
> On 11/02/2011, at 11:11 AM, Simon Slavin wrote:
>
>> Erm ... there are also movies which have more than one person directing.
>> You need to copy across the rowid from the MoviePeople table, and delete
>> just based on that particular record.
On 11/02/2011, at 11:11 AM, Simon Slavin wrote:
> Erm ... there are also movies which have more than one person directing. You
> need to copy across the rowid from the MoviePeople table, and delete just
> based on that particular record.
Adding to what Simon said:
The schema I posted allows
On 10 Feb 2011, at 11:55pm, Yuzem wrote:
> That's the beauty of using a additional table. I make the column unique in
> the the extra table and then on any insert in "movies people" I insert in
> for example the directors table and any duplicate will be automatically
> rejected.
>
> The same
> BareFeetWare-2 wrote:
>>
>> In that case, you should cache the counts in a separate table or two. That
>> has a negligible overhead when you add a movie (which is infrequent), and
>> basically no overhead when viewing (which is frequent).
> I am doing that but in the application level, the down
On 10 Feb 2011, at 11:55pm, Yuzem wrote:
> I check the modified time of the database.
By looking at the file on disk ? Are you taking into account the journal file ?
Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
BareFeetWare-2 wrote:
>
> In that case, you should cache the counts in a separate table or two. That
> has a negligible overhead when you add a movie (which is infrequent), and
> basically no overhead when viewing (which is frequent).
I am doing that but in the application level, the down side
Oops, I should have said old instead of new in a couple of places:
> begin immediate
> ;
> create table "Capacity Statistics"
> ( ID integer primary key unique references "Capacity" (ID) on delete
> cascade
> , Count integer not null
> )
> ;
> insert into "Capacity Statistics" (ID,
On 11/02/2011, at 9:40 AM, Yuzem wrote:
> Yes, in my application I have in the sidebar all those sections (movies,
> years, tags, keywords, actors, directors, writers, etc...) and I count each
> one, how many movies, how many years, etc...
> It isn't very slow if I update only one item but the
Simon Slavin-3 wrote:
>
> How much slower. Did you make an index SQLite could use for that query ?
>
Using distinct isn't slow, it is what I would expect but count(*) is
incredibly fast, it is instantaneous no matter how large is the table.
Yes, I tried with an index.
BareFeetWare-2 wrote:
On Feb 10, 2011, at 5:00 PM, Yuzem wrote:
> The only thing I can think of is to have additional tables for the ids of
> all directors, all writers, etc...
> Tables "movies", "people", "capacity" and then tables "directors",
> "writers", etc.. with only the IDs so I can count using count(*) which
> Thanks you but what I want to do is to count without using count(distinct
> col) because it is much slower than count(*).
I think you'll find the schema I posted very fast, since it's running
everything, including distinct, on primary key columns.
Or you can try this:
select count(*) from
On 10 Feb 2011, at 4:00pm, Yuzem wrote:
> Thanks you but what I want to do is to count without using count(distinct
> col) because it is much slower than count(*).
How much slower. Did you make an index SQLite could use for that query ? A
good index for that might be
People_ID,Capacity_ID
BareFeetWare-2 wrote:
>
> Then you can count the directors like this:
>
> select count(distinct People_ID) from "Movie People" join Capacity on
> "Movie People".Capacity_ID = Capacity.ID where Capacity.Name = 'director';
>
> or:
>
> select count(distinct People_ID) from "Movie People" where
On 08/02/2011, at 10:19 AM, Yuzem wrote:
> I have the tables "movies" and "people"
> Those two tables are related by tables "directors", "writers", etc...
> movies: id, title
> people: id, name
>
> directors: movieId, peopleId
> writers: movieId, peopleId
> etc...
>
> How can I normalize that
You have:
CREATE TABLE movies(movie_id INTEGER, title TEXT, unique(movie_id));
CREATE TABLE tags(movie_id INTEGER, tag TEXT, unique(movie_id,tag));
You can solve your problem, using pure SQL. No need to resort to the
application layer. Just execute the SQL transaction below. It takes care of
On Feb 7, 2011, at 9:32 PM, Yuzem wrote:
>> Hey... sounds like IMdb :P
>>
> Yes, I'm coding a http://yuzem.blogspot.com/p/figuritas-screenshots.html
> movie manager that grabs the info from imdb.
A fine hobby :)
> I thought sqlite didn't handle foreign keys correctly,
I thought wrong :)
I would create a tagList table (integer tagID, string tagName,
unique(tagName))
Before performing your batch of inserts, query the tagList table (integer
tagID, string tagName), and generate a map (key on tagName, value of tagID).
For each tag you want to insert, see if it exists in the map. If
Yuzem wrote:
> I thought sqlite didn't handle foreign keys correctly, I am using
> triggers to automatically delete tags when a movie gets deleted.
There's a new thing, 'PRAGMA foreign_keys=1', that causes SQLite to
enforce them. Check out the docs on that.
> Anyway, to use integers in
Petite Abeille-2 wrote:
>
> Hey... sounds like IMdb :P
>
Yes, I'm coding a http://yuzem.blogspot.com/p/figuritas-screenshots.html
movie manager that grabs the info from imdb.
Petite Abeille-2 wrote:
>
> In any case, as mentioned by Michael Black, you might benefit greatly by
> normalizing
On Feb 7, 2011, at 4:26 PM, Yuzem wrote:
>
> I have many movies by tag and many tables like "tags" (keywords, countries,
> languages, genres, etc..)
Hey... sounds like IMdb :P
> This solution implies one additional table and two triggers by each table.
To paraphrase:
Some people, when
Hi.
I have this tables:
CREATE TABLE movies(movie_id INTEGER, title TEXT, unique(movie_id));
CREATE TABLE tags(movie_id INTEGER, tag TEXT, unique(movie_id,tag));
I have many movies by tag and many tables like "tags" (keywords, countries,
languages, genres, etc..)
I can count different movies
33 matches
Mail list logo