Re: [Imdbpy-help] imdbpy to mysql help

2013-02-24 Thread D L



> Date: Sun, 24 Feb 2013 11:52:45 +0100
> Subject: Re: [Imdbpy-help] imdbpy to mysql help
> From: davide.alber...@gmail.com
> To: dlm...@hotmail.com
> CC: imdbpy-help@lists.sourceforge.net
> 
> On Sun, Feb 24, 2013 at 12:32 AM, D L  wrote:
> >
> > Ok, well here's an update. I just let the foreign keys run for a little over
> > a full day and it actually completed for mysql:
> > # TIME FINAL : 1883min, 1sec (wall) 23min, 57sec (user) 0min, 5sec (system)
> 
> I see.
> I've just run it with a subset of the db (1% taken from each file) and
> my numbers are:
> # TIME TOTAL TIME TO INSERT/WRITE DATA : 12min, 18sec (wall) 5min, 23sec (user) 0min, 43sec (system)
> building database indexes (this may take a while)
> # TIME createIndexes() : 1min, 25sec (wall) 0min, 0sec (user) 0min, 0sec (system)
> adding foreign keys (this may take a while)
> # TIME createForeignKeys() : 10min, 2sec (wall) 0min, 0sec (user) 0min, 0sec (system)
> RESTORING imdbIDs values for movies... DONE! (restored 0 entries out of 0)
> # TIME restore movies : 0min, 0sec (wall) 0min, 0sec (user) 0min, 0sec (system)
> RESTORING imdbIDs values for people... DONE! (restored 0 entries out of 0)
> # TIME restore people : 0min, 0sec (wall) 0min, 0sec (user) 0min, 0sec (system)
> RESTORING imdbIDs values for characters... DONE! (restored 0 entries out of 0)
> # TIME restore characters : 0min, 0sec (wall) 0min, 0sec (user) 0min, 0sec (system)
> RESTORING imdbIDs values for companies... DONE! (restored 0 entries out of 0)
> # TIME restore companies : 0min, 0sec (wall) 0min, 0sec (user) 0min, 0sec (system)
> # TIME FINAL : 23min, 45sec (wall) 5min, 23sec (user) 0min, 43sec (system)
> 
> What kind of CPU/RAM/disk have you used?

I'm doing it on a laptop with a 2.53 GHz i3, 4 GB of DDR3 RAM, and about 200 GB
of disk space (not SSD). I'm planning to get everything working on my laptop
before I buy web hosting and move it there.


> > One of my main questions right now is the difference in results between the
> > web search and the sql search. For example, if I ran a search on all the
> > movies that Denzel Washington has acted in via the web search, it basically
> > outputs all the main ones,
> 
> Yep, they are just grouped in a different way.
> It would not be easy for us (even if it's not impossible, I guess) to identify
> all the various categories used on the web and the rules used to categorize
> the movies, but...
> 
> For the moment, I think you could take the whole filmography and search
> for tv series and/or movies in which an actor is playing Himself (or anything
> that starts with Himself/Herself/Themselves)

Yeah, but wouldn't filtering that require even more processing time?
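
A minimal sketch of the filtering suggested above, assuming the filmography has already been fetched as a list of dicts. The keys (title, kind, role) and the sample rows are made up for illustration and are not IMDbPY's own data structures. The check is a single pass over the credits, so its cost should be small next to the query that fetches them.

```python
# Role prefixes taken from the suggestion in the thread.
SELF_PREFIXES = ("Himself", "Herself", "Themselves")


def is_self_appearance(role):
    """Return True when the credited role starts with Himself/Herself/Themselves."""
    return (role or "").strip().startswith(SELF_PREFIXES)


def main_credits(filmography):
    """Keep only the credits where the person plays a character, not themselves."""
    return [credit for credit in filmography
            if not is_self_appearance(credit.get("role"))]


# Hypothetical rows; real ones would come from the SQL database.
sample = [
    {"title": "Example Feature Film", "kind": "movie", "role": "Detective"},
    {"title": "Example Awards Ceremony", "kind": "tv movie", "role": "Himself - Presenter"},
    {"title": "Example Talk Show", "kind": "tv series", "role": "Himself"},
]

print([credit["title"] for credit in main_credits(sample)])
```

The same prefix test could probably be pushed into the SQL query itself, but that depends on where the role text is stored in the schema, which this sketch does not assume.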

> > And I haven't tested it that much, but it appears that sqlite and mysql have
> > roughly the same speeds in running these queries, but I'm not completely
> > sure yet.
> 
> I expect them to be comparable in speed, but not to be slower than a
> web search. :-/

They may (hopefully) be faster once I get it up on a web hosting machine 
instead of my laptop. 

> --
> Davide Alberani   [PGP KeyID: 0x465BFD47]
> http://www.mimante.net/
  --
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_feb___
Imdbpy-help mailing list
Imdbpy-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/imdbpy-help


Re: [Imdbpy-help] imdbpy to mysql help

2013-02-23 Thread D L

Ok, well here's an update. I just let the foreign keys run for a little over a 
full day and it actually completed for mysql: 
# TIME FINAL : 1883min, 1sec (wall) 23min, 57sec (user) 0min, 5sec (system)
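
To put that timing line in perspective, here is a small sketch that converts a "# TIME" line into seconds. The parser is a hypothetical helper written for this note, not part of IMDbPY, and it assumes the whole line is on one row (not wrapped by the mail client).

```python
import re

# Matches fragments such as "1883min, 1sec (wall)".
TIME_PART = re.compile(r"(\d+)min,\s*(\d+)sec\s*\((wall|user|system)\)")


def parse_time_line(line):
    """Return a dict mapping 'wall', 'user' and 'system' to elapsed seconds."""
    return {kind: int(minutes) * 60 + int(seconds)
            for minutes, seconds, kind in TIME_PART.findall(line)}


line = "# TIME FINAL : 1883min, 1sec (wall) 23min, 57sec (user) 0min, 5sec (system)"
times = parse_time_line(line)

print("wall hours: %.1f" % (times["wall"] / 3600.0))
print("script CPU seconds:", times["user"] + times["system"])
```

The wall-clock time works out to roughly 31 hours, while the script itself used about 24 minutes of CPU, which suggests it spent most of the run waiting on the database server.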

One of my main questions right now is the difference in results between the
web search and the sql search. For example, if I search via the web for all the
movies that Denzel Washington has acted in, it basically outputs the main ones,
whereas the sql search also includes a lot of unrelated entries such as award
ceremonies and tv shows where he may have had a cameo. How would I make the sql
search behave more like the web search, so that it excludes things like award
ceremonies and outputs only the main movies?
I haven't tested it much, but sqlite and mysql appear to run these queries at
roughly the same speed; I'm not completely sure yet.



Re: [Imdbpy-help] imdbpy to mysql help

2013-02-22 Thread D L

So after updating those dependencies, MySQL still gets stuck on the foreign
keys section; sqlite, however, actually manages to finish. One of my concerns
is that even requests against sqlite can be slow the first time, and on
occasion web access was a lot faster than sqlite. For example, the
search_person script is faster via the web, but if I run it twice against the
sql database (searching for the same person), the second run is noticeably
faster, most likely because the data is already cached. My question is how long
something like search_person takes on MySQL (if I can eventually get it to
work), since sqlite so far seems slower than just going the web route.
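
One way to compare the first and second run is to time the same lookup twice. The sketch below is only a harness: the table and column names (name, id) are placeholders, and the in-memory sqlite database will not reproduce the disk-cache effect described above; it only shows how the two measurements could be taken.

```python
import sqlite3
import time


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def search_name(conn, fragment):
    """Substring search on a toy 'name' table (placeholder schema)."""
    cur = conn.execute("SELECT id, name FROM name WHERE name LIKE ?",
                       ("%" + fragment + "%",))
    return cur.fetchall()


# Toy data standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO name (name) VALUES (?)",
                 [("Person %d" % i,) for i in range(10000)])

rows, first = timed(search_name, conn, "Person 42")
rows_again, second = timed(search_name, conn, "Person 42")
print("matches: %d, first run: %.6fs, second run: %.6fs"
      % (len(rows), first, second))
```

Against the real database file, swapping the connection for the on-disk one and running the pair of calls right after a reboot would give a fairer picture of cold versus warm timings.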

From: dlm...@hotmail.com
To: davide.alber...@gmail.com
Subject: RE: [Imdbpy-help] imdbpy to mysql help
Date: Tue, 19 Feb 2013 17:54:16 -0800

> Date: Tue, 19 Feb 2013 21:28:18 +0100
> Subject: Re: [Imdbpy-help] imdbpy to mysql help
> From: davide.alber...@gmail.com
> To: dlm...@hotmail.com
> CC: imdbpy-help@lists.sourceforge.net
> 
> On Sun, Feb 17, 2013 at 11:45 PM, D L  wrote:
> >
> > Yeah tried that and ran it overnight, still no luck - it gets stuck on the
> > foreign keys part. I'm just trying this on my laptop, so I may just proceed
> > with using the web access for the data. Once I get everything set up for a
> > web hosting, I may try other databases such as sqlite to see if that works.
> 
> D'oh! :(
> Versions of:
> - IMDbPY
> - SQLAlchemy
> - SQLObject
> - MySQL
> - python-mysqldb
> - python-migrate
> ?

IMDbPY - 5.0dev20130210
SQLAlchemy - 0.8.0b2
SQLObject - 1.3.2
MySQL - Server version: 5.5.29-0ubuntu0.12.04.1 (Ubuntu)
python-mysqldb - 1.2.3
python-migrate - 0.7.2 

Both my python-mysqldb and python-migrate were older versions, which I just
updated as I typed this. I tried the process with sqlite a night ago and it got
stuck on the foreign keys section as well; I will try again now that mysqldb
and migrate have been updated, and hopefully it will work. I also wrote a rough
script for the data retrieval using the web access method, and you're right, it
does take a while.

> Anyway, if you interrupt it while it's creating the foreign keys, maybe
> you can try to see which were already created, and add the missing
> ones following the scheme you can find in imdb/parser/sql/dbschema.py
> 
> Anyway, obviously I'll try to reproduce the problem, since it's not
> nice at all. :-/

Hopefully the updated mysqldb and migrate will fix it, but we'll see.
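
A rough sketch of the "see which were already created, and add the missing ones" idea. The foreign keys listed here are illustrative guesses only (the authoritative list is in imdb/parser/sql/dbschema.py), and the constraint naming is hypothetical. The script only prints the statements; it does not touch a database.

```python
def missing_foreign_keys(expected, existing):
    """Return the expected foreign keys that are not present yet, in order."""
    present = set(existing)
    return [fk for fk in expected if fk not in present]


def add_fk_statement(fk):
    """Build a MySQL ALTER TABLE statement for one (table, column, ref_table, ref_column)."""
    table, column, ref_table, ref_column = fk
    constraint = "%s_%s_fk" % (table, column)  # hypothetical naming convention
    return ("ALTER TABLE %s ADD CONSTRAINT %s FOREIGN KEY (%s) REFERENCES %s (%s)"
            % (table, constraint, column, ref_table, ref_column))


# Illustrative entries; check dbschema.py for the real ones.
expected = [
    ("cast_info", "movie_id", "title", "id"),
    ("cast_info", "person_id", "name", "id"),
    ("movie_info", "movie_id", "title", "id"),
]
# Pretend this one was created before the run was interrupted.
existing = [("cast_info", "movie_id", "title", "id")]

for fk in missing_foreign_keys(expected, existing):
    print(add_fk_statement(fk))
```

On MySQL, the list of existing keys could be read from information_schema.KEY_COLUMN_USAGE (rows where REFERENCED_TABLE_NAME is not NULL) instead of being typed in by hand.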
 
> --
> Davide Alberani   [PGP KeyID: 0x465BFD47]
> http://www.mimante.net/




Re: [Imdbpy-help] imdbpy to mysql help

2013-02-17 Thread D L



> Date: Sun, 17 Feb 2013 09:57:50 +0100
> Subject: Re: [Imdbpy-help] imdbpy to mysql help
> From: davide.alber...@gmail.com
> To: dlm...@hotmail.com
> CC: imdbpy-help@lists.sourceforge.net
> 
> On Sat, Feb 16, 2013 at 9:21 PM, D L  wrote:
> >
> > Yeah the adding foreign keys is still going, and when I run top it seems
> > like mysql is actually doing stuff with the CPU. I'm most likely going to
> > rerun it and hope it works again in a timely fashion.
> 
> Doesn't look good. :-(
> Can you try again using the SQLAlchemy ORM?
> Basically, you have to install it (if not already present on your system),
> and add to the command line of imdbpy2sql.py: -o sqlalchemy

Yeah, I tried that and ran it overnight, still no luck: it gets stuck on the
foreign keys part. I'm just trying this on my laptop, so I may just proceed
with using web access for the data. Once I get everything set up on web
hosting, I may try other databases such as sqlite to see if that works.
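
For reference, a sketch of what the command line with the ORM switch might look like, built as an argument list so it can be handed to subprocess. Only "-o sqlalchemy" comes from the message quoted above; the "-d" (data directory) and "-u" (database URI) options are written from memory of the script's usage text, and the path and URI are placeholders, so check them against "imdbpy2sql.py --help" before relying on this.

```python
def build_import_command(data_dir, db_uri, orm=None):
    """Return the argument list for an imdbpy2sql.py run (nothing is executed)."""
    cmd = ["imdbpy2sql.py", "-d", data_dir, "-u", db_uri]
    if orm:
        # ORM selection as suggested in the thread: -o sqlalchemy
        cmd += ["-o", orm]
    return cmd


cmd = build_import_command(
    "/path/to/plain-text-files/",            # placeholder directory
    "mysql://user:password@localhost/imdb",  # placeholder URI
    orm="sqlalchemy",
)
print(" ".join(cmd))
```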

> >> Re-run the script with the new dataset. No other way.
> >> imdbIDs are (hopefully... see above) preserved between runs.
> >
> > Alright, will the script pass through the ones already in the database and
> > be faster, or would it require the same amount of time?
> 
> The same time, sorry.  No way to do otherwise, trust me. :-)
> 
> > Do most apps that have been made with imdbpy use the local or web access for
> > data?
> 
> Hmmm, I guess that most people who do some kind of
> analysis/heavy use of the data use the SQL access.
> Plugins for media centers, small scripts and so on mostly use
> the web access.
> 
> > Also what defines public redistribution, what I had in mind was
> > something along the lines of having someone input a request for say an
> > actor, and I have a script that spits back out a bunch of data/graphs using
> > the imdb info, would that be allowed?
> 
> I'm not a lawyer and so my opinion is worth about zero, but... :-)
> I guess that if you just process the data, show the result of this processing
> and so on (i.e. you do some transformation on it, not just printing it out
> exactly as taken from the db), you're on the safe side.
> Also, don't forget to put links to the imdb.com site and a footer which
> explains the copyright of the data.
> 
Alright I guess I shouldn't have a problem with that then. 

> --
> Davide Alberani   [PGP KeyID: 0x465BFD47]
> http://www.mimante.net/


Re: [Imdbpy-help] imdbpy to mysql help

2013-02-16 Thread D L



> Date: Sat, 16 Feb 2013 15:35:43 +0100
> Subject: Re: [Imdbpy-help] imdbpy to mysql help
> From: davide.alber...@gmail.com
> To: dlm...@hotmail.com
> CC: imdbpy-help@lists.sourceforge.net
> 
> On Sat, Feb 16, 2013 at 3:01 AM, D L  wrote:
> >
> > The adding foreign keys bit has been taking roughly 8 hours. Should I
> > restart the whole process or wait it out?
> 
> Seems really slow.
> Is the db actually doing something?
> 
> Anyway, the creation of indexes/foreign keys and the store/restore
> of imdbIDs at db updates seems to be slightly broken.
> Any help fixing it is welcome.

Yeah, the foreign keys step is still running, and when I run top it looks like
mysql is actually using the CPU. I'm most likely going to rerun it and hope it
completes in a timely fashion this time.

> > Once you have the database set up, is there a simple way to update it with
> > the imdb text files they routinely release, or would you have to rerun the
> > script with the new files?
> 
> Re-run the script with the new dataset.  No other way.
> imdbIDs are (hopefully... see above) preserved between runs.

Alright. Will the script skip the entries already in the database and run
faster, or will it take the same amount of time?

> > If I'm making a webapp which could potentially receive a lot of requests,
> > it'd be optimal to fetch the requests from the local database instead of
> > through the web requests am I correct?
> 
> You're correct.
> 
> > Since the web requests scrapes the imdb pages and imdb frowns on that?
> 
> Main point, it's much slower.
> 
> Anyway, no matter what data you access (local or remote), I'm pretty sure
> that the IMDb license forbids you to use it for anything that is not
> personal *and* non-commercial.
> I.e.: no money (not even *saved* money) from it, and no public redistribution
> of the data.
> 
> HTH.

Do most apps made with imdbpy use local or web access for data? Also, what
counts as public redistribution? What I had in mind was something along the
lines of having someone submit a request for, say, an actor, and a script that
returns a bunch of data/graphs built from the imdb info. Would that be allowed?
I wouldn't be making any money off it; it'd just be a webapp tool.
Once again, thank you for the quick response.

> -- 
> Davide Alberani   [PGP KeyID: 0x465BFD47]
> http://www.mimante.net/


[Imdbpy-help] imdbpy to mysql help

2013-02-15 Thread D L

So I began importing the data into a mysql database and everything seems to be 
going fine but this last step is taking abnormally long:

# TIME completeCast() : 0min, 0sec (wall) 0min, 0sec (user) 0min, 0sec (system)
# TIME fushing caches... : 0min, 2sec (wall) 0min, 1sec (user) 0min, 0sec (system)
# TIME TOTAL TIME TO INSERT/WRITE DATA : 58min, 11sec (wall) 23min, 50sec (user) 0min, 7sec (system)
building database indexes (this may take a while)
# TIME createIndexes() : 30min, 59sec (wall) 0min, 0sec (user) 0min, 0sec (system)
adding foreign keys (this may take a while)

The foreign keys step has been running for roughly 8 hours so far. Should I
restart the whole process or wait it out?

I also have a few other questions that I could not find in the documentation:

Once the database is set up, is there a simple way to update it with the imdb
text files that are routinely released, or do you have to rerun the script with
the new files?

If I'm making a webapp that could potentially receive a lot of requests, it
would be best to serve them from the local database rather than through web
requests, am I correct? Since the web access scrapes the imdb pages and imdb
frowns on that?

Thanks for the help!
DL


[Imdbpy-help] Is search still broken?

2013-02-14 Thread D L

I saw the December 14, 2012 post stating that changes to IMDb broke the search
system. I'm wondering whether it is still broken, since I ran the
search_person.py script from the /bin folder and it couldn't find anything
either. If it is broken, would searches against a local database still work?
That would be my next course of action.

Thanks,
DL
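
For the local-database question, a sketch of how the same person search could be pointed at either data source. It assumes the IMDbPY release of this period, where imdb.IMDb() takes an access system name ('http' or 'sql') and, for 'sql', a uri keyword; the URI below is a placeholder and the helper names are made up for this note.

```python
def access_settings(db_uri=None):
    """Pick the access system: 'sql' when a database URI is given, else 'http'."""
    if db_uri:
        return "sql", {"uri": db_uri}
    return "http", {}


def search_person(name, db_uri=None):
    """Search for a person via the web, or via a local database when db_uri is set."""
    # Imported here so access_settings() can be used without IMDbPY installed.
    import imdb
    system, kwargs = access_settings(db_uri)
    ia = imdb.IMDb(system, **kwargs)
    return ia.search_person(name)


# Placeholder URI; nothing is contacted by this call.
print(access_settings("mysql://user:password@localhost/imdb"))
```

Calling search_person("Some Name", db_uri=...) would then query the local database, while leaving db_uri out falls back to the web access.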