Re: [Imdbpy-help] imdbpy to mysql help
> Date: Sun, 24 Feb 2013 11:52:45 +0100 > Subject: Re: [Imdbpy-help] imdbpy to mysql help > From: davide.alber...@gmail.com > To: dlm...@hotmail.com > CC: imdbpy-help@lists.sourceforge.net > > On Sun, Feb 24, 2013 at 12:32 AM, D L wrote: > > > > Ok, well here's an update. I just let the foreign keys run for a little over > > a full day and it actually completed for mysql: > > # TIME FINAL : 1883min, 1sec (wall) 23min, 57sec (user) 0min, 5sec (system) > > I see. > I've just run it with a subset of the db (1% taken from each file) and > my numbers are: > # TIME TOTAL TIME TO INSERT/WRITE DATA : 12min, 18sec (wall) 5min, > 23sec (user) 0min, 43sec (system) > building database indexes (this may take a while) > # TIME createIndexes() : 1min, 25sec (wall) 0min, 0sec (user) 0min, > 0sec (system) > adding foreign keys (this may take a while) > # TIME createForeignKeys() : 10min, 2sec (wall) 0min, 0sec (user) > 0min, 0sec (system) > RESTORING imdbIDs values for movies... DONE! (restored 0 entries out of 0) > # TIME restore movies : 0min, 0sec (wall) 0min, 0sec (user) 0min, 0sec > (system) > RESTORING imdbIDs values for people... DONE! (restored 0 entries out of 0) > # TIME restore people : 0min, 0sec (wall) 0min, 0sec (user) 0min, 0sec > (system) > RESTORING imdbIDs values for characters... DONE! (restored 0 entries out of 0) > # TIME restore characters : 0min, 0sec (wall) 0min, 0sec (user) 0min, > 0sec (system) > RESTORING imdbIDs values for companies... DONE! (restored 0 entries out of 0) > # TIME restore companies : 0min, 0sec (wall) 0min, 0sec (user) 0min, > 0sec (system) > # TIME FINAL : 23min, 45sec (wall) 5min, 23sec (user) 0min, 43sec (system) > > What kind of CPU/RAM/disk have you used? I'm doing it on a laptop that has an i3 2.53GHZ, 4GB DDR3, with about 200gb of space (not SSD). I'm planning to just try to get everything working on my laptop before I buy web hosting to put it on there. > > One of my main questions right now is the difference in results between the > > web search and the sql search. For example, if I ran a search on all the > > movies that Denzel Washington has acted in via the web search, it basically > > outputs all the main ones, > > Yep, they are just grouped in a different way. > It would be not easy for us (even if it's not impossible, I guess) to identify > alle the various categories used on the web and the rules used to categorize > the movies, but... > > For the moment, I think you could take the whole filmography and search > for tv series and/or movies in which an actor is playing Himself (or anything > that starts with Himself/Herself/Themselves) Yeah, but filtering that may require even more processing time..? > > And I haven't tested it that much, but it appears that sqlite and mysql have > > roughly the same speeds in running these queries, but I'm not completely > > sure yet. > > I expect them to be comparable in speed, but not to be slower than a > web search. :-/ They may (hopefully) be faster once I get it up on a web hosting machine instead of my laptop. > -- > Davide Alberani [PGP KeyID: 0x465BFD47] > http://www.mimante.net/ -- Everyone hates slow websites. So do we. Make your web apps faster with AppDynamics Download AppDynamics Lite for free today: http://p.sf.net/sfu/appdyn_d2d_feb___ Imdbpy-help mailing list Imdbpy-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/imdbpy-help
Re: [Imdbpy-help] imdbpy to mysql help
Ok, well here's an update. I just let the foreign keys run for a little over a full day and it actually completed for mysql: # TIME FINAL : 1883min, 1sec (wall) 23min, 57sec (user) 0min, 5sec (system) One of my main questions right now is the difference in results between the web search and the sql search. For example, if I ran a search on all the movies that Denzel Washington has acted in via the web search, it basically outputs all the main ones, whereas if I do it via the sql search it will include a lot of random stuff like award ceremonies and random tv shows that he may have had a cameo on. How would I make the sql search more like the web search so that it excludes stuff like award ceremonies and only outputs the main movies? And I haven't tested it that much, but it appears that sqlite and mysql have roughly the same speeds in running these queries, but I'm not completely sure yet. From: dlm...@hotmail.com To: davide.alber...@gmail.com; imdbpy-help@lists.sourceforge.net Subject: RE: [Imdbpy-help] imdbpy to mysql help Date: Fri, 22 Feb 2013 00:10:28 -0800 So after updating those dependencies, the MySQL still gets stuck on the foreign keys section, however sqlite actually manages to finish. But one of my concerns is that even the requests with sqlite can be slow the first time, and on occasion the web access was a lot faster than using the sqlite. For example, the search_person script is faster via the web, but if I run it twice (searching the same person) using the sql database, the 2nd time is noticeably much faster, most likely due to the data already being cached. My question is how fast does something like search_person take on MySQL (if I can eventually get it to work), since using sqlite seems like it's slower than just going the web route so far. From: dlm...@hotmail.com To: davide.alber...@gmail.com Subject: RE: [Imdbpy-help] imdbpy to mysql help Date: Tue, 19 Feb 2013 17:54:16 -0800 > Date: Tue, 19 Feb 2013 21:28:18 +0100 > Subject: Re: [Imdbpy-help] imdbpy to mysql help > From: davide.alber...@gmail.com > To: dlm...@hotmail.com > CC: imdbpy-help@lists.sourceforge.net > > On Sun, Feb 17, 2013 at 11:45 PM, D L wrote: > > > > Yeah tried that and ran it overnight, still no luck - it gets stuck on the > > foreign keys part. I'm just trying this on my laptop, so I may just proceed > > with using the web access for the data. Once I get everything set up for a > > web hosting, I may try other databases such as sqlite to see if that works. > > D'oh! :( > Versions of: > - IMDbPY > - SQLAlchemy > - SQLObject > - MySQL > - python-mysqldb > - python-migrate > ? IMDbPY - 5.0dev20130210 SQLAlchemy - 0.8.0b2 SQLObject - 1.3.2 MySQL - Server version: 5.5.29-0ubuntu0.12.04.1 (Ubuntu) python-mysqldb - 1.2.3 python-migrate - 0.7.2 Both my python-mysqldb and python-migrate were older versions, which I just updated as I typed this. I tried the process with sqlite a night ago and it was stuck on the foreign keys section as well, I will try it again now that mysqldb and migrate have been updated and hopefully it will work. I also wrote a rough script for the data retrieval using the webaccess method, and you're right it does take a while. > Anyway, if you interrupt it while it's creating the foreign key, maybe > you can try to see which were already created, and add the missing > one following the scheme you can find in imdb/parser/sql/dbschema.py > > Anyway, obviously I'll try to reproduce the problem, since it's not > nice at all. :-/ Hopefully, the updated mysqldb and migrate would fix it, but we'll see. > -- > Davide Alberani [PGP KeyID: 0x465BFD47] > http://www.mimante.net/ -- Everyone hates slow websites. So do we. Make your web apps faster with AppDynamics Download AppDynamics Lite for free today: http://p.sf.net/sfu/appdyn_d2d_feb___ Imdbpy-help mailing list Imdbpy-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/imdbpy-help
Re: [Imdbpy-help] imdbpy to mysql help
So after updating those dependencies, the MySQL still gets stuck on the foreign keys section, however sqlite actually manages to finish. But one of my concerns is that even the requests with sqlite can be slow the first time, and on occasion the web access was a lot faster than using the sqlite. For example, the search_person script is faster via the web, but if I run it twice (searching the same person) using the sql database, the 2nd time is noticeably much faster, most likely due to the data already being cached. My question is how fast does something like search_person take on MySQL (if I can eventually get it to work), since using sqlite seems like it's slower than just going the web route so far. From: dlm...@hotmail.com To: davide.alber...@gmail.com Subject: RE: [Imdbpy-help] imdbpy to mysql help Date: Tue, 19 Feb 2013 17:54:16 -0800 > Date: Tue, 19 Feb 2013 21:28:18 +0100 > Subject: Re: [Imdbpy-help] imdbpy to mysql help > From: davide.alber...@gmail.com > To: dlm...@hotmail.com > CC: imdbpy-help@lists.sourceforge.net > > On Sun, Feb 17, 2013 at 11:45 PM, D L wrote: > > > > Yeah tried that and ran it overnight, still no luck - it gets stuck on the > > foreign keys part. I'm just trying this on my laptop, so I may just proceed > > with using the web access for the data. Once I get everything set up for a > > web hosting, I may try other databases such as sqlite to see if that works. > > D'oh! :( > Versions of: > - IMDbPY > - SQLAlchemy > - SQLObject > - MySQL > - python-mysqldb > - python-migrate > ? IMDbPY - 5.0dev20130210 SQLAlchemy - 0.8.0b2 SQLObject - 1.3.2 MySQL - Server version: 5.5.29-0ubuntu0.12.04.1 (Ubuntu) python-mysqldb - 1.2.3 python-migrate - 0.7.2 Both my python-mysqldb and python-migrate were older versions, which I just updated as I typed this. I tried the process with sqlite a night ago and it was stuck on the foreign keys section as well, I will try it again now that mysqldb and migrate have been updated and hopefully it will work. I also wrote a rough script for the data retrieval using the webaccess method, and you're right it does take a while. > Anyway, if you interrupt it while it's creating the foreign key, maybe > you can try to see which were already created, and add the missing > one following the scheme you can find in imdb/parser/sql/dbschema.py > > Anyway, obviously I'll try to reproduce the problem, since it's not > nice at all. :-/ Hopefully, the updated mysqldb and migrate would fix it, but we'll see. > -- > Davide Alberani [PGP KeyID: 0x465BFD47] > http://www.mimante.net/ -- Everyone hates slow websites. So do we. Make your web apps faster with AppDynamics Download AppDynamics Lite for free today: http://p.sf.net/sfu/appdyn_d2d_feb___ Imdbpy-help mailing list Imdbpy-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/imdbpy-help
Re: [Imdbpy-help] imdbpy to mysql help
> Date: Sun, 17 Feb 2013 09:57:50 +0100 > Subject: Re: [Imdbpy-help] imdbpy to mysql help > From: davide.alber...@gmail.com > To: dlm...@hotmail.com > CC: imdbpy-help@lists.sourceforge.net > > On Sat, Feb 16, 2013 at 9:21 PM, D L wrote: > > > > Yeah the adding foreign keys is still going, and when I run top it seems > > like mysql is actually doing stuff with the CPU. I'm most likely going to > > rerun it and hope it works again in a timely fashion. > > Doesn't looks good. :-( > Can you try again using the SQLAlchemy ORM? > Basically, you have to install it (if not already present on your system), > and add to the command line of imdbpy2sql.py: -o sqlalchemy Yeah tried that and ran it overnight, still no luck - it gets stuck on the foreign keys part. I'm just trying this on my laptop, so I may just proceed with using the web access for the data. Once I get everything set up for a web hosting, I may try other databases such as sqlite to see if that works. > >> Re-run the script with the new dataset. No other way. > >> imdbIDs are (hopefully... se above) preserved between runs. > > > > Alright, will the script pass through the ones already in the database and > > be faster, or would it require the same amount of time? > > The same time, sorry. No way to do otherwise, trust me. :-) > > > Do most apps that have been made with imdbpy use the local or web access for > > data? > > Hmmm I guess that most of the person who does some kind of > analysis/heavy use of the data, uses the SQL access. > Plugins of media centers, small scripts and so on, mostly uses > the web access. > > > Also what defines public redistribution, what I had in mind was > > something along the lines of having someone input a request for say an > > actor, and I have a script that spits back out a bunch of data/graphs using > > the imdb info, would that be allowed? > > I'm not a lawyer and so my opinion is worth about zero, but... :-) > I guess that if you just process the data, show the result of this processing > and so on (i.e. you do some transformation on it, not just printing it out > exactly as taken from the db), you're on the safe side. > Also, don't forget to put links to the imdb.com site and a footer which > explain the copyright of the data. > Alright I guess I shouldn't have a problem with that then. > -- > Davide Alberani [PGP KeyID: 0x465BFD47] > http://www.mimante.net/ -- The Go Parallel Website, sponsored by Intel - in partnership with Geeknet, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials, tech docs, whitepapers, evaluation guides, and opinion stories. Check out the most recent posts - join the conversation now. http://goparallel.sourceforge.net/___ Imdbpy-help mailing list Imdbpy-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/imdbpy-help
Re: [Imdbpy-help] imdbpy to mysql help
> Date: Sat, 16 Feb 2013 15:35:43 +0100 > Subject: Re: [Imdbpy-help] imdbpy to mysql help > From: davide.alber...@gmail.com > To: dlm...@hotmail.com > CC: imdbpy-help@lists.sourceforge.net > > On Sat, Feb 16, 2013 at 3:01 AM, D L wrote: > > > > The adding foreign keys bit has been taking roughly 8 hours. Should I > > restart the whole process or wait it out? > > Seems really slow. > Is the db actually doing something? > > Anyway, the creation of indexes/foreign keys and the store/restore > of imdbIDs at db updates seems to be slightly broken. > Any help fixing it is welcome. Yeah the adding foreign keys is still going, and when I run top it seems like mysql is actually doing stuff with the CPU. I'm most likely going to rerun it and hope it works again in a timely fashion. > > Once you have the database set up, is there a simple way to update it with > > the imdb text files they routinely release, or would you have to rerun the > > script with the new files? > > Re-run the script with the new dataset. No other way. > imdbIDs are (hopefully... se above) preserved between runs. Alright, will the script pass through the ones already in the database and be faster, or would it require the same amount of time? > > If I'm making a webapp which could potentially receive a lot of requests, > > it'd be optimal to fetch the requests from the local database instead of > > through the web requests am I correct? > > You're correct. > > > Since the web requests scrapes the imdb pages and imdb frowns on that? > > Main point, it's much slower. > > Anyway, no matter what data you access (local or remote), I'm pretty sure > that the lIMDb license forbid you to use it for anything that is not > personal *and* > non-commercial. > I.e.: no money (not even *saved* money) from it, and no public redistribution > of the data. > > HTH. Do most apps that have been made with imdbpy use the local or web access for data? Also what defines public redistribution, what I had in mind was something along the lines of having someone input a request for say an actor, and I have a script that spits back out a bunch of data/graphs using the imdb info, would that be allowed? I wouldn't be making any money off of it, it'd just be a webapp tool. Once again thank you for the quick response. > -- > Davide Alberani [PGP KeyID: 0x465BFD47] > http://www.mimante.net/ -- The Go Parallel Website, sponsored by Intel - in partnership with Geeknet, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials, tech docs, whitepapers, evaluation guides, and opinion stories. Check out the most recent posts - join the conversation now. http://goparallel.sourceforge.net/___ Imdbpy-help mailing list Imdbpy-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/imdbpy-help
[Imdbpy-help] imdbpy to mysql help
So I began importing the data into a mysql database and everything seems to be going fine but this last step is taking abnormally long: # TIME completeCast() : 0min, 0sec (wall) 0min, 0sec (user) 0min, 0sec (system) # TIME fushing caches... : 0min, 2sec (wall) 0min, 1sec (user) 0min, 0sec (system) # TIME TOTAL TIME TO INSERT/WRITE DATA : 58min, 11sec (wall) 23min, 50sec (user) 0min, 7sec (system) building database indexes (this may take a while) # TIME createIndexes() : 30min, 59sec (wall) 0min, 0sec (user) 0min, 0sec (system) adding foreign keys (this may take a while) The adding foreign keys bit has been taking roughly 8 hours. Should I restart the whole process or wait it out? I also have a few other questions that I could not find in the documentation: Once you have the database set up, is there a simple way to update it with the imdb text files they routinely release, or would you have to rerun the script with the new files? If I'm making a webapp which could potentially receive a lot of requests, it'd be optimal to fetch the requests from the local database instead of through the web requests am I correct? Since the web requests scrapes the imdb pages and imdb frowns on that? Thanks for the help! DL -- The Go Parallel Website, sponsored by Intel - in partnership with Geeknet, is your hub for all things parallel software development, from weekly thought leadership blogs to news, videos, case studies, tutorials, tech docs, whitepapers, evaluation guides, and opinion stories. Check out the most recent posts - join the conversation now. http://goparallel.sourceforge.net/___ Imdbpy-help mailing list Imdbpy-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/imdbpy-help
[Imdbpy-help] Is search still broken?
I saw the December 14, 2012 post stating that the changes to IMDB broke the search system, I'm wondering if it is still broken since I ran the search_person.py script from the /bin folder and it couldn't find anything either. If it is broken, would searches to a local database still work, since that would be my next course of action if it works. Thanks, DL -- Free Next-Gen Firewall Hardware Offer Buy your Sophos next-gen firewall before the end March 2013 and get the hardware for free! Learn more. http://p.sf.net/sfu/sophos-d2d-feb___ Imdbpy-help mailing list Imdbpy-help@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/imdbpy-help