Hmm somehow a typo (or old syntax) was still present in the SQL file. $ zcat actors.list.gz | pv | ./imdb2sql | mysql -u root --socket=/tmp/mysql.sock test 725MB 0:47:17 loading the actors requires 260263 pages (or K) in the keybuffer.
$ zcat actresses.list.gz | pv | ./imdb2sql | mysql -u root --socket=/tmp/mysql.sock test 427MB 0:24:42 After this, 380564 MyISAM index pages in the key buffer. So key_buffer_size=400M for the above two operations is sufficient. mysql> insert into actors_movies_actors select am1.actor_movie_id,am2.actor_id from actors_movies am1 join actors_movies am2 on (am1.movie_id=am2.movie_id AND am1.actor_id != am2.actor_id); Query OK, 169301244 rows affected (1 hour 39 min 7.58 sec) Records: 169301244 Duplicates: 0 Warnings: 0 MariaDB [test]> insert ignore into actors_graph_base (origid,destid) select am.actor_id,ama.actor_id from actors_movies_actors ama join actors_movies am using (actor_movie_id); Query OK, 149853110 rows affected, 65535 warnings (1 hour 51 min 19.86 sec) Records: 169301244 Duplicates: 19448134 Warnings: 19448134 Ok so a simple actor -> actor graph is 149 million vertices. Somehow the system doesn't like the actors_graph table, getting errors (not found, error 2) no matter what I do. When I name it ag it's fine, with the basetable named agb. Single underscores have been used in other tests, so I dunno what's going on there. ----- Original Message ----- > From: "Arjen Lentz" <ar...@openquery.com> > To: oqgraph-dev@lists.launchpad.net > Sent: Tuesday, 26 February, 2013 10:06:51 AM > Subject: [Oqgraph-dev] IMDB tools > Hi all > > Here is my old IMDB conversion tool (C code), and the schema I used. > It uses only Actors and Movies, filtering out TV series and DVDs. > > 1484084 actors \_ actors > 866799 actresses / > 574006 movies > 5900319 actors_movies > > actors.list.gz and actresses.list.gz files from > ftp://ftp.fu-berlin.de/pub/misc/movies/database/ > (via http://www.imdb.com/interfaces) > > My laptop is still building the actors_movies_actors table so I don't > yet have a total for that, but from memory I think it was around 45 > million edges. > > Currently I load using MyISAM with delay key writes and large key > buffer size. > We can possibly optimise the load methodology a bit further. > > > Cheers, > Arjen. > -- > Arjen Lentz, Exec.Director @ Open Query (http://openquery.com) > Australian peace of mind for your MySQL/MariaDB infrastructure. > > Follow us at http://openquery.com/blog/ & http://twitter.com/openquery > > > -- > Mailing list: https://launchpad.net/~oqgraph-dev > Post to : oqgraph-dev@lists.launchpad.net > Unsubscribe : https://launchpad.net/~oqgraph-dev > More help : https://help.launchpad.net/ListHelp -- Arjen Lentz, Exec.Director @ Open Query (http://openquery.com) Australian peace of mind for your MySQL/MariaDB infrastructure. Follow us at http://openquery.com/blog/ & http://twitter.com/openquery
-- We use MyISAM here because it's just a repeatable bulk load -- ACID commits would just delay stuff... -- enabling delay key write, key_buffer_size=3G CREATE TABLE actors ( actor_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100) NOT NULL UNIQUE ) ENGINE=MyISAM DEFAULT CHARSET=latin1 DELAY_KEY_WRITE=1; CREATE TABLE movies ( movie_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, title VARCHAR(300) NOT NULL UNIQUE ) ENGINE=MyISAM DEFAULT CHARSET=latin1 DELAY_KEY_WRITE=1; CREATE TABLE actors_movies ( actor_movie_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, actor_id INT UNSIGNED NOT NULL, movie_id INT UNSIGNED NOT NULL, UNIQUE (actor_id,movie_id), INDEX (movie_id) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 DELAY_KEY_WRITE=1; CREATE TABLE `actors_movies_actors` ( `actor_movie_id` INT(10) UNSIGNED NOT NULL, `actor_id` INT(10) UNSIGNED NOT NULL, PRIMARY KEY (`actor_movie_id`,`actor_id`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 DELAY_KEY_WRITE=1; -- populate link table -- insert into actors_movies_actors select am1.actor_movie_id,am2.actor_id from actors_movies am1 join actors_movies am2 on (am1.movie_id=am2.movie_id AND am1.actor_id != am2.actor_id); -- populate graph table -- insert into actors_graph (origid,destid) select am.actor_id,ama.actor_id from actors_movies_actors ama join actors_movies am using (actor_movie_id);
-- Mailing list: https://launchpad.net/~oqgraph-dev Post to : oqgraph-dev@lists.launchpad.net Unsubscribe : https://launchpad.net/~oqgraph-dev More help : https://help.launchpad.net/ListHelp