Re: [HACKERS] Netflix Prize data

2006-10-05 Thread Heikki Linnakangas
Mark Woodward wrote: I tried to cluster the data along a particular index but had to cancel it after 3 hours. If the data is in random order, it's faster to do SELECT * INTO foo_sorted FROM foo ORDER BY bar then CREATE INDEX, than to run CLUSTER. That's because CLUSTER does a full index

[HACKERS] Netflix Prize data

2006-10-04 Thread Mark Woodward
I signed up for the Netflix Prize. (www.netflixprize.com) and downloaded their data and have imported it into PostgreSQL. Here is how I created the table: Table public.ratings Column | Type | Modifiers +-+--- item | integer | client | integer | rating | integer

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Luke Lonergan
Mark, On 10/4/06 1:43 PM, Mark Woodward [EMAIL PROTECTED] wrote: [EMAIL PROTECTED]:~/netflix$ time psql netflix -c select count(*) from ratings count --- 100480507 (1 row) real2m6.270s user0m0.004s sys 0m0.005s I think you are getting about 40MB/s on your

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Tom Lane
Mark Woodward [EMAIL PROTECTED] writes: The one thing I notice is that it is REAL slow. How fast is your disk? Counting on my fingers, I estimate you are scanning the table at about 47MB/sec, which might or might not be disk-limited... I'm using 8.1.4. The rdate field looks something like:

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Gregory Stark
Mark Woodward [EMAIL PROTECTED] writes: I'm using 8.1.4. The rdate field looks something like: 2005-09-06 So, the raw data is 23 bytes, the date string will probably be rounded up to 12 bytes, that's 24 bytes per row of data. What is the overhead per variable? per row? Is there any

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Greg Sabino Mullane
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 I signed up for the Netflix Prize. (www.netflixprize.com) and downloaded their data and have imported it into PostgreSQL. Here is how I created the table: I signed up as well, but have the table as follows: CREATE TABLE rating ( movie

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Mark Woodward
I signed up for the Netflix Prize. (www.netflixprize.com) and downloaded their data and have imported it into PostgreSQL. Here is how I created the table: I signed up as well, but have the table as follows: CREATE TABLE rating ( movie SMALLINT NOT NULL, person INTEGER NOT NULL,

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Mark Woodward
Mark Woodward [EMAIL PROTECTED] writes: The one thing I notice is that it is REAL slow. How fast is your disk? Counting on my fingers, I estimate you are scanning the table at about 47MB/sec, which might or might not be disk-limited... I'm using 8.1.4. The rdate field looks something

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Gregory Stark
Greg Sabino Mullane [EMAIL PROTECTED] writes: CREATE TABLE rating ( movie SMALLINT NOT NULL, person INTEGER NOT NULL, rating SMALLINT NOT NULL, viewed DATE NOT NULL ); You would probably be better off putting the two smallints first followed by the integer and date.

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Mark Woodward
Greg Sabino Mullane [EMAIL PROTECTED] writes: CREATE TABLE rating ( movie SMALLINT NOT NULL, person INTEGER NOT NULL, rating SMALLINT NOT NULL, viewed DATE NOT NULL ); You would probably be better off putting the two smallints first followed by the integer and date.

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Tom Lane
Mark Woodward [EMAIL PROTECTED] writes: The rating, however, is one char 1~9. Would making it a char(1) buy anything? No, that would actually hurt because of the length word for the char field. Even if you used the char type, which really is only one byte, you wouldn't win anything because of

Re: [HACKERS] Netflix Prize data

2006-10-04 Thread Mark Woodward
Mark Woodward [EMAIL PROTECTED] writes: The rating, however, is one char 1~9. Would making it a char(1) buy anything? No, that would actually hurt because of the length word for the char field. Even if you used the char type, which really is only one byte, you wouldn't win anything because