Mark Woodward wrote:
I tried to cluster the data along a particular index but had to cancel it
after 3 hours.
If the data is in random order, it's faster to do
SELECT * INTO foo_sorted FROM foo ORDER BY bar
then CREATE INDEX, than to run CLUSTER.
That's because CLUSTER does a full index
I signed up for the Netflix Prize. (www.netflixprize.com) and downloaded
their data and have imported it into PostgreSQL. Here is how I created the
table:
Table public.ratings
Column | Type | Modifiers
+-+---
item | integer |
client | integer |
rating | integer
Mark,
On 10/4/06 1:43 PM, Mark Woodward [EMAIL PROTECTED] wrote:
[EMAIL PROTECTED]:~/netflix$ time psql netflix -c select count(*) from
ratings
count
---
100480507
(1 row)
real2m6.270s
user0m0.004s
sys 0m0.005s
I think you are getting about 40MB/s on your
Mark Woodward [EMAIL PROTECTED] writes:
The one thing I notice is that it is REAL slow.
How fast is your disk? Counting on my fingers, I estimate you are
scanning the table at about 47MB/sec, which might or might not be
disk-limited...
I'm using 8.1.4. The rdate field looks something like:
Mark Woodward [EMAIL PROTECTED] writes:
I'm using 8.1.4. The rdate field looks something like: 2005-09-06 So,
the raw data is 23 bytes, the date string will probably be rounded up to
12 bytes, that's 24 bytes per row of data. What is the overhead per
variable? per row?
Is there any
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
I signed up for the Netflix Prize. (www.netflixprize.com)
and downloaded their data and have imported it into PostgreSQL.
Here is how I created the table:
I signed up as well, but have the table as follows:
CREATE TABLE rating (
movie
I signed up for the Netflix Prize. (www.netflixprize.com)
and downloaded their data and have imported it into PostgreSQL.
Here is how I created the table:
I signed up as well, but have the table as follows:
CREATE TABLE rating (
movie SMALLINT NOT NULL,
person INTEGER NOT NULL,
Mark Woodward [EMAIL PROTECTED] writes:
The one thing I notice is that it is REAL slow.
How fast is your disk? Counting on my fingers, I estimate you are
scanning the table at about 47MB/sec, which might or might not be
disk-limited...
I'm using 8.1.4. The rdate field looks something
Greg Sabino Mullane [EMAIL PROTECTED] writes:
CREATE TABLE rating (
movie SMALLINT NOT NULL,
person INTEGER NOT NULL,
rating SMALLINT NOT NULL,
viewed DATE NOT NULL
);
You would probably be better off putting the two smallints first followed by
the integer and date.
Greg Sabino Mullane [EMAIL PROTECTED] writes:
CREATE TABLE rating (
movie SMALLINT NOT NULL,
person INTEGER NOT NULL,
rating SMALLINT NOT NULL,
viewed DATE NOT NULL
);
You would probably be better off putting the two smallints first followed
by
the integer and date.
Mark Woodward [EMAIL PROTECTED] writes:
The rating, however, is one char 1~9. Would making it a char(1) buy anything?
No, that would actually hurt because of the length word for the char
field. Even if you used the char type, which really is only one byte,
you wouldn't win anything because of
Mark Woodward [EMAIL PROTECTED] writes:
The rating, however, is one char 1~9. Would making it a char(1) buy
anything?
No, that would actually hurt because of the length word for the char
field. Even if you used the char type, which really is only one byte,
you wouldn't win anything because
12 matches
Mail list logo