[GENERAL] Curious why planner can't handle NOT IN

2013-05-03 Thread Yang Zhang
I have an `account` table with 5.3M rows, with primary key `id` of
type `text` (and 600+ columns if that matters).

I'm trying to create a `newaccount` table with the same schema but
600k newly imported rows, then insert all the old rows whose `id`
isn't already in the set of newly imported rows.

=> create table newaccount as select * from account limit 0;

=> \copy newaccount from stdin with (format csv)
[...copy 600k rows in...]

=> analyze newaccount;

The most obvious query doesn't work so hot due to repeated execution
of the subplan:

=> explain insert into newaccount select * from account where id not
in (select id from newaccount);

                                       QUERY PLAN
------------------------------------------------------------------------------------------
 Insert on newaccount  (cost=0.00..458800400224.89 rows=2985976 width=5366)
   ->  Seq Scan on account  (cost=0.00..458800400224.89 rows=2985976 width=5366)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..153276.00 rows=150000 width=32)
                 ->  Seq Scan on newaccount  (cost=0.00..151500.00 rows=150000 width=32)
(6 rows)

This works fine, though:

=> explain insert into newaccount select * from account where id in
(select id from account except select id from newaccount);

                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Insert on newaccount  (cost=3749026.24..14034547.77 rows=2985976 width=5366)
   ->  Hash Join  (cost=3749026.24..14034547.77 rows=2985976 width=5366)
         Hash Cond: (account.id = "ANY_subquery".id)
         ->  Seq Scan on account  (cost=0.00..2263744.51 rows=5971951 width=5366)
         ->  Hash  (cost=3633552.85..3633552.85 rows=5971951 width=32)
               ->  Subquery Scan on "ANY_subquery"  (cost=3543223.59..3633552.85 rows=5971951 width=32)
                     ->  SetOp Except  (cost=3543223.59..3573833.34 rows=5971951 width=19)
                           ->  Sort  (cost=3543223.59..3558528.47 rows=6121951 width=19)
                                 Sort Key: "*SELECT* 1".id
                                 ->  Append  (cost=0.00..2476464.02 rows=6121951 width=19)
                                       ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..2323464.02 rows=5971951 width=19)
                                             ->  Seq Scan on account  (cost=0.00..2263744.51 rows=5971951 width=19)
                                       ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..153000.00 rows=150000 width=32)
                                             ->  Seq Scan on newaccount  (cost=0.00..151500.00 rows=150000 width=32)
(14 rows)

This is all in PG 9.1.

This isn't a big deal since there's a straightforward workaround, but I
am curious what happened here.  Googling turns up various mentions of
NOT IN producing poor plans that involve subplans.  Then again, I have
read mention of hash anti-joins, which seem appropriate here(?), but
one wasn't used here (nor in the latter query, which for whatever
reason produced a more complex/deeper plan tree than I had expected, so
maybe an anti-join was out of the plan generator's reach?).
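
For what it's worth, rewriting with NOT EXISTS (a sketch on the same
tables, which I haven't benchmarked here) sidesteps NOT IN's NULL
handling and should let the planner consider a hash anti-join:

insert into newaccount
select a.*
from account a
-- NOT EXISTS has no special NULL semantics, so it can become an anti-join
where not exists (select 1 from newaccount n where n.id = a.id);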

E.g. the following mentions cranking up work_mem, but I probably can't
crank up work_mem to meet the requirements of this example, and even
if it doesn't fit in memory, it'd be nice for the planner to not
degenerate to a pathological plan and still execute this join
efficiently while spilling to and from disk.

http://stackoverflow.com/questions/7125291/postgresql-not-in-versus-except-performance-difference-edited-2

Thanks!




[GENERAL] Unlogged indexes

2013-05-03 Thread Yang Zhang
Guessing the answer's no, but is there any way to construct indexes
such that I can safely put them on (faster) volatile storage? (Just to
be clear, I'm asking about indexes for *logged* tables.)




Re: [GENERAL] Unlogged indexes

2013-05-03 Thread Yang Zhang
Yeah, I know that indexes for unlogged tables are unlogged.  I was
just wondering if you could do this for logged tables.  (Safely, such
that on crash recovery WAL replay won't throw up, these can be omitted
from base backups, etc.)

On Fri, May 3, 2013 at 1:46 PM, Thom Brown t...@linux.com wrote:
 On 3 May 2013 21:06, Yang Zhang yanghates...@gmail.com wrote:
 Guessing the answer's no, but is there any way to construct indexes
 such that I can safely put them on (faster) volatile storage? (Just to
 be clear, I'm asking about indexes for *logged* tables.)

 Yes:

 CREATE INDEX ... TABLESPACE tablespacename;
 ALTER INDEX ... SET TABLESPACE tablespacename;

 Although there's a disparity between your email subject and main text.
  Indexes for logged tables are always logged.  If you want an unlogged
 index you can only create it for an unlogged table.

 And putting indexes on a separate tablespace is probably not as
 advantageous as you're thinking.  Might be worth testing.

 --
 Thom
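
(For concreteness, the commands Thom refers to would look something like
this sketch - the tablespace name, path, and index are made up, and per
his note the index is still WAL-logged:)

create tablespace fastspace location '/mnt/ssd/pg_indexes';  -- hypothetical path
create index account_email_idx on account (email) tablespace fastspace;
-- or move an existing index:
alter index account_email_idx set tablespace fastspace;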




Re: [GENERAL] OK to put temp tablespace on volatile storage or to omit it from backups?

2013-05-01 Thread Yang Zhang
On Tue, Apr 30, 2013 at 8:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Ian Lawrence Barwick barw...@gmail.com writes:
 2013/5/1 Yang Zhang yanghates...@gmail.com:
 That is unfortunate.  Good thing I asked, I guess.  Do you have a
 pointer to said blog post?

 I think this is the post in question:
 http://thebuild.com/blog/2013/03/10/you-cannot-recover-from-the-loss-of-a-tablespace/

 Appears to be sheer blather, or at least not tempered by any thoughts
 of whether it'd work in special cases.  The main reality underlying it,
 I think, is that WAL replay will complain if files are missing.  But
 there will be no WAL log entries for temp tables.

 The main concern I'd have about Yang's idea is that just because *he*
 thinks a tablespace is temp doesn't mean the system knows it is,
 so there would be no protection against accidentally creating a regular
 table there; whereupon he's at risk of replay failures.

So this is interesting: if it's OK to put the temp tablespace on
volatile storage, is it OK to put indexes for non-temp tables into the
same temp tablespace (and everything works)?


 Having said that, there's no substitute for testing ;-).  I wouldn't be
 surprised for instance if the DB won't restart until you create the
 tablespace directories, and maybe even PG_VERSION files therein.  But it
 really shouldn't have an issue with the files underlying a temp table
 not being there anymore; at worst you'd get some bleats in the log.

Do you know what exactly I would need to create in place for this to work out?

This isn't exactly the same test as what I should be running (pulling
the cord), but I just tried:

create tablespace ephemeral location '/mnt/eph0/pgtmp';

Then I reloaded PG with temp_tablespaces = 'ephemeral' in postgresql.conf.

At this point I (cleanly) stopped PG, ran rm -rf /mnt/eph0/pgtmp/,
started PG, and ran:

create temp table foo (a int);

which failed with:

ERROR:  could not create directory
"pg_tblspc/16384/PG_9.1_201105231/11919": No such file or directory

Once I did

mkdir -p /mnt/eph0/pgtmp/PG_9.1_201105231/11919

everything seems to be back to normal.

Is this the extent of what I can expect, *always*, even if I had run
the proper experiment involving pulling the cord (or at least kill
-9)?




Re: [GENERAL] OK to put temp tablespace on volatile storage or to omit it from backups?

2013-05-01 Thread Yang Zhang
On Tue, Apr 30, 2013 at 11:14 PM, Yang Zhang yanghates...@gmail.com wrote:
 On Tue, Apr 30, 2013 at 8:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Ian Lawrence Barwick barw...@gmail.com writes:
 I think this is the post in question:
 http://thebuild.com/blog/2013/03/10/you-cannot-recover-from-the-loss-of-a-tablespace/

 Appears to be sheer blather, or at least not tempered by any thoughts
 of whether it'd work in special cases.  The main reality underlying it,
 I think, is that WAL replay will complain if files are missing.  But
 there will be no WAL log entries for temp tables.

Just thought of another thing - temp tables aren't the only thing temp
tablespaces are used for.  Even if I were creating or updating a
non-temp table, if it involves (say) a large sort, the temp tablespace
is what's used for this.  Is volatile storage of temp tablespaces safe
for these as well?
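
(For example - a sketch with made-up names - this kind of spill is easy
to see by lowering work_mem and looking at EXPLAIN ANALYZE:)

set work_mem = '1MB';
explain analyze select id from account order by createddate;
-- the Sort node then reports something like
--   Sort Method: external merge  Disk: ...kB
-- and those spill files land in the temp tablespace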

And does this mean it's also OK to omit temp tablespaces from base
backups as well?

 Having said that, there's no substitute for testing ;-).  I wouldn't be
 surprised for instance if the DB won't restart until you create the
 tablespace directories, and maybe even PG_VERSION files therein.  But it
 really shouldn't have an issue with the files underlying a temp table
 not being there anymore; at worst you'd get some bleats in the log.

 Do you know what exactly I would need to create in place for this to work out?

 This isn't exactly the same test as what I should be running (pulling
 the cord), but I just tried:

 create tablespace ephemeral location '/mnt/eph0/pgtmp';

 Then reloading PG with temp_tablespaces = 'ephemeral' in postgresql.conf.

 At this point I (cleanly) stop PG, ran rm -rf /mnt/eph0/pgtmp/,
 started PG, and ran:

 create temp table foo (a int);

 which failed with:

 ERROR:  could not create directory
 pg_tblspc/16384/PG_9.1_201105231/11919: No such file or directory

 Once I did

 mkdir -p /mnt/eph0/pgtmp/PG_9.1_201105231/11919

 everything seems to be back to normal.

 Is this the extent of what I can expect, *always*, even if I had run
 the proper experiment involving pulling the cord (or at least kill
 -9)?

In other words, I guess, I'm asking because of Xof's comment on that blog post:

That’s true if you recreate the PG_ directory in the tablespace;
otherwise, you get the error:
reindexdb: reindexing of database "test" failed: ERROR: could not
create directory "pg_tblspc/69944/PG_9.2_201204301/61884": No such
file or directory
However, that’s not a guaranteed feature of PostgreSQL, and I would
never rely on it.




Re: [GENERAL] Basic question on recovery and disk snapshotting

2013-05-01 Thread Yang Zhang
On Wed, May 1, 2013 at 4:56 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 That brings up another point to consider.  If wal level is minimal, then
 tables which you bulk load in the same transaction as you created them or
 truncated them will not get any WAL records written.  (That is the main
 reason the WAL verbosity is reduced).  But that also means that if any of
 those operations is happening while you are taking your snapshot, those
 operations will be corrupted.  If the data and xlogs were part of the same
 atomic snapshot, this would not be a problem, as either the operation
 completed, or it never happened.  But with different snapshots, the data can
 get partially but not completely into the data-snapshot, but then the xlog
 record which says the data was completely written does get into the xlog
 snapshot

Come to think of it, I'm no longer sure that EBS snapshots, which are
on the block device level, are OK, even if all your data is on a
single volume, since base backups (as documented) are supposed to be
taken via the FS (e.g. normal read operations, or even FS snapshots).
Block device level copies are not mentioned.

Can anyone confirm or refute?




[GENERAL] OK to put temp tablespace on volatile storage or to omit it from backups?

2013-04-30 Thread Yang Zhang
I would intuit that it's fine, but I just want to make sure there are
no gotchas from a recovery point of view:

If I were to lose my temp tablespace upon system crash, would this
prevent proper crash recovery?

Also, if I were to omit the temp tablespace from the base backup,
would that prevent proper backup recovery?

Thanks.




Re: [GENERAL] OK to put temp tablespace on volatile storage or to omit it from backups?

2013-04-30 Thread Yang Zhang
That is unfortunate.  Good thing I asked, I guess.  Do you have a
pointer to said blog post?

On Tue, Apr 30, 2013 at 5:31 PM, Darren Duncan dar...@darrenduncan.net wrote:
 On 2013.04.30 4:55 PM, Yang Zhang wrote:

 I would intuit that it's fine, but I just want to make sure there are
 no gotchas from a recovery point of view:

 If I were to lose my temp tablespace upon system crash, would this
 prevent proper crash recovery?

 Also, if I were to omit the temp tablespace from the base backup,
 would that prevent proper backup recovery?


 Although it would be nice if what you said would work, I read in a recent
 blog post that losing any tablespace would prevent the database server from
 starting, even if it was only for temporary things. -- Darren Duncan








Re: [GENERAL] OK to put temp tablespace on volatile storage or to omit it from backups?

2013-04-30 Thread Yang Zhang
On Tue, Apr 30, 2013 at 7:14 PM, Ian Lawrence Barwick barw...@gmail.com wrote:
 2013/5/1 Yang Zhang yanghates...@gmail.com:

 On Tue, Apr 30, 2013 at 5:31 PM, Darren Duncan dar...@darrenduncan.net 
 wrote:
 On 2013.04.30 4:55 PM, Yang Zhang wrote:

 I would intuit that it's fine, but I just want to make sure there are
 no gotchas from a recovery point of view:

 If I were to lose my temp tablespace upon system crash, would this
 prevent proper crash recovery?

 Also, if I were to omit the temp tablespace from the base backup,
 would that prevent proper backup recovery?


 Although it would be nice if what you said would work, I read in a recent
 blog post that losing any tablespace would prevent the database server from
 starting, even if it was only for temporary things. -- Darren Duncan

 That is unfortunate.  Good thing I asked, I guess.  Do you have a
 pointer to said blog post?

 I think this is the post in question:

 http://thebuild.com/blog/2013/03/10/you-cannot-recover-from-the-loss-of-a-tablespace/

 (BTW please try not to top-post :) )

Argh, trying not to.  Old habits die hard.

The comments suggest recovery from a lost tablespace is possible.

If anyone has any details or even pointers, I'd love to dig deeper,
since this would yield some nice and economical performance gains for
us on EC2.




Re: [GENERAL] OK to put temp tablespace on volatile storage or to omit it from backups?

2013-04-30 Thread Yang Zhang
On Tue, Apr 30, 2013 at 8:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Ian Lawrence Barwick barw...@gmail.com writes:
 2013/5/1 Yang Zhang yanghates...@gmail.com:
 That is unfortunate.  Good thing I asked, I guess.  Do you have a
 pointer to said blog post?

 I think this is the post in question:
 http://thebuild.com/blog/2013/03/10/you-cannot-recover-from-the-loss-of-a-tablespace/

 Appears to be sheer blather, or at least not tempered by any thoughts
 of whether it'd work in special cases.  The main reality underlying it,
 I think, is that WAL replay will complain if files are missing.  But
 there will be no WAL log entries for temp tables.

 The main concern I'd have about Yang's idea is that just because *he*
 thinks a tablespace is temp doesn't mean the system knows it is,
 so there would be no protection against accidentally creating a regular
 table there; whereupon he's at risk of replay failures.

 Having said that, there's no substitute for testing ;-).  I wouldn't be
 surprised for instance if the DB won't restart until you create the
 tablespace directories, and maybe even PG_VERSION files therein.  But it
 really shouldn't have an issue with the files underlying a temp table
 not being there anymore; at worst you'd get some bleats in the log.

I mentioned this in my response to Julian but I would not trust
black-box tests as strong evidence.  E.g. I can imagine a system
implementation wherein everything just happens to work out for my own
experiments (e.g. certain temp files are not created), but not work in
other circumstances (e.g. where those temp files are created and then
subsequently expected).




Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Yang Zhang
On Fri, Apr 26, 2013 at 9:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yang Zhang yanghates...@gmail.com writes:
 It currently takes up to 24h for us to run a large set of UPDATE
 statements on a database, which are of the form:

 UPDATE table SET field1 = constant1, field2 = constant2, ...  WHERE
 id = constid

 (We're just overwriting fields of objects identified by ID.)

 Forgive the obvious question, but you do have an index on id, right?
 Have you checked it's being used (ie EXPLAIN ANALYZE on one of these)?

Totally valid question.  That is the primary key, which has its own index.
Yes, we verified that EXPLAIN says it just uses a simple index scan.
Each individual query runs reasonably quickly (we can run several
dozen such statements per second).


 The tables have handfuls of indices each and no foreign key constraints.

 How much is a handful?

The table with the largest volume of updates (our bottleneck) has four indexes:

account_pkey PRIMARY KEY, btree (id)
account_createddate btree (createddate)
account_id_prefix btree (id text_pattern_ops)
account_recordtypeid btree (recordtypeid)


 It takes 2h to import a `pg_dump` of the entire DB.  This seems like a
 baseline we should reasonably target.

 Well, maybe.  You didn't say what percentage of the DB you're updating.

It can be 10-50% of rows changed - a large portion.


 But the thing that comes to mind here is that you're probably incurring
 a network round trip for each row, and maybe a query-planning round as
 well, so you really can't expect that this is going to be anywhere near
 as efficient as a bulk load operation.  You could presumably get rid of
 the planner overhead by using a prepared statement.  Cutting the network
 overhead is going to require a bit more ingenuity --- could you move
 some logic into a stored procedure, perhaps, so that one command from
 the client is sufficient to update multiple rows?

You're right, we're only sequentially issuing (unprepared) UPDATEs.

If we ship many UPDATE statements per call to our DB API's execution
function (we're using Python's psycopg2 if that matters, but I think
that just binds libpq), would that avoid the network round trip per
statement?

If not, what if we use anonymous procedures (DO) to run multiple
UPDATE statements?
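
(Something like this sketch, I mean - the values and column name are
made up:)

do $$
begin
  update account set field1 = 'x1' where id = 'a1';
  update account set field1 = 'x2' where id = 'a2';
  -- ... many more ...
end;
$$;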

Finally, we could use the technique highlighted in my third bullet and
use COPY (or at least multiple-value INSERT) into a staging table, then
merge the new data with the old.  Would that be the most direct route
to maximum performance?
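
(Roughly this sketch - the staging table, file name, and field names
are made up:)

create temp table account_updates (like account including defaults);
\copy account_updates from 'updates.csv' with (format csv)
update account a
set field1 = u.field1,
    field2 = u.field2
from account_updates u
where a.id = u.id;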

In any case, I assume deleting and rebuilding indexes is important
here, yes?  But what about raising checkpoint_segments - does this
actually help sustained throughput?


 regards, tom lane


--
Yang Zhang
http://yz.mit.edu/




Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Yang Zhang
On Sat, Apr 27, 2013 at 12:24 AM, Yang Zhang yanghates...@gmail.com wrote:
 On Fri, Apr 26, 2013 at 9:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yang Zhang yanghates...@gmail.com writes:
 It currently takes up to 24h for us to run a large set of UPDATE
 statements on a database, which are of the form:

 UPDATE table SET field1 = constant1, field2 = constant2, ...  WHERE
 id = constid

 (We're just overwriting fields of objects identified by ID.)

 Forgive the obvious question, but you do have an index on id, right?
 Have you checked it's being used (ie EXPLAIN ANALYZE on one of these)?

 Totally valid question.  That is the primary key with its own index.
 Yes, we verified that explain says it just use a simple index scan.
 Each individual query runs reasonably quickly (we can run several
 dozen such statements per second).


 The tables have handfuls of indices each and no foreign key constraints.

 How much is a handful?

 The table with the largest volume of updates (our bottleneck) has four 
 indexes:

 account_pkey PRIMARY KEY, btree (id)
 account_createddate btree (createddate)
 account_id_prefix btree (id text_pattern_ops)
 account_recordtypeid btree (recordtypeid)


 It takes 2h to import a `pg_dump` of the entire DB.  This seems like a
 baseline we should reasonably target.

 Well, maybe.  You didn't say what percentage of the DB you're updating.

 It can be 10-50% of rows changed - a large portion.


 But the thing that comes to mind here is that you're probably incurring
 a network round trip for each row, and maybe a query-planning round as
 well, so you really can't expect that this is going to be anywhere near
 as efficient as a bulk load operation.  You could presumably get rid of
 the planner overhead by using a prepared statement.  Cutting the network
 overhead is going to require a bit more ingenuity --- could you move
 some logic into a stored procedure, perhaps, so that one command from
 the client is sufficient to update multiple rows?

 You're right, we're only sequentially issuing (unprepared) UPDATEs.

 If we ship many UPDATE statements per call to our DB API's execution
 function (we're using Python's psycopg2 if that matters, but I think
 that just binds libpq), would that avoid the network round trip per
 statement?

 If not, what if we use anonymous procedures (DO) to run multiple
 UPDATE statements?

 Finally, we could use the technique highlighted in my third bullet and
 use COPY (or at least multiple-value INSERT), then merging the new
 data with the old.  Would that be the most direct route to maximum
 performance?

 In any case, I assume deleting and rebuilding indexes is important
 here, yes?  But what about raising checkpoint_segments - does this
 actually help sustained throughput?

(I ask because I'm wondering if raising checkpoint_segments simply
postpones inevitable work, or if collecting a larger amount of changes
really does dramatically improve throughput somehow.)



 regards, tom lane


 --
 Yang Zhang
 http://yz.mit.edu/



--
Yang Zhang
http://yz.mit.edu/




Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Yang Zhang
On Sat, Apr 27, 2013 at 1:55 AM, Misa Simic misa.si...@gmail.com wrote:
 Hi,

 If dataset for update is large...

 Maybe best would be:

 From client machine, instead of sending update statements with data - export
 data to file ready for copy command
 Transfer file to the server where pg is running
 Make pgsql function which

 Create temp table
 Copy to temp from the file

 Update original table with values in temp

 UPDATE foo
 SET col1 = bar.col1
 FROM bar
 WHERE foo.id = bar.id

 You dont need to do delete/insert - if you have just update comands

 From client when file is transfered - call your import function on the the
 server

 Optionaly you can run vacuum analyze after bulk operation...

But wouldn't a bulk UPDATE touch many existing pages (say, 20%
scattered around) to mark rows as dead (per MVCC)?  I guess it comes
down to: will PG be smart enough to mark dead rows in largely
sequential scans (rather than, say, jumping around in whatever order
rows from foo are yielded by the above join)?

In other words, when considering the alternative of:

CREATE TABLE newfoo AS
SELECT * FROM bar
UNION
SELECT * FROM foo
WHERE id NOT IN (SELECT id FROM bar);

Wouldn't this alternative be faster?


 Kind regards,

 Misa



 On Saturday, April 27, 2013, Yang Zhang wrote:

 On Fri, Apr 26, 2013 at 9:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Yang Zhang yanghates...@gmail.com writes:
  It currently takes up to 24h for us to run a large set of UPDATE
  statements on a database, which are of the form:
 
  UPDATE table SET field1 = constant1, field2 = constant2, ...  WHERE
  id = constid
 
  (We're just overwriting fields of objects identified by ID.)
 
  Forgive the obvious question, but you do have an index on id, right?
  Have you checked it's being used (ie EXPLAIN ANALYZE on one of these)?

 Totally valid question.  That is the primary key with its own index.
 Yes, we verified that explain says it just use a simple index scan.
 Each individual query runs reasonably quickly (we can run several
 dozen such statements per second).

 
  The tables have handfuls of indices each and no foreign key
  constraints.
 
  How much is a handful?

 The table with the largest volume of updates (our bottleneck) has four
 indexes:

 account_pkey PRIMARY KEY, btree (id)
 account_createddate btree (createddate)
 account_id_prefix btree (id text_pattern_ops)
 account_recordtypeid btree (recordtypeid)

 
  It takes 2h to import a `pg_dump` of the entire DB.  This seems like a
  baseline we should reasonably target.
 
  Well, maybe.  You didn't say what percentage of the DB you're updating.

 It can be 10-50% of rows changed - a large portion.

 
  But the thing that comes to mind here is that you're probably incurring
  a network round trip for each row, and maybe a query-planning round as
  well, so you really can't expect that this is going to be anywhere near
  as efficient as a bulk load operation.  You could presumably get rid of
  the planner overhead by using a prepared statement.  Cutting the network
  overhead is going to require a bit more ingenuity --- could you move
  some logic into a stored procedure, perhaps, so that one command from
  the client is sufficient to update multiple rows?

 You're right, we're only sequentially issuing (unprepared) UPDATEs.

 If we ship many UPDATE statements per call to our DB API's execution
 function (we're using Python's psycopg2 if that matters, but I think
 that just binds libpq), would that avoid the network round trip per
 statement?

 If not, what if we use anonymous procedures (DO) to run multiple
 UPDATE statements?

 Finally, we could use the technique highlighted in my third bullet and
 use COPY (or at least multiple-value INSERT), then merging the new
 data with the old.  Would that be the most direct route to maximum
 performance?

 In any case, I assume deleting and rebuilding indexes is important
 here, yes?  But what about raising checkpoint_segments - does this
 actually help sustained throughput?

 
  regards, tom lane


 --
 Yang Zhang
 http://yz.mit.edu/





--
Yang Zhang
http://yz.mit.edu/




Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Yang Zhang
On Sat, Apr 27, 2013 at 2:54 AM, Yang Zhang yanghates...@gmail.com wrote:
 On Sat, Apr 27, 2013 at 1:55 AM, Misa Simic misa.si...@gmail.com wrote:
 Hi,

 If dataset for update is large...

 Maybe best would be:

 From client machine, instead of sending update statements with data - export
 data to file ready for copy command
 Transfer file to the server where pg is running
 Make pgsql function which

 Create temp table
 Copy to temp from the file

 Update original table with values in temp

 UPDATE foo
 SET foo.col1 = bar.col1
 FROM bar
 WHERE foo.id = bar.id

 You dont need to do delete/insert - if you have just update comands

 From client when file is transfered - call your import function on the the
 server

 Optionaly you can run vacuum analyze after bulk operation...

 But wouldn't a bulk UPDATE touch many existing pages (say, 20%
 scattered around) to mark rows as dead (per MVCC)?  I guess it comes
 down to: will PG be smart enough to mark dead rows in largely
 sequential scans (rather than, say, jumping around in whatever order
 rows from foo are yielded by the above join)?

(This then raises the question - how might I see this seemingly
substantial performance implication, one way or the other, in say
EXPLAIN output or something like that?)


 In other words, when considering the alternative of:

 CREATE TABLE newfoo AS
 SELECT * FROM bar
 UNION
 SELECT * FROM foo
 WHERE id NOT IN (SELECT id FROM bar);

 Wouldn't this alternative be faster?


 Kind regards,

 Misa



 On Saturday, April 27, 2013, Yang Zhang wrote:

 On Fri, Apr 26, 2013 at 9:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Yang Zhang yanghates...@gmail.com writes:
  It currently takes up to 24h for us to run a large set of UPDATE
  statements on a database, which are of the form:
 
  UPDATE table SET field1 = constant1, field2 = constant2, ...  WHERE
  id = constid
 
  (We're just overwriting fields of objects identified by ID.)
 
  Forgive the obvious question, but you do have an index on id, right?
  Have you checked it's being used (ie EXPLAIN ANALYZE on one of these)?

 Totally valid question.  That is the primary key with its own index.
 Yes, we verified that explain says it just use a simple index scan.
 Each individual query runs reasonably quickly (we can run several
 dozen such statements per second).

 
  The tables have handfuls of indices each and no foreign key
  constraints.
 
  How much is a handful?

 The table with the largest volume of updates (our bottleneck) has four
 indexes:

 account_pkey PRIMARY KEY, btree (id)
 account_createddate btree (createddate)
 account_id_prefix btree (id text_pattern_ops)
 account_recordtypeid btree (recordtypeid)

 
  It takes 2h to import a `pg_dump` of the entire DB.  This seems like a
  baseline we should reasonably target.
 
  Well, maybe.  You didn't say what percentage of the DB you're updating.

 It can be 10-50% of rows changed - a large portion.

 
  But the thing that comes to mind here is that you're probably incurring
  a network round trip for each row, and maybe a query-planning round as
  well, so you really can't expect that this is going to be anywhere near
  as efficient as a bulk load operation.  You could presumably get rid of
  the planner overhead by using a prepared statement.  Cutting the network
  overhead is going to require a bit more ingenuity --- could you move
  some logic into a stored procedure, perhaps, so that one command from
  the client is sufficient to update multiple rows?

 You're right, we're only sequentially issuing (unprepared) UPDATEs.

 If we ship many UPDATE statements per call to our DB API's execution
 function (we're using Python's psycopg2 if that matters, but I think
 that just binds libpq), would that avoid the network round trip per
 statement?

 If not, what if we use anonymous procedures (DO) to run multiple
 UPDATE statements?

 Finally, we could use the technique highlighted in my third bullet and
 use COPY (or at least multiple-value INSERT), then merging the new
 data with the old.  Would that be the most direct route to maximum
 performance?

 In any case, I assume deleting and rebuilding indexes is important
 here, yes?  But what about raising checkpoint_segments - does this
 actually help sustained throughput?

 
  regards, tom lane


 --
 Yang Zhang
 http://yz.mit.edu/





 --
 Yang Zhang
 http://yz.mit.edu/



--
Yang Zhang
http://yz.mit.edu/




Re: [GENERAL] Optimizing bulk update performance

2013-04-27 Thread Yang Zhang
On Sat, Apr 27, 2013 at 3:06 AM, Misa Simic misa.si...@gmail.com wrote:
 I dont know - u can test :)

I probably will, but I do have a huge stack of such experiments to run
by now, and it always takes care to get benchmarks right, avoid disk
caches, etc.  Certainly I think it would be helpful (or at least
hopefully not harmful) to ask here to see if anyone might just
know.  That's what brought me to this list.  :)


 In the whole solution just one command is different - so it's easy to
 test and compare...

 To me it doesn't sound faster... It sounds like more operations are
 needed than should be done...

 And it produces more problems... i.e. what about table foo? What if
 another table references foo, etc.?

Yep, I guess more specifically I was just thinking of dumping to a temp table:

CREATE TEMP TABLE tmp AS
SELECT * FROM foo;

TRUNCATE foo;

INSERT INTO foo
SELECT * FROM bar
UNION
SELECT * FROM tmp WHERE id NOT IN (SELECT id FROM bar);

The question I have remaining is whether the bulk UPDATE will be able
to update many rows efficiently (smartly order them to do largely
sequential scans) - if so, I imagine it would be faster than the
above.


 On Saturday, April 27, 2013, Yang Zhang wrote:

 On Sat, Apr 27, 2013 at 1:55 AM, Misa Simic misa.si...@gmail.com wrote:
  Hi,
 
  If dataset for update is large...
 
  Maybe best would be:
 
  From client machine, instead of sending update statements with data -
  export
  data to file ready for copy command
  Transfer file to the server where pg is running
  Make pgsql function which
 
  Create temp table
  Copy to temp from the file
 
  Update original table with values in temp
 
  UPDATE foo
  SET foo.col1 = bar.col1
  FROM bar
  WHERE foo.id = bar.id
 
  You dont need to do delete/insert - if you have just update comands
 
  From client when file is transfered - call your import function on the
  the
  server
 
  Optionaly you can run vacuum analyze after bulk operation...

 But wouldn't a bulk UPDATE touch many existing pages (say, 20%
 scattered around) to mark rows as dead (per MVCC)?  I guess it comes
 down to: will PG be smart enough to mark dead rows in largely
 sequential scans (rather than, say, jumping around in whatever order
 rows from foo are yielded by the above join)?

 In other words, when considering the alternative of:

 CREATE TABLE newfoo AS
 SELECT * FROM bar
 UNION
 SELECT * FROM foo
 WHERE id NOT IN (SELECT id FROM bar);

 Wouldn't this alternative be faster?

 
  Kind regards,
 
  Misa
 
 
 
  On Saturday, April 27, 2013, Yang Zhang wrote:
 
  On Fri, Apr 26, 2013 at 9:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
   Yang Zhang yanghates...@gmail.com writes:
   It currently takes up to 24h for us to run a large set of UPDATE
   statements on a database, which are of the form:
  
   UPDATE table SET field1 = constant1, field2 = constant2, ...
   WHERE
   id = constid
  
   (We're just overwriting fields of objects identified by ID.)
  
   Forgive the obvious question, but you do have an index on id,
   right?
   Have you checked it's being used (ie EXPLAIN ANALYZE on one of
   these)?
 
  Totally valid question.  That is the primary key with its own index.
  Yes, we verified that explain says it just use a simple index scan.
  Each individual query runs reasonably quickly (we can run several
  dozen such statements per second).
 
  
   The tables have handfuls of indices each and no foreign key
   constraints.
  
   How much is a handful?
 
  The table with the largest volume of updates (our bottleneck) has four
  indexes:
 
  account_pkey PRIMARY KEY, btree (id)
  account_createddate btree (createddate)
  account_id_prefix btree (id text_pattern_ops)
  account_recordtypeid btree (recordtypeid)
 
  
   It takes 2h to import a `pg_dump` of the entire DB.  This seems like
   a
   baseline we should reasonably target.
  
   Well, maybe.  You didn't say what percentage of the DB you're
   updating.
 
  It can be 10-50% of rows changed - a large portion.
 
  
   But the thing that comes to mind here is that you're probably
   incurring
   a network round trip for each row, and maybe a query-planning round
   as
   well, so you really can't expect that this is going to be anywhere
   near
   as efficient as a bulk load operation.  You could presumably get rid
   of
   the planner overhead by using a prepared statement.  Cutting the
   network
   overhead is going to require a bit more ingenuity --- could you move
   some logic into a stored procedure, perhaps, so that one command from
   the client is sufficient to update multiple rows?
 
  You're right, we're only sequentially issuing (unprepared) UPDATEs.
 
  If we ship many UPDATE statements per call to our DB API's execution
  function (we're using Python's psycopg2 if that matters, but I think
  that just binds libpq), would that avoid the network round trip per
  statement?
 
  If not, what if we use anonymous procedures (DO) to run multiple
  UPDATE statements

Re: [GENERAL] Basic question on recovery and disk snapshotting

2013-04-27 Thread Yang Zhang
On Sat, Apr 27, 2013 at 11:55 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Sat, Apr 27, 2013 at 10:40 AM, Yang Zhang yanghates...@gmail.com wrote:
 My question really boils down to: if we're interested in using COW
 snapshotting (a common feature of modern filesystems and hosting
 environments), would we necessarily need to ensure the data and
 pg_xlog are on the same snapshotted volume?


 That would certainly make it easier.  But it shouldn't be necessary, as long
 as the xlog snapshot is taken after the cluster snapshot, and also as long
 as no xlog files which were written to after the last completed checkpoint
 prior to the cluster snapshot got recycled before the xlog snapshot.   As
 long as the snapshots run quickly and promptly one after the other, this
 should not be a problem, but you should certainly validate that a snapshot
 collection has all the xlogs it needs before accepting it as being good.  If
 you find some necessary xlog files are missing, you can turn up
 wal_keep_segments and try again.

This information is gold, thank you.

How do I validate that a snapshot collection has all the xlogs it needs?




  If not, how should we be
 taking the snapshots - should we be using pg_start_backup() and then
 taking the snapshot of one before the other?  (What order?)  What if
 we have tablespaces, do we take snapshots of those, followed by the
 cluster directory, followed by pg_xlog?


 First the cluster directory (where pg_control is), then tablespaces, then
 pg_xlog.  pg_start_backup() shouldn't be necessary, unless you are running
 with full_page_writes off.  But it won't hurt, and if you don't use
 pg_start_backup you should probably run a checkpoint of your own immediately
 before starting.


 I read through
 http://www.postgresql.org/docs/9.1/static/continuous-archiving.html
 and it doesn't touch on these questions.


 Your goal seems to be to *avoid* continuous archiving, so I wouldn't expect
 that part of the docs to touch on your issues.   But see the section
 Standalone Hot Backups which would allow you to use snapshots for the
 cluster copy part, and normal archiving for just the xlogs.  The volume of
 pg_xlog should be fairly small, so this seems to me like an attractive
 option.

Just to validate my understanding, are the two options as follows?

a. Checkpoint (optional but helps with time window?), snapshot
tablespaces/cluster/xlog, validate all necessary xlogs present.

b. Set wal_level/archive_mode/archive_command, pg_start_backup,
snapshot tablespaces/cluster, pg_stop_backup to archive xlog.

(a) sounds more appealing since it's treating recovery as crash
recovery rather than backup restore, and as such seems simpler and
lower-overhead (e.g. WAL verbosity, though I don't know how much that
overhead is).  However, I'm not sure how complex that validation step
is.
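
(For (b), I'm picturing roughly this sketch - the archive path and
backup label are made up:)

-- postgresql.conf: wal_level = archive, archive_mode = on,
--                  archive_command = 'cp %p /safe/wal_archive/%f'
select pg_start_backup('ebs-snapshot');
-- ... take the EBS snapshots of the cluster/tablespace volumes here ...
select pg_stop_backup();  -- waits until the last needed WAL segment is archived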


 If you really don't want to use archiving, even just during the duration of
 the cluster snapshotting, then this is the part that addresses your
 questions:

 http://www.postgresql.org/docs/9.1/static/backup-file.html

I'm still interested in online backups, though - stopping the DB is a
no-go unfortunately.

--
Yang Zhang
http://yz.mit.edu/




[GENERAL] Optimizing bulk update performance

2013-04-26 Thread Yang Zhang
It currently takes up to 24h for us to run a large set of UPDATE
statements on a database, which are of the form:

UPDATE table SET field1 = constant1, field2 = constant2, ...  WHERE
id = constid

(We're just overwriting fields of objects identified by ID.)

The tables have handfuls of indices each and no foreign key constraints.
No COMMIT is made till the end.

It takes 2h to import a `pg_dump` of the entire DB.  This seems like a
baseline we should reasonably target.

Short of producing a custom program that somehow reconstructs a dataset
for Postgresql to re-import, is there anything we can do to bring the
bulk UPDATE performance closer to that of the import?  (This is an area
that we believe log-structured merge trees handle well, but we're
wondering if there's anything we can do within Postgresql.)

Some ideas:

- dropping all non-ID indices and rebuilding afterward?
- increasing checkpoint_segments, but does this actually help sustained
  long-term throughput?
- using the techniques mentioned here?  (Load new data as table, then
  merge in old data where ID is not found in new data)
  
http://www.postgresql.org/message-id/3a0028490809301807j59498370m1442d8f5867e9...@mail.gmail.com

Basically there's a bunch of things to try and we're not sure what the
most effective are or if we're overlooking other things.  We'll be
spending the next few days experimenting, but we thought we'd ask here
as well.

Thanks.




Re: [GENERAL] Optimizing bulk update performance

2013-04-26 Thread Yang Zhang
We're using Postgresql 9.1.9 on Ubuntu 12.04 on EBS volumes on
m1.xlarge instances, which have:

15 GiB memory
8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
64-bit platform

(Yes, we're moving to EBS Optimized instances + Provisioned IOPS
volumes, but prelim. benchmarks suggest this won't get us enough of a
boost as much as possibly refactoring the way we're executing these
bulk updates in our application.)

On Fri, Apr 26, 2013 at 5:27 PM, Gavin Flower
gavinflo...@archidevsys.co.nz wrote:
 On 27/04/13 12:14, Yang Zhang wrote:

 It currently takes up to 24h for us to run a large set of UPDATE
 statements on a database, which are of the form:

 UPDATE table SET field1 = constant1, field2 = constant2, ...  WHERE
 id = constid

 (We're just overwriting fields of objects identified by ID.)

 The tables have handfuls of indices each and no foreign key constraints.
 No COMMIT is made till the end.

 It takes 2h to import a `pg_dump` of the entire DB.  This seems like a
 baseline we should reasonably target.

 Short of producing a custom program that somehow reconstructs a dataset
 for Postgresql to re-import, is there anything we can do to bring the
 bulk UPDATE performance closer to that of the import?  (This is an area
 that we believe log-structured merge trees handle well, but we're
 wondering if there's anything we can do within Postgresql.)

 Some ideas:

 - dropping all non-ID indices and rebuilding afterward?
 - increasing checkpoint_segments, but does this actually help sustained
   long-term throughput?
 - using the techniques mentioned here?  (Load new data as table, then
   merge in old data where ID is not found in new data)

 http://www.postgresql.org/message-id/3a0028490809301807j59498370m1442d8f5867e9...@mail.gmail.com

 Basically there's a bunch of things to try and we're not sure what the
 most effective are or if we're overlooking other things.  We'll be
 spending the next few days experimenting, but we thought we'd ask here
 as well.

 Thanks.


 People will need to know your version of Postgres and which Operating System
 etc., plus details of CPU, RAM, and disks... as well as what changes you have
 made to postgresql.conf...

 I would be inclined to DROP all indexes and reCREATE them later.

 Updating a row might lead to new row being added in a new disk page, so I
 suspect that updates will hit every index associated with the table with the
 (possible exception of partial indexes).

 Running too many updates in one transaction, may mean that Postgres may need
 to use disk work files.

 Depending on RAM etc., it may pay to increase some variables that affect how
 Postgres uses RAM; some of these are per session.


 Cheers,
 Gavin



--
Yang Zhang
http://yz.mit.edu/




[GENERAL] Basic question on recovery and disk snapshotting

2013-04-26 Thread Yang Zhang
We're running on EBS volumes on EC2.  We're interested in leveraging
EBS snapshotting for backups.  However, does this mean we'd need to
ensure our pg_xlog is on the same EBS volume as our data?

(I believe) the usual reasoning for separating pg_xlog onto a separate
volume is for performance.  However, if they are on different volumes,
the snapshots may be out of sync.

Thanks.




Re: [GENERAL] Optimizing bulk update performance

2013-04-26 Thread Yang Zhang
On Fri, Apr 26, 2013 at 7:01 PM, Gavin Flower
gavinflo...@archidevsys.co.nz wrote:
 Please do not top post; the convention on these lists is to add stuff at the
 end, apart from comments interspersed to make use of appropriate context!

Noted, thanks.

Anyway, any performance hints are greatly appreciated.



 On 27/04/13 13:35, Yang Zhang wrote:

 We're using Postgresql 9.1.9 on Ubuntu 12.04 on EBS volumes on
 m1.xlarge instances, which have:

 15 GiB memory
 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each)
 64-bit platform

 (Yes, we're moving to EBS Optimized instances + Provisioned IOPS
 volumes, but prelim. benchmarks suggest this won't get us enough of a
 boost as much as possibly refactoring the way we're executing these
 bulk updates in our application.)

 On Fri, Apr 26, 2013 at 5:27 PM, Gavin Flower
 gavinflo...@archidevsys.co.nz wrote:

 On 27/04/13 12:14, Yang Zhang wrote:

 It currently takes up to 24h for us to run a large set of UPDATE
 statements on a database, which are of the form:

  UPDATE table SET field1 = constant1, field2 = constant2, ...  WHERE
  id = constid

 (We're just overwriting fields of objects identified by ID.)

 The tables have handfuls of indices each and no foreign key constraints.
 No COMMIT is made till the end.

 It takes 2h to import a `pg_dump` of the entire DB.  This seems like a
 baseline we should reasonably target.

 Short of producing a custom program that somehow reconstructs a dataset
 for Postgresql to re-import, is there anything we can do to bring the
 bulk UPDATE performance closer to that of the import?  (This is an area
 that we believe log-structured merge trees handle well, but we're
 wondering if there's anything we can do within Postgresql.)

 Some ideas:

 - dropping all non-ID indices and rebuilding afterward?
 - increasing checkpoint_segments, but does this actually help sustained
long-term throughput?
 - using the techniques mentioned here?  (Load new data as table, then
merge in old data where ID is not found in new data)


 http://www.postgresql.org/message-id/3a0028490809301807j59498370m1442d8f5867e9...@mail.gmail.com

 Basically there's a bunch of things to try and we're not sure what the
 most effective are or if we're overlooking other things.  We'll be
 spending the next few days experimenting, but we thought we'd ask here
 as well.

 Thanks.


 People will need to know your version of Postgres  which Operating
 System
 etc. plus details of CPU RAM, and Disks... AS well as what changes you
 have
 made to postgresql.conf...

 I would be inclined to DROP all indexes and reCREATE them later.

 Updating a row might lead to new row being added in a new disk page, so I
 suspect that updates will hit every index associated with the table with
 the
 (possible exception of partial indexes).

 Running too many updates in one transaction, may mean that Postgres may
 need
 to use disk work files.

 Depending on RAM etc, it may pay to increase some variables tat affect
 how
 Postgres uses RAM, some of these are per session.


 Cheers,
 Gavin



 --
 Yang Zhang
 http://yz.mit.edu/





--
Yang Zhang
http://yz.mit.edu/




[GENERAL] Why is PostgreSQL 9.1 not using index for simple equality select

2013-04-12 Thread Yang Zhang
Any hints with this question I had posted to SO?

http://stackoverflow.com/questions/15965785/why-is-postgresql-9-1-not-using-index-for-simple-equality-select

Pasted here as well.  Thanks.

My table `lead` has an index:

\d lead
...
Indexes:
    "lead_pkey" PRIMARY KEY, btree (id)
    "lead_account__c" btree (account__c)
    ...
    "lead_email" btree (email)
    "lead_id_prefix" btree (id text_pattern_ops)

Why doesn't PG (9.1) use the index for this straightforward equality
selection?  Emails are almost all unique.

db=> explain select * from lead where email = 'blah';
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on lead  (cost=0.00..319599.38 rows=1 width=5108)
   Filter: (email = 'blah'::text)
(2 rows)

Other index-hitting queries seem to be OK (though I don't know why
this one doesn't just use the pkey index):

db=> explain select * from lead where id = '';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using lead_id_prefix on lead  (cost=0.00..8.57 rows=1 width=5108)
   Index Cond: (id = ''::text)
(2 rows)

db=> explain select * from lead where account__c = '';
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Index Scan using lead_account__c on lead  (cost=0.00..201.05 rows=49 width=5108)
   Index Cond: (account__c = ''::text)
(2 rows)

At first I thought it may be due to not enough distinct values of
`email`.  For instance, if the stats claim that `email` is `blah` for
most of the table, then a seq scan is faster.  But that's not the
case:

db=> select count(*), count(distinct email) from lead;
 count  | count
--------+--------
 749148 | 733416
(1 row)
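
(A related check - a sketch - is what the planner's statistics actually
say about email:)

select n_distinct, null_frac
from pg_stats
where tablename = 'lead' and attname = 'email';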

Even if I force seq scans to be off, the planner behaves as if it has
no other choice:

db=> set enable_seqscan = off;
SET
db=> show enable_seqscan;
 enable_seqscan
----------------
 off
(1 row)

db=> explain select * from lead where email = 'f...@blah.com';
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on lead  (cost=100.00..1319599.38 rows=1 width=5108)
   Filter: (email = 'f...@blah.com'::text)
(2 rows)

I searched over a good number of past SO questions but none were about
a simple equality query like this one.
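
(One thing worth checking, per the replies below, is whether lead_email
is somehow marked invalid - a sketch:)

select indexrelid::regclass, indisvalid, indisready
from pg_index
where indexrelid = 'lead_email'::regclass;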




Re: [GENERAL] Why is PostgreSQL 9.1 not using index for simple equality select

2013-04-12 Thread Yang Zhang
Doesn't seem to be the case.  This table has been around for a while
and should have been auto-analyzed by now.  But anyway:

db=> analyze lead;
ANALYZE
db=> explain select * from lead where email = 'f...@blah.com';
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on lead  (cost=100.00..1319666.99 rows=1 width=5208)
   Filter: (email = 'f...@blah.com'::text)
(2 rows)

On Fri, Apr 12, 2013 at 1:13 AM, John R Pierce pie...@hogranch.com wrote:
 On 4/12/2013 1:03 AM, Yang Zhang wrote:

  db= explain select * from lead where email = 'blah';
   QUERY PLAN
  
   Seq Scan on lead  (cost=0.00..319599.38 rows=1 width=5108)
 Filter: (email = 'blah'::text)



 try EXPLAIN ANALYZE ... it's more useful.

 my guess is, there's no statistics on this table, and doing an ANALYZE lead;
 would rectify this.



 --
 john r pierce  37N 122W
 somewhere on the middle of the left coast






-- 
Yang Zhang
http://yz.mit.edu/




Re: [GENERAL] Why is PostgreSQL 9.1 not using index for simple equality select

2013-04-12 Thread Yang Zhang
It's actually just `text`.

I updated my SO question with some more info including explain analyze
(no difference), \d,
and your last incantation.

Thanks!

On Fri, Apr 12, 2013 at 7:11 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yang Zhang yanghates...@gmail.com writes:
 db= explain select * from lead where email = 'blah';
  QUERY PLAN
 
  Seq Scan on lead  (cost=0.00..319599.38 rows=1 width=5108)
Filter: (email = 'blah'::text)
 (2 rows)

 That's awfully odd.  What data type is the email column?

 It seems possible also that the index on it is marked invalid.  I'd have
 expected \d to tell you so, but maybe you're using a version of psql that
 doesn't know about that.  It'd be interesting to look at
 select * from pg_index where indexrelid = 'index name here'::regclass;


 regards, tom lane



--
Yang Zhang
http://yz.mit.edu/




Re: [GENERAL] Why is PostgreSQL 9.1 not using index for simple equality select

2013-04-12 Thread Yang Zhang
Apologies for that Tom.  I will paste the information in line once I'm
back at my computer.  I do appreciate your help.

On Fri, Apr 12, 2013 at 10:24 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yang Zhang yanghates...@gmail.com writes:
 I updated my SO question with some more info including explain analyze
 (no difference), \d,
 and your last incantation.

 The question is being asked here, not in SO, and I find it rather
 impolite of you to expect me to go chasing off to some other forum
 to answer your question.

 regards, tom lane



-- 
Yang Zhang
http://yz.mit.edu/




[GENERAL] Why is this query running slowly?

2011-09-15 Thread Yang Zhang
  nuthouse.5434: Flags [.], ack
97699, win 720, options [nop,nop,TS val 194795549 ecr 194795549],
length 0
00:44:50.430346 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
97699:105891, ack 19, win 265, options [nop,nop,TS val 194795912 ecr
194795549], length 8192
00:44:50.430455 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
105891, win 720, options [nop,nop,TS val 194795912 ecr 194795912],
length 0
00:44:54.211644 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
105891:114083, ack 19, win 265, options [nop,nop,TS val 194796290 ecr
194795912], length 8192
00:44:54.211792 IP nuthouse.40349 > nuthouse.5434: Flags [.], ack
114083, win 720, options [nop,nop,TS val 194796290 ecr 194796290],
length 0
00:44:57.948539 IP nuthouse.5434 > nuthouse.40349: Flags [P.], seq
114083:122275, ack 19, win 265, options [nop,nop,TS val 194796664 ecr
194796290], length 8192
...

And I verified that the query is not blocked:

mydb=# select * from pg_stat_activity;
 datid | datname | procpid | usesysid | usename |
   current_query |
waiting |  xact_start   |  query_start
 | backend_start | client_addr | client_port
---+-+-+--+-+---+-+---+---+---+-+-
 55244 | mydb|5128 |16384 | yang|
 | f
| 2011-09-14 15:44:46.586014-07 | 2011-09-15 00:46:22.878715-07 |
2011-09-14 15:44:46.549151-07 | ::1 |   56791
:   select user_id
from den where user_id not in (select duid from user_mappings)
:   and timestamp
between '2009-04-01'::date and '2010-04-01'::date
 55244 | mydb|   28142 |16384 | yang| select * from
pg_stat_activity;
 | f   | 2011-09-15 00:46:40.100652-07 | 2011-09-15
00:46:40.100652-07 | 2011-09-15 00:46:35.913309-07 | |
 -1
(2 rows)

(The ports don't match up since this is tunneled via ssh.  `pv 
/dev/zero | ssh nuthouse 'cat >/dev/null'` shows sustained bandwidth at
~30-40MB/s.)

I know that `den` is large but things seem to be running much slower
than I'd expect.  Nothing in the logs.  Is there any way to inspect
what's going on?  Hesitant to kill the query in case it's almost done,
though I doubt it (didn't have the foresight to expose this
information in the client process - wasn't expecting to run into
this).  Thanks in advance.

-- 
Yang Zhang
http://yz.mit.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Yang Zhang
If it matters, the client is a JDBC app doing:

con = DriverManager.getConnection("jdbc:postgresql://localhost:5434/mydb",
  "yang", "password")
con.setAutoCommit(false)
val st = con.prepareStatement("""
  select user_id from den where user_id not in (select duid from user_mappings)
  and timestamp between '2009-04-01'::date and '2010-04-01'::date
  """, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
st.setFetchSize(8192)
rs = st.executeQuery()

On Thu, Sep 15, 2011 at 12:53 AM, Yang Zhang yanghates...@gmail.com wrote:
 I have a simple query that's been running for a while, which is fine,
 but it seems to be running very slowly, which is a problem:

 mydb=# explain select user_id from den where user_id not in (select
 duid from user_mappings) and timestamp between '2009-04-01' and
 '2010-04-01';

           QUERY PLAN
 ---
  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
   Filter: ((timestamp >= '2009-04-01 00:00:00'::timestamp without
 time zone) AND (timestamp <= '2010-04-01 00:00:00'::timestamp
 without time zone) AND (NOT (SubPlan 1)))
   SubPlan 1
     ->  Materialize  (cost=711.58..1223.38 rows=36780 width=4)
           ->  Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780 
 width=4)

 user_mappings is fairly small:

 mydb=# select count(*) from user_mappings;
  count
 ---
  36780
 (1 row)

 The client is on the same LAN, but only sees packets trickling in:

 $ sudo tcpdump -i lo port 5434
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
 listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
 00:44:05.475584 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 3757958746:3757966938, ack 3697417382, win 265, options [nop,nop,TS
 val 194791416 ecr 194791044], length 8192
 00:44:05.475684 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 8192, win 771, options [nop,nop,TS val 194791416 ecr 194791416],
 length 0
 00:44:08.867976 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 8192:15779, ack 1, win 265, options [nop,nop,TS val 194791756 ecr
 194791416], length 7587
 00:44:08.868019 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 15779, win 771, options [nop,nop,TS val 194791756 ecr 194791756],
 length 0
 00:44:08.870336 IP nuthouse.40349  nuthouse.5434: Flags [P.], seq
 1:19, ack 15779, win 771, options [nop,nop,TS val 194791756 ecr
 194791756], length 18
 00:44:08.870343 IP nuthouse.5434  nuthouse.40349: Flags [.], ack 19,
 win 265, options [nop,nop,TS val 194791756 ecr 194791756], length 0
 00:44:12.578220 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 15779:23971, ack 19, win 265, options [nop,nop,TS val 194792127 ecr
 194791756], length 8192
 00:44:12.618116 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 23971, win 771, options [nop,nop,TS val 194792131 ecr 194792127],
 length 0
 00:44:16.664645 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 23971:32163, ack 19, win 265, options [nop,nop,TS val 194792535 ecr
 194792131], length 8192
 00:44:16.664755 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 32163, win 771, options [nop,nop,TS val 194792535 ecr 194792535],
 length 0
 00:44:20.465773 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 32163:40355, ack 19, win 265, options [nop,nop,TS val 194792915 ecr
 194792535], length 8192
 00:44:20.465878 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 40355, win 720, options [nop,nop,TS val 194792915 ecr 194792915],
 length 0
 00:44:24.115273 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 40355:48547, ack 19, win 265, options [nop,nop,TS val 194793280 ecr
 194792915], length 8192
 00:44:24.115380 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 48547, win 720, options [nop,nop,TS val 194793280 ecr 194793280],
 length 0
 00:44:27.749084 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 48547:56739, ack 19, win 265, options [nop,nop,TS val 194793644 ecr
 194793280], length 8192
 00:44:27.749192 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 56739, win 720, options [nop,nop,TS val 194793644 ecr 194793644],
 length 0
 00:44:31.618124 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 56739:64931, ack 19, win 265, options [nop,nop,TS val 194794031 ecr
 194793644], length 8192
 00:44:31.618166 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 64931, win 720, options [nop,nop,TS val 194794031 ecr 194794031],
 length 0
 00:44:35.349958 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 64931:73123, ack 19, win 265, options [nop,nop,TS val 194794404 ecr
 194794031], length 8192
 00:44:35.350054 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 73123, win 720, options [nop,nop,TS val 194794404 ecr 194794404],
 length 0
 00:44:39.311627 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 73123:81315, ack 19, win 265, options [nop,nop,TS val 194794800 ecr
 194794404], length 8192

Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Yang Zhang
Should also add that while the client is under no load (CPU <1%, load
~0.1, mem ~20%), the server looks pretty busy (CPU 90-100% of 1 core,
load ~1.5, mem ~70-80%), but PG is the only thing using resources.

On Thu, Sep 15, 2011 at 12:58 AM, Yang Zhang yanghates...@gmail.com wrote:
 If it matters, the client is a JDBC app doing:

    con = DriverManager.getConnection("jdbc:postgresql://localhost:5434/mydb",
      "yang", "password")
    con.setAutoCommit(false)
    val st = con.prepareStatement("""
      select user_id from den where user_id not in (select duid from user_mappings)
      and timestamp between '2009-04-01'::date and '2010-04-01'::date
      """, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)
    st.setFetchSize(8192)
    rs = st.executeQuery()

 On Thu, Sep 15, 2011 at 12:53 AM, Yang Zhang yanghates...@gmail.com wrote:
 I have a simple query that's been running for a while, which is fine,
 but it seems to be running very slowly, which is a problem:

 mydb=# explain select user_id from den where user_id not in (select
 duid from user_mappings) and timestamp between '2009-04-01' and
 '2010-04-01';

           QUERY PLAN
 ---
  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
   Filter: ((timestamp >= '2009-04-01 00:00:00'::timestamp without
 time zone) AND (timestamp <= '2010-04-01 00:00:00'::timestamp
 without time zone) AND (NOT (SubPlan 1)))
   SubPlan 1
     ->  Materialize  (cost=711.58..1223.38 rows=36780 width=4)
           ->  Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780 
 width=4)

 user_mappings is fairly small:

 mydb=# select count(*) from user_mappings;
  count
 ---
  36780
 (1 row)

 The client is on the same LAN, but only sees packets trickling in:

 $ sudo tcpdump -i lo port 5434
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
 listening on lo, link-type EN10MB (Ethernet), capture size 96 bytes
 00:44:05.475584 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 3757958746:3757966938, ack 3697417382, win 265, options [nop,nop,TS
 val 194791416 ecr 194791044], length 8192
 00:44:05.475684 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 8192, win 771, options [nop,nop,TS val 194791416 ecr 194791416],
 length 0
 00:44:08.867976 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 8192:15779, ack 1, win 265, options [nop,nop,TS val 194791756 ecr
 194791416], length 7587
 00:44:08.868019 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 15779, win 771, options [nop,nop,TS val 194791756 ecr 194791756],
 length 0
 00:44:08.870336 IP nuthouse.40349  nuthouse.5434: Flags [P.], seq
 1:19, ack 15779, win 771, options [nop,nop,TS val 194791756 ecr
 194791756], length 18
 00:44:08.870343 IP nuthouse.5434  nuthouse.40349: Flags [.], ack 19,
 win 265, options [nop,nop,TS val 194791756 ecr 194791756], length 0
 00:44:12.578220 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 15779:23971, ack 19, win 265, options [nop,nop,TS val 194792127 ecr
 194791756], length 8192
 00:44:12.618116 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 23971, win 771, options [nop,nop,TS val 194792131 ecr 194792127],
 length 0
 00:44:16.664645 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 23971:32163, ack 19, win 265, options [nop,nop,TS val 194792535 ecr
 194792131], length 8192
 00:44:16.664755 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 32163, win 771, options [nop,nop,TS val 194792535 ecr 194792535],
 length 0
 00:44:20.465773 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 32163:40355, ack 19, win 265, options [nop,nop,TS val 194792915 ecr
 194792535], length 8192
 00:44:20.465878 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 40355, win 720, options [nop,nop,TS val 194792915 ecr 194792915],
 length 0
 00:44:24.115273 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 40355:48547, ack 19, win 265, options [nop,nop,TS val 194793280 ecr
 194792915], length 8192
 00:44:24.115380 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 48547, win 720, options [nop,nop,TS val 194793280 ecr 194793280],
 length 0
 00:44:27.749084 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 48547:56739, ack 19, win 265, options [nop,nop,TS val 194793644 ecr
 194793280], length 8192
 00:44:27.749192 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 56739, win 720, options [nop,nop,TS val 194793644 ecr 194793644],
 length 0
 00:44:31.618124 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 56739:64931, ack 19, win 265, options [nop,nop,TS val 194794031 ecr
 194793644], length 8192
 00:44:31.618166 IP nuthouse.40349  nuthouse.5434: Flags [.], ack
 64931, win 720, options [nop,nop,TS val 194794031 ecr 194794031],
 length 0
 00:44:35.349958 IP nuthouse.5434  nuthouse.40349: Flags [P.], seq
 64931:73123, ack 19, win 265, options [nop,nop,TS val 194794404 ecr
 194794031], length 8192
 00:44:35.350054

Re: [GENERAL] Why is this query running slowly?

2011-09-15 Thread Yang Zhang
On Thu, Sep 15, 2011 at 1:22 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 15 September 2011, 9:53, Yang Zhang wrote:
 I have a simple query that's been running for a while, which is fine,
 but it seems to be running very slowly, which is a problem:

 mydb=# explain select user_id from den where user_id not in (select
 duid from user_mappings) and timestamp between '2009-04-01' and
 '2010-04-01';

            QUERY PLAN
 ---
  Seq Scan on den  (cost=711.58..66062724212.74 rows=22634720 width=4)
    Filter: ((timestamp >= '2009-04-01 00:00:00'::timestamp without
 time zone) AND (timestamp <= '2010-04-01 00:00:00'::timestamp
 without time zone) AND (NOT (SubPlan 1)))
    SubPlan 1
      ->  Materialize  (cost=711.58..1223.38 rows=36780 width=4)
            ->  Seq Scan on user_mappings  (cost=0.00..530.80 rows=36780
 width=4)

 user_mappings is fairly small:

 The problem is that for each of the 22634720 rows in den a separate
 uncorrelated subquery (a seq scan on user_mappings) has to be executed.
 Although the subquery is not very expensive, multiplied by the number of
 rows in den the total cost is extreme.

 The only solution is to get rid of the not in subquery - try to turn it
 to a join like this:

 SELECT user_id FROM den LEFT JOIN user_mappings ON (user_id = duid)
 WHERE (timestamp BETWEEN '2009-04-01' AND '2010-04-01')
  AND (duid IS NULL)

 That should give the same result I guess.

This worked great, thank you.  Too bad the planner isn't smart enough
to do this yet!
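
A NOT EXISTS formulation is another rewrite the planner (8.4 and later)
can turn into a hash or merge anti-join on its own; NOT IN cannot be
transformed that way because of its NULL semantics. A sketch against the
same tables:

select user_id
from den a
where not exists (select 1 from user_mappings m where m.duid = a.user_id)
  and a.timestamp between '2009-04-01'::date and '2010-04-01'::date;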


 Tomas





-- 
Yang Zhang
http://yz.mit.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Preventing OOM kills

2011-05-24 Thread Yang Zhang
PG tends to be picked on by the Linux OOM killer, so lately we've been
forcing the OOM killer to kill other processes first with this script:

while true; do
  for i in `pgrep postgres`; do
    echo -17 > /proc/$i/oom_adj
  done
  sleep 60
done

Is there a Better Way?  Thanks in advance.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Disk space usage discrepancy

2011-04-22 Thread Yang Zhang
We're trying to figure out how to account for our disk space
consumption in a database.

$ sudo du -shx /var/lib/postgresql/8.4/main/
1.9G/var/lib/postgresql/8.4/main/

But when we query Postgresql to find out how much disk space is
actually being used by the various databases, we get a total of under
600MB (the exact query we use for determining this is below, derived
from the example query in the PG documentation):

$ pg-dbspace
psql: FATAL:  database template0 is not currently accepting connections
  1272446976 rp
   971186176 sfbox
   513794048 yang
30326784 ch
16400384 reviewboard
14958592 pod
 6733824 cbkup
 5767168 redmine_default
 2138112 ibkup
 2138112 foo
 2113536 template1
 2113536 postgres

There are two databases with tablespaces on different volumes than
what /var/lib/postgresql/ is on - their PG-reported consumption is
~2.1GB, and they take up about ~1.5x more on disk:

$ df -h | fgrep /mnt
/dev/sdf  2.0G  1.4G  502M  74% /mnt/box
/dev/sdg  5.0G  2.1G  2.7G  44% /mnt/rp

We're also curious about the 1.5x overhead, but we're mainly not sure
why the rest of the database takes up 3x more than reported, even
discounting pg_xlog (which is for the entire cluster):

$ sudo -i 'du -shx /var/lib/postgresql/8.4/main/*'
1.8G/var/lib/postgresql/8.4/main/base
816K/var/lib/postgresql/8.4/main/global
144K/var/lib/postgresql/8.4/main/pg_clog
28K /var/lib/postgresql/8.4/main/pg_multixact
192K/var/lib/postgresql/8.4/main/pg_stat_tmp
80K /var/lib/postgresql/8.4/main/pg_subtrans
4.0K/var/lib/postgresql/8.4/main/pg_tblspc
4.0K/var/lib/postgresql/8.4/main/pg_twophase
4.0K/var/lib/postgresql/8.4/main/PG_VERSION
129M/var/lib/postgresql/8.4/main/pg_xlog
4.0K/var/lib/postgresql/8.4/main/postmaster.opts
4.0K/var/lib/postgresql/8.4/main/postmaster.pid
0   /var/lib/postgresql/8.4/main/server.crt
0   /var/lib/postgresql/8.4/main/server.key

Any hints?  Thanks in advance.

The queries were using:

$ type pg-dbspace
pg-dbspace is a function
pg-dbspace ()
{
for db in $(psql -Atc 'select datname from pg_database');
do
printf '%12d %s\n' $(PGDATABASE=$db pg-space total) $db;
done | sort -rn
}

$ type pg-space
pg-space is a function
pg-space ()
{
local schema=${schema:-${1:-}} flags=;
case ${schema:-} in
total)
local query='select sum(bytes) from schemas' flags=-At
;;
'*')
local query='select * from tables'
;;
'')
local query='select * from schemas'
;;
*)
local query="select * from tables where _schema = '$schema'"
;;
esac;
psql $flags -c "
with
  total as (
select sum(pg_relation_size(oid)) from pg_class where relkind = 'r'
  ),
  basic as (
select
  n.nspname as _schema,
  relname as _table,
  pg_relation_size(r.oid) as bytes,
  (100*pg_relation_size(r.oid)/(select * from
total))::numeric(4,1) as pct
from pg_class r inner join pg_namespace n on (n.oid = relnamespace)
where relkind = 'r'
  ),
  tables as (
select
  _schema,
  _table,
  bytes,
  lpad(pg_size_pretty(bytes), 9) as size,
  pct
from basic
order by bytes desc
  ),
  schemas as (
select
  _schema,
  sum(bytes) as bytes,
  lpad(pg_size_pretty(sum(bytes)::int), 9) as size,
  sum(pct) as pct
from basic
group by _schema
order by bytes desc
  )
$query;
  "
}
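
One likely source of the gap: pg_relation_size() measures only a table's
main heap, so indexes and TOAST segments never show up in the totals
above, while du sees them all. A sketch of a broader accounting with the
same family of admin functions (all present in 8.4):

select datname, pg_size_pretty(pg_database_size(datname))
from pg_database
order by pg_database_size(datname) desc;

select relname, pg_size_pretty(pg_total_relation_size(oid)) as total
from pg_class
where relkind = 'r'
order by pg_total_relation_size(oid) desc
limit 20;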

-- 
Yang Zhang
http://yz.mit.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Compression

2011-04-14 Thread Yang Zhang
Is there any effort to add compression into PG, a la MySQL's
row_format=compressed or HBase's LZO block compression?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Compression

2011-04-14 Thread Yang Zhang
On Thu, Apr 14, 2011 at 5:07 PM, Adrian Klaver adrian.kla...@gmail.com wrote:
 On Thursday, April 14, 2011 4:50:44 pm Craig Ringer wrote:

 On 15/04/2011 7:01 AM, Yang Zhang wrote:

  Is there any effort to add compression into PG, a la MySQL's

  row_format=compressed or HBase's LZO block compression?



 There's no row compression, but as mentioned by others there is

 out-of-line compression of large values using TOAST.

 I could be misunderstanding but I thought compression happened in the row as
 well. From the docs:

 EXTENDED allows both compression and out-of-line storage. This is the
 default for most TOAST-able data types. Compression will be attempted first,
 then out-of-

 line storage if the row is still too big. 



 Row compression would be interesting, but I can't imagine it not having

 been investigated already.

 --

 Adrian Klaver

 adrian.kla...@gmail.com

Already know about TOAST.  I could've been clearer, but that's not the
same as the block-/page-level compression I was referring to.

-- 
Yang Zhang
http://yz.mit.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Compression

2011-04-14 Thread Yang Zhang
On Thu, Apr 14, 2011 at 7:42 PM, Adrian Klaver adrian.kla...@gmail.com wrote:
 On Thursday, April 14, 2011 5:51:21 pm Yang Zhang wrote:

 

  adrian.kla...@gmail.com



 Already know about TOAST. I could've been clearer, but that's not the

 same as the block-/page-level compression I was referring to.

 I am obviously missing something. The TOAST mechanism is designed to keep
 tuple data below the default 8KB page size. In fact it kicks in at a lower
 level than that:

 The TOAST code is triggered only when a row value to be stored in a table
 is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code
 will compress and/or move field values out-of-line until the row value is
 shorter than TOAST_TUPLE_TARGET bytes (also normally 2 kB) or no more gains
 can be had. During an UPDATE operation, values of unchanged fields are
 normally preserved as-is; so an UPDATE of a row with out-of-line values
 incurs no TOAST costs if none of the out-of-line values change.'

 Granted not all data types are TOASTable. Are you looking for something more
 aggressive than that?

Yes.

http://blog.oskarsson.nu/2009/03/hadoop-feat-lzo-save-disk-space-and.html

http://wiki.apache.org/hadoop/UsingLzoCompression

http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals-algorithms.html


 --

 Adrian Klaver

 adrian.kla...@gmail.com



-- 
Yang Zhang
http://yz.mit.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Compression

2011-04-14 Thread Yang Zhang
On Thu, Apr 14, 2011 at 6:46 PM, mark dvlh...@gmail.com wrote:


 -Original Message-
 From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
 ow...@postgresql.org] On Behalf Of Yang Zhang
 Sent: Thursday, April 14, 2011 6:51 PM
 To: Adrian Klaver
 Cc: pgsql-general@postgresql.org; Craig Ringer
 Subject: Re: [GENERAL] Compression

 On Thu, Apr 14, 2011 at 5:07 PM, Adrian Klaver
 adrian.kla...@gmail.com wrote:
  On Thursday, April 14, 2011 4:50:44 pm Craig Ringer wrote:
 
  On 15/04/2011 7:01 AM, Yang Zhang wrote:
 
   Is there any effort to add compression into PG, a la MySQL's
 
   row_format=compressed or HBase's LZO block compression?
 
 
 
  There's no row compression, but as mentioned by others there is
 
  out-of-line compression of large values using TOAST.
 
  I could be misunderstanding but I thought compression happened in the
 row as
  well. From the docs:
 
  EXTENDED allows both compression and out-of-line storage. This is
 the
  default for most TOAST-able data types. Compression will be attempted
 first,
  then out-of-
 
  line storage if the row is still too big. 
 
 
 
  Row compression would be interesting, but I can't imagine it not
 having
 
  been investigated already.
 
  --
 
  Adrian Klaver
 
  adrian.kla...@gmail.com

 Already know about TOAST.  I could've been clearer, but that's not the
 same as the block-/page-level compression I was referring to.

 There is a (closed source) PG fork that has row (or column) oriented storage
 that can have compression applied to it, if you are willing to give up
 updates and deletes on the table, that is.

Greenplum and Aster?

We *are* mainly doing analytical (non-updating/deleting) processing.
But it's not a critical pain point - we're mainly interested in FOSS
for now.



 I haven't seen a lot of people talking about wanting that in the Postgres
 core tho.


 -M


 --
 Yang Zhang
 http://yz.mit.edu/

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general





-- 
Yang Zhang
http://yz.mit.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] .pgpass not working?

2011-04-08 Thread Yang Zhang
I'm using the postgresql 8.4.7 in Ubuntu 10.04, and I'm trying to use
.pgpass documented here:

http://www.postgresql.org/docs/8.4/interactive/libpq-pgpass.html

I have a ~/.pgpass with 600 perms containing:

myhostname.com:yang:yang:mypassword

However, it doesn't seem to get picked up by psql -h myhostname.com.
I also tried explicitly specifying PGPASSFILE, but that didn't work.
Neither that env var nor .pgpass is documented in man psql.

Entering my password manually works, however.  (Don't need to specify
username/database since they default to my username, which is also
yang.)

Also tried connecting with psycopg2 via sqlalchemy (connect string
postgresql://myhostname.com/yang), and it also didn't auto pick up
.pgpass.

Any hints on what's up?  Thanks in advance.

Yang

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] .pgpass not working?

2011-04-08 Thread Yang Zhang
Dah, left out the port.

On Fri, Apr 8, 2011 at 10:36 PM, Yang Zhang yanghates...@gmail.com wrote:
 I'm using the postgresql 8.4.7 in Ubuntu 10.04, and I'm trying to use
 .pgpass documented here:

 http://www.postgresql.org/docs/8.4/interactive/libpq-pgpass.html

 I have a ~/.pgpass with 600 perms containing:

 myhostname.com:yang:yang:mypassword

 However, it doesn't seem to get picked up by psql -h myhostname.com.
 I also tried explicitly specifying PGPASSFILE, but that didn't work.
 Neither that env var nor .pgpass is documented in man psql.

 Entering my password manually works, however.  (Don't need to specify
 username/database since they default to my username, which is also
 yang.)

 Also tried connecting with psycopg2 via sqlalchemy (connect string
 postgresql://myhostname.com/yang), and it also didn't auto pick up
 .pgpass.

 Any hints on what's up?  Thanks in advance.

 Yang




-- 
Yang Zhang
http://yz.mit.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] What happened to ALTER SCHEMA ... SET TABLESPACE?

2011-04-07 Thread Yang Zhang
Last I could find on this, it was slated for 8.1 inclusion:

http://archives.postgresql.org/pgsql-patches/2004-08/msg00425.php

But it doesn't seem to be there in my PG8.4

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] What happened to ALTER SCHEMA ... SET TABLESPACE?

2011-04-07 Thread Yang Zhang
Also, in PG8.4+, is there any way to set the default tablespace on a
per-schema basis?

On Thu, Apr 7, 2011 at 12:27 PM, Yang Zhang yanghates...@gmail.com wrote:
 Last I could find on this, it was slated for 8.1 inclusion:

 http://archives.postgresql.org/pgsql-patches/2004-08/msg00425.php

 But it doesn't seem to be there in my PG8.4


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] How to set password without echoing characters?

2011-04-06 Thread Yang Zhang
Anyone know how to set your password without having it visible in the
terminal, psql history, etc. (i.e. getpass(3))? Hoping there's a
program that already does this. Thanks.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to set password without echoing characters?

2011-04-06 Thread Yang Zhang
On Wed, Apr 6, 2011 at 2:15 PM, Yang Zhang yanghates...@gmail.com wrote:
 On Wed, Apr 6, 2011 at 2:09 PM, Adrian Klaver adrian.kla...@gmail.com wrote:
 On Wednesday, April 06, 2011 2:00:53 pm Yang Zhang wrote:
 Anyone know how to set your password without having it visible in the
 terminal, psql history, etc. (i.e. getpass(3))? Hoping there's a
 program that already does this. Thanks.

 http://www.postgresql.org/docs/9.0/interactive/libpq-pgpass.html

 This seems to only address authenticating with an already-set
 password, and not setting the password


Just discovered the \password psql command. Thanks.
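
For reference, \password prompts for the new password twice with echo
disabled and sends it pre-hashed via ALTER ROLE, so the cleartext should
not end up in the terminal, ~/.psql_history, or the server log:

\password yang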

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] How to set password without echoing characters?

2011-04-06 Thread Yang Zhang
On Wed, Apr 6, 2011 at 2:09 PM, Adrian Klaver adrian.kla...@gmail.com wrote:
 On Wednesday, April 06, 2011 2:00:53 pm Yang Zhang wrote:
 Anyone know how to set your password without having it visible in the
 terminal, psql history, etc. (i.e. getpass(3))? Hoping there's a
 program that already does this. Thanks.

 http://www.postgresql.org/docs/9.0/interactive/libpq-pgpass.html

This seems to only address authenticating with an already-set
password, and not setting the password

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Preventing accidental non-SSL connections in psql?

2011-04-06 Thread Yang Zhang
How do I prevent accidental non-SSL connections (at least to specific
hosts) when connecting via psql? Is there any configuration for this?
Thanks.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Preventing accidental non-SSL connections in psql?

2011-04-06 Thread Yang Zhang
On Wed, Apr 6, 2011 at 4:22 PM, Adrian Klaver adrian.kla...@gmail.com wrote:
 On Wednesday, April 06, 2011 4:06:40 pm Yang Zhang wrote:
 How do I prevent accidental non-SSL connections (at least to specific
 hosts) when connecting via psql? Is there any configuration for this?
 Thanks.

 http://www.postgresql.org/docs/9.0/interactive/auth-pg-hba-conf.html
 hostssl
 http://www.postgresql.org/docs/9.0/interactive/libpq-connect.html
 sslmode

I'm aware of sslmode and hostssl - the threat model I'm asking about
is the client getting MITM'd because the user forgets to specify `psql
sslmode=verify-full`.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Preventing accidental non-SSL connections in psql?

2011-04-06 Thread Yang Zhang
On Wed, Apr 6, 2011 at 4:53 PM, Adrian Klaver adrian.kla...@gmail.com wrote:
 On Wednesday, April 06, 2011 4:24:30 pm Yang Zhang wrote:

 On Wed, Apr 6, 2011 at 4:22 PM, Adrian Klaver adrian.kla...@gmail.com
 wrote:

  On Wednesday, April 06, 2011 4:06:40 pm Yang Zhang wrote:

  How do I prevent accidental non-SSL connections (at least to specific

  hosts) when connecting via psql? Is there any configuration for this?

  Thanks.

 

  http://www.postgresql.org/docs/9.0/interactive/auth-pg-hba-conf.html

  hostssl

  http://www.postgresql.org/docs/9.0/interactive/libpq-connect.html

  sslmode



 I'm aware of sslmode and hostssl - the threat model I'm asking about

 is the client getting MITM'd because the user forgets to specify `psql

 sslmode=verify-full`.

 http://www.postgresql.org/docs/9.0/interactive/ssl-tcp.html

 17.8.1. Using client certificates

Client certs are only for client authentication; I'm interested in
mandatory server authentication.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Preventing accidental non-SSL connections in psql?

2011-04-06 Thread Yang Zhang
On Wed, Apr 6, 2011 at 4:57 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Wed, Apr 6, 2011 at 5:24 PM, Yang Zhang yanghates...@gmail.com wrote:
 On Wed, Apr 6, 2011 at 4:22 PM, Adrian Klaver adrian.kla...@gmail.com 
 wrote:
 On Wednesday, April 06, 2011 4:06:40 pm Yang Zhang wrote:
 How do I prevent accidental non-SSL connections (at least to specific
 hosts) when connecting via psql? Is there any configuration for this?
 Thanks.

 http://www.postgresql.org/docs/9.0/interactive/auth-pg-hba-conf.html
 hostssl
 http://www.postgresql.org/docs/9.0/interactive/libpq-connect.html
 sslmode

 I'm aware of sslmode and hostssl - the threat model I'm asking about
 is the client getting MITM'd because the user forgets to specify `psql
 sslmode=verify-full`.

 As long as you only have hostssl entries for connections the users
 can't connect without ssl.


hostssl is a server-side policy; I'm interested in setting up my
client with mandatory server authentication.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Preventing accidental non-SSL connections in psql?

2011-04-06 Thread Yang Zhang
On Wed, Apr 6, 2011 at 6:18 PM, Adrian Klaver adrian.kla...@gmail.com wrote:
 On Wednesday, April 06, 2011 5:21:23 pm Yang Zhang wrote:

 On Wed, Apr 6, 2011 at 4:57 PM, Scott Marlowe scott.marl...@gmail.com
 wrote:

  On Wed, Apr 6, 2011 at 5:24 PM, Yang Zhang yanghates...@gmail.com
  wrote:

  On Wed, Apr 6, 2011 at 4:22 PM, Adrian Klaver adrian.kla...@gmail.com
  wrote:

  On Wednesday, April 06, 2011 4:06:40 pm Yang Zhang wrote:

  How do I prevent accidental non-SSL connections (at least to specific

  hosts) when connecting via psql? Is there any configuration for this?

  Thanks.

 

  http://www.postgresql.org/docs/9.0/interactive/auth-pg-hba-conf.html

  hostssl

  http://www.postgresql.org/docs/9.0/interactive/libpq-connect.html

  sslmode

 

  I'm aware of sslmode and hostssl - the threat model I'm asking about

  is the client getting MITM'd because the user forgets to specify `psql

  sslmode=verify-full`.

 

  As long as you only have hostssl entries for connections the users

  can't connect without ssl.



 hostssl is a server-side policy; I'm interested in setting up my

 client with mandatory server authentication.

 http://www.postgresql.org/docs/9.0/interactive/libpq-envars.html

 PGSSLMODE behaves the same as the sslmode connection parameter.

 Now you have both ends and the middle:)

Thanks, exactly what I was looking for.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Attaching/detaching tablespaces (or, in general, parts of a DB)

2011-04-06 Thread Yang Zhang
Is it possible to attach or detach parts of a DB (e.g. tablespaces),
such that I can flexibly move the disks containing the DB tables
around to different hosts?

The last discussion I could find on this topic is from 2007, and the
answer was No:

http://postgresql.1045698.n5.nabble.com/Possible-to-Attach-Detach-Tablespaces-td1885602.html

Wondering if there's been any change to the answer since then.  I can
alternatively dump/load, which is slower and requires 2x the disk
space.  Thanks in advance.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Disk space usage analyzer?

2011-03-25 Thread Yang Zhang
Is there any tool for breaking down how much disk space is used by
(could be freed by removing) various tables, indexes, selected rows,
etc.?  Thanks!
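
Short of an external tool, a catalog query covers the table/index
breakdown (per-row attribution would still need sampling). A sketch that
splits each table into heap vs. indexes-plus-TOAST:

select n.nspname as schema,
       c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) as total,
       pg_size_pretty(pg_relation_size(c.oid)) as heap,
       pg_size_pretty(pg_total_relation_size(c.oid)
                      - pg_relation_size(c.oid)) as indexes_and_toast
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'r'
  and n.nspname not in ('pg_catalog', 'information_schema')
order by pg_total_relation_size(c.oid) desc
limit 25;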

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Default permissions for CREATE SCHEMA/TABLE?

2011-03-24 Thread Yang Zhang
Any way I can have all newly created schemas/tables be owned by, or
have all permissions granted to, a certain group, without having to
remember to GRANT ALL ON [SCHEMA|TABLE] TO that group?  Thanks in
advance.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Default permissions for CREATE SCHEMA/TABLE?

2011-03-24 Thread Yang Zhang
Any luck if I'm still on 8.4?
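
For what it's worth, 8.4 has neither ALTER DEFAULT PRIVILEGES nor
GRANT ... ON ALL TABLES IN SCHEMA (both arrived in 9.0), so the usual
fallback there is to regenerate grants from the catalog after creating
objects. A sketch (my_group is just a placeholder) whose output you feed
back into psql:

select 'grant all on ' || quote_ident(schemaname) || '.'
       || quote_ident(tablename) || ' to my_group;'
from pg_tables
where schemaname not in ('pg_catalog', 'information_schema');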

On Thu, Mar 24, 2011 at 3:17 PM, David Johnston pol...@yahoo.com wrote:
 Keep in mind if you want to alter the GLOBAL privileges (i.e., the defaults)
 granted via PUBLIC you MUST NOT specify a schema.

 From what I can tell there is no way to associate a default owner different
 than the one executing the CREATE statement (though some inheritance cases
 do arise IIRC).

 David J.

 -Original Message-
 From: pgsql-general-ow...@postgresql.org
 [mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Stephen Frost
 Sent: Thursday, March 24, 2011 4:56 PM
 To: Yang Zhang
 Cc: pgsql-general@postgresql.org
 Subject: Re: [GENERAL] Default permissions for CREATE SCHEMA/TABLE?

 * Yang Zhang (yanghates...@gmail.com) wrote:
 Any way I can have all newly created schemas/tables be owned by, or
 have all permissions granted to, a certain group, without having to
 remember to GRANT ALL ON [SCHEMA|TABLE] TO that group?  Thanks in
 advance.

 ALTER DEFAULT PRIVILEGES FOR ROLE my_user IN SCHEMA my_schema GRANT SELECT
 ON TABLES TO other_role;

 http://www.postgresql.org/docs/9.0/static/sql-alterdefaultprivileges.html

        Thanks,

                Stephen





-- 
Yang Zhang
http://yz.mit.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Compression hacks?

2011-02-24 Thread Yang Zhang
For various workloads, compression could be a win on both disk space
and speed (see, e.g.,
http://blog.oskarsson.nu/2009/03/hadoop-feat-lzo-save-disk-space-and.html).
 I realize Postgresql doesn't have general table compression a la
InnoDB's row_format=compressed (there's TOAST for large values and
there's some old discussion on
http://wiki.postgresql.org/wiki/CompressedTables), but I thought I'd
ask: anybody tried to compress their PG data somehow?  E.g., any
positive experiences running PG on a compressed filesystem (and any
caveats)?  Anecdotal stories of the effects of app-level large-field
compression in analytical workloads (though I'd be curious about
transactional workloads as well)?  Thanks in advance.

-- 
Yang Zhang
http://yz.mit.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Synchronous replication hack for 9.0?

2010-09-27 Thread Yang Zhang
I realize that synchronous replication is in the works for 9.1, but
for 9.0, could you hack your way to synchronous replication by polling
the standby's pg_last_xlog_receive_location() and waiting until it's
= pg_current_xlog_location() from your last transaction? Any major
pitfalls here (besides the need to balance polling frequency and
latency)? Any recommended polling periods?
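
A rough sketch of that polling check (9.0 has no pg_xlog_location_diff(),
so the hi/lo hex halves are zero-padded and compared as text; note also
that queries on a standby only see WAL that has been *replayed*, so
pg_last_xlog_replay_location() may be the more relevant call than the
receive location):

-- on the master, right after the commit of interest:
select pg_current_xlog_location();   -- e.g. '2A/4F102A38' (made-up value)

-- on the standby, poll until this returns true, substituting the value
-- captured above for '2A/4F102A38':
select lpad(split_part(pg_last_xlog_receive_location(), '/', 1), 8, '0')
       || lpad(split_part(pg_last_xlog_receive_location(), '/', 2), 8, '0')
       >=
       lpad(split_part('2A/4F102A38', '/', 1), 8, '0')
       || lpad(split_part('2A/4F102A38', '/', 2), 8, '0') as caught_up;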

Aside: would 9.1's synchronous replication be adding anything that
lowers the replication delay compared to asynchronous replication? Or
could the changes be summarized as "commits now wait for the standby"?
--
Yang Zhang
http://yz.mit.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Checking for stale reads on hot standby

2010-09-26 Thread Yang Zhang
Say you have an application using PG asynchronous streaming
replication to some hot standbys, to distribute the read load. The
application itself is a typical web application consisting of multiple
servers, serving a number of sessions (perhaps belonging to different
users), and the workload is OLTP-ish, with each session continually
issuing a bunch of transactions. To guarantee session timeline
consistency for clients of the application, you want to make sure that
they can read data that's at least as new as anything they've
read/written previously, never traveling back in time.

With asynchronous replication, after seeing a new version of the data
from one standby, you may see an older version from a subsequent query
to another standby. The question: what are some ways to provide this
form of consistency in the context of PG asynchronous replication?

Is the standard/recommended approach to use a sequence representing
the global database version? Here, the application is responsible for
incrementing this from update transactions. In read transactions,
check that the sequence value is >= the session's highest-seen-value,
and raise the latter if necessary.
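
A minimal sketch of that scheme, using a one-row counter table rather
than a sequence (a plain row replicates exactly, at the cost of
serializing writers on it; the names here are illustrative):

create table db_version (version bigint not null);
insert into db_version values (0);

-- in each update transaction on the primary; the application keeps the
-- returned value as the session's highest-seen version:
update db_version set version = version + 1 returning version;

-- before serving a read from a standby (substitute the session's value):
select version >= 12345 as fresh_enough from db_version;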

Thanks in advance.
--
Yang Zhang
http://yz.mit.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Bizarre problem: Python stored procedure using protocol buffers not working

2010-05-15 Thread Yang Zhang
I have a table containing a protocol buffer (pb) attribute, and a
stored procedure foo that tries to parse it and return the parsed
value as a human-friendly string (not actual application code, just a
minimal test case to demonstrate the problem). But running foo gives
me nothing back.

yang=# \d qapb
 Table public.qapb
 Column |  Type   | Modifiers
+-+---
 id | integer |
 pb | bytea   |

yang=# select * from qapb;
 id |   pb
+
  0 | \012\006hello?\020\000\030\000 \000*\014\012\006hello!\020\000\030\000
(1 row)

yang=# create or replace function foo() returns text as $$
import sys, plpy
if '/home/yang/work/pod/' not in sys.path:
sys.path.append('/home/yang/work/pod/')
from my_pb2 import * # this is a protobuf generated module
rv = plpy.execute('select * from qapb')
q = Q()
q.ParseFromString(str(rv[0]['pb']))
return str(q);
$$ language plpythonu;
CREATE FUNCTION
yang=# select foo();
 foo
-

(1 row)

From a Python prompt, this works fine. I also verified that parsing
works fine from a Python client that first pulls the blob out and
parses client-side.

>>> from my_pb2 import *
>>> q=Q()
>>> q.ParseFromString('\012\006hello?\020\000\030\000 \000*\014\012\006hello!\020\000\030\000')
>>> print q
a: hello?
b: 0
c: 0
d: 0
e {
  a: hello!
  b: 0
  c: 0
}

Any hints on how I can get to the bottom of this? Thanks in advance.
--
Yang Zhang
http://yz.mit.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Bizarre problem: Python stored procedure using protocol buffers not working

2010-05-15 Thread Yang Zhang
On Sat, May 15, 2010 at 10:20 PM, Peter Eisentraut pete...@gmx.net wrote:
 You have null bytes in the data value, which is not supported very well
 in PL/Python.  Try the 9.0 beta version; it should be fixed there.

Thanks. Out of curiosity, is this an issue just with PL/Python or with
other stored procedure languages as well?
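
For pre-9.0, one possible workaround sketch is to route the bytea
through base64 so no NUL byte ever crosses the SQL-to-Python boundary
(module and table names as in the example above):

create or replace function foo_b64() returns text as $$
import sys, base64, plpy
if '/home/yang/work/pod/' not in sys.path:
    sys.path.append('/home/yang/work/pod/')
from my_pb2 import Q
rv = plpy.execute("select encode(pb, 'base64') as pb64 from qapb")
q = Q()
q.ParseFromString(base64.b64decode(rv[0]['pb64'].replace('\n', '')))
return str(q)
$$ language plpythonu;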
-- 
Yang Zhang
http://yz.mit.edu/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Performance cost of a sort-merge join

2010-02-22 Thread Yang Zhang
Hi, I have the given tables and query, but I'm confused why the cost
of the query is so high. I've left it running over night. By
comparison, a select * from metarelcloud_transactionlog order by
transactionid takes on the order of seconds/minutes (at least in
MySQL). As far as I can tell, the complex query below basically just
consists of two sorts (three if PG doesn't in fact reuse the leaf
sorts). Why the enormous estimated page IO count (cost) on the second
sort? There are roughly 30 tuples per transactionid, so the join
shouldn't produce a vastly exploded dataset. Thanks in advance.

tpcc=# \d metarelcloud_graph
  Table public.metarelcloud_graph
  Column  | Type  | Modifiers
--+---+---
 tableid1 | character varying(20) | not null
 tupleid1 | integer   | not null
 tableid2 | character varying(20) | not null
 tupleid2 | integer   | not null
 node1| integer   | not null
 node2| integer   | not null
 weight   | integer   | not null
Indexes:
metarelcloud_graph_pkey PRIMARY KEY, btree (tableid1, tupleid1,
tableid2, tupleid2)

tpcc=# \d metarelcloud_transactionlog
   Table
public.metarelcloud_transactionlog
   Column| Type  |
   Modifiers
-+---+--
 id  | integer   | not null default
nextval('metarelcloud_transactionlog_id_seq'::regclass)
 transactionid   | integer   | not null
 queryid | smallint  | not null
 tableid | character varying(30) | not null
 tupleid | integer   | not null
 querytype   | character varying | not null
 graphpartition  | smallint  |
 replicatedpartition | smallint  |
 justifiedpartition  | smallint  |
 hashpartition   | smallint  |
 nodeid  | integer   |
 manualpartition | smallint  |
Indexes:
metarelcloud_transactionlog_pkey PRIMARY KEY, btree (id)
Check constraints:
metarelcloud_transactionlog_graphpartition_check CHECK
(graphpartition >= 0)
metarelcloud_transactionlog_hashpartition_check CHECK (hashpartition >= 0)
metarelcloud_transactionlog_justifiedpartition_check CHECK
(justifiedpartition >= 0)
metarelcloud_transactionlog_manualpartition_check CHECK
(manualpartition >= 0)
metarelcloud_transactionlog_querytype_check CHECK
(querytype::text = ANY (ARRAY['select'::character varying,
'insert'::character varying, 'delete'::character varying,
'update'::character varying]::text[]))
metarelcloud_transactionlog_replicatedpartition_check CHECK
(replicatedpartition >= 0)

tpcc=# analyze metarelcloud_transactionlog;
ANALYZE

tpcc=# explain insert into metarelcloud_graph (node1, node2, tableid1,
tupleid1, tableid2, tupleid2, weight)
select 0, 0, a.tableid, a.tupleid, b.tableid, b.tupleid, count(*)
from metarelcloud_transactionlog a, metarelcloud_transactionlog b
where a.transactionid = b.transactionid
  and (a.tableid, a.tupleid) <> (b.tableid, b.tupleid)
group by a.tableid, a.tupleid, b.tableid, b.tupleid;
  QUERY PLAN
--
 Subquery Scan *SELECT*  (cost=968062444010.30..1088362355018.20
rows=2673331355731 width=180)
   ->  GroupAggregate  (cost=968062444010.30..1041579056292.91
rows=2673331355731 width=26)
 ->  Sort  (cost=968062444010.30..974745772399.63
rows=2673331355731 width=26)
   Sort Key: a.tableid, a.tupleid, b.tableid, b.tupleid
   ->  Merge Join  (cost=16817274.69..160416950669.79
rows=2673331355731 width=26)
 Merge Cond: (a.transactionid = b.transactionid)
 Join Filter: (ROW((a.tableid)::text, a.tupleid) <>
ROW((b.tableid)::text, b.tupleid))
 ->  Sort  (cost=8408637.34..8534662.95
rows=50410244 width=17)
   Sort Key: a.transactionid
   ->  Seq Scan on metarelcloud_transactionlog
a  (cost=0.00..925543.44 rows=50410244 width=17)
 ->  Materialize  (cost=8408637.34..9038765.39
rows=50410244 width=17)
   ->  Sort  (cost=8408637.34..8534662.95
rows=50410244 width=17)
 Sort Key: b.transactionid
 ->  Seq Scan on
metarelcloud_transactionlog b  (cost=0.00..925543.44 rows=50410244
width=17)
(14 rows)

-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance cost of a sort-merge join

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 12:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I've left it running over night. By
 comparison, a select * from metarelcloud_transactionlog order by
 transactionid takes on the order of seconds/minutes (at least in
 MySQL).

 That's got approximately nothing to do with this query.

Isn't that exactly what the leaf sorts are doing? By comparison,
select * from metarelcloud_transactionlog order by transactionid
takes much, much longer in PG (it's been running for 68 minutes now,
and still going, whereas MySQL took 6 minutes).
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
I have the exact same table of data in both MySQL and Postgresql. In Postgresql:

tpcc=# \d metarelcloud_transactionlog
   Table
public.metarelcloud_transactionlog
   Column| Type  |
   Modifiers
-+---+--
 id  | integer   | not null default
nextval('metarelcloud_transactionlog_id_seq'::regclass)
 transactionid   | integer   | not null
 queryid | smallint  | not null
 tableid | character varying(30) | not null
 tupleid | integer   | not null
 querytype   | character varying | not null
 graphpartition  | smallint  |
 replicatedpartition | smallint  |
 justifiedpartition  | smallint  |
 hashpartition   | smallint  |
 nodeid  | integer   |
 manualpartition | smallint  |
Indexes:
metarelcloud_transactionlog_pkey PRIMARY KEY, btree (id)
Check constraints:
metarelcloud_transactionlog_graphpartition_check CHECK
(graphpartition >= 0)
metarelcloud_transactionlog_hashpartition_check CHECK (hashpartition >= 0)
metarelcloud_transactionlog_justifiedpartition_check CHECK
(justifiedpartition >= 0)
metarelcloud_transactionlog_manualpartition_check CHECK
(manualpartition >= 0)
metarelcloud_transactionlog_querytype_check CHECK
(querytype::text = ANY (ARRAY['select'::character varying,
'insert'::character varying, 'delete'::character varying,
'update'::character varying]::text[]))
metarelcloud_transactionlog_replicatedpartition_check CHECK
(replicatedpartition >= 0)

In MySQL:

CREATE TABLE `metarelcloud_transactionlog` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `transactionid` int(11) NOT NULL,
  `queryid` tinyint(4) NOT NULL,
  `tableid` varchar(30) NOT NULL,
  `tupleid` int(11) NOT NULL,
  `querytype` enum('select','insert','delete','update') NOT NULL,
  `graphpartition` tinyint(3) unsigned DEFAULT NULL,
  `replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
  `justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
  `hashpartition` tinyint(3) unsigned DEFAULT NULL,
  `nodeid` int(11) DEFAULT NULL,
  `manualpartition` tinyint(3) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `transactionid` (`transactionid`),
  KEY `tableid` (`tableid`,`tupleid`),
  KEY `nodeid` (`nodeid`)
) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

I'm running:

  select * from metarelcloud_transactionlog order by transactionid;

It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 hello

 the speed depends on setting of working_memory. Try to increase a 
 working_memory

 set working_memory to '10MB';

It's already at

tpcc=# show work_mem;
 work_mem
----------
 20MB
(1 row)

I also wouldn't have imagined an external merge-sort as being very
memory-intensive--wouldn't it only need enough buffer space to read 2x and
write 1x in big-enough chunks for mostly-sequential access?
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 1:30 PM, Richard Broersma
richard.broer...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens frankheik...@mac.com wrote:

 There is no index on the column transactionid in your PostgreSQL-table, as
 there is in your MySQL-table. This explains the difference.

 CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
 (transactionid);

 Does an index help a sort operation in PostgreSQL?

I also share the same doubt. An external merge-sort needs to make
complete passes over the entire dataset, with no index-directed
accesses.
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
dal...@solfertje.student.utwente.nl wrote:
 On 22 Feb 2010, at 19:35, Yang Zhang wrote:

 I also wouldn't have imagined an external merge-sort as being very


 Where's that external merge-sort coming from? Can you show an explain analyze?

I just assumed that the Sort in the EXPLAIN output meant an external
merge-sort, given that the table has over 50 million tuples and is
over 3GB, *and* there is no index on the sort key:

tpcc=# explain select * from metarelcloud_transactionlog order by transactionid;
   QUERY PLAN
-
 Sort  (cost=8408637.34..8534662.95 rows=50410244 width=17)
   Sort Key: a.transactionid
   ->  Seq Scan on metarelcloud_transactionlog a
(cost=0.00..925543.44 rows=50410244 width=17)
(3 rows)

Anyway, I added the INDEX as suggested by Frank, but it's been 20
minutes and it's still running. With the index, EXPLAIN says:

tpcc=# explain select * from metarelcloud_transactionlog order by transactionid;
   QUERY PLAN
-
 Index Scan using i_transactionid on metarelcloud_transactionlog
(cost=0.00..4453076.81 rows=50410164 width=44)
(1 row)

 If your work-mem is too low there's a good chance that Postgres has to use 
 your disks for sorting, which will obviously be quite slow.

Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
much faster (on the order of several minutes).
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:03 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Yang Zhang escribió:

 I'm running:

   select * from metarelcloud_transactionlog order by transactionid;

 It takes MySQL 6 minutes, but Postgresql is still running after 70
 minutes. Is there something like a glaring misconfiguration that I'm
 overlooking? Thanks in advance.

 How large is the table, and have you vacuumed it?  Did you analyze it?
 What Pg version is this?

The table has 50.4M tuples. It's been vacuumed and analyzed. I'm using
PG 8.3.8 on Fedora 10 x86_64.
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:15 PM, Frank Heikens frankheik...@mac.com wrote:

 On 22 Feb 2010, at 20:07, Yang Zhang wrote:

 On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
 dal...@solfertje.student.utwente.nl wrote:

 On 22 Feb 2010, at 19:35, Yang Zhang wrote:

 I also wouldn't have imagined an external merge-sort as being very


 Where's that external merge-sort coming from? Can you show an explain
 analyze?

 I just assumed that the Sort in the EXPLAIN output meant an external
 merge-sort, given that the table has over 50 million tuples and is
 over 3GB, *and* there is no index on the sort key:

 tpcc=# explain select * from metarelcloud_transactionlog order by
 transactionid;
                                                  QUERY PLAN

 -
 Sort  (cost=8408637.34..8534662.95 rows=50410244 width=17)
  Sort Key: a.transactionid
  ->  Seq Scan on metarelcloud_transactionlog a
 (cost=0.00..925543.44 rows=50410244 width=17)
 (3 rows)

 Anyway, I added the INDEX as suggested by Frank, but it's been 20
 minutes and it's still running. With the index, EXPLAIN says:

 tpcc=# explain select * from metarelcloud_transactionlog order by
 transactionid;
                                                  QUERY PLAN

 -
 Index Scan using i_transactionid on metarelcloud_transactionlog
 (cost=0.00..4453076.81 rows=50410164 width=44)
 (1 row)


 Use EXPLAIN ANALYZE to see how the query is executed, gives you more
 details.

As mentioned, this would take a very long time to complete --
I have not seen a single successful query completion yet. I'd need to
let it run overnight.



 If your work-mem is too low there's a good chance that Postgres has to
 use your disks for sorting, which will obviously be quite slow.

 Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
 much faster (on the order of several minutes).

 Make sure your index does fit into memory, what's the size of the index?

How might I find out the size and whether it fits in memory?
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:27 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Yang Zhang escribió:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

 I just noticed two things:

 [snip lots of stuff]

 1.

 ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

 You're doing a comparison to MyISAM.

We've actually been using innodb as well; it exhibits similar
execution times to MyISAM.



 2.

   select * from metarelcloud_transactionlog order by transactionid;

 You're reading the whole table.

 This is unlikely to fly very far.  I suggest you try some query that's
 actually going to be used in the real world.

This isn't some microbenchmark. This is part of our actual analytical
application. We're running large-scale graph partitioning algorithms.
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:39 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang yanghates...@gmail.com wrote:
 This isn't some microbenchmark. This is part of our actual analytical
 application. We're running large-scale graph partitioning algorithms.

 It's important to see how it runs if you can fit more / most of the
 data set into memory by cranking up work_mem to something really big
 (like a gigabyte or two) and if the query planner can switch to some
 sort of hash algorithm.

We're actually using a very small dataset right now. Being bounded by
memory capacity is not a scalable approach for our application.


 Also, can you cluster the table on transactionid ?


We can, but that doesn't really address the core issue: the sort
itself is only there to perform a self merge join on transactionid,
and the *very next step* is a GROUP BY a.tableid, a.tupleid,
b.tableid, b.tupleid (i.e. requiring yet another sort for the
group-aggregate).
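
For completeness, the suggested clustering would look something like
the sketch below (assuming the i_transactionid index from earlier in
the thread; CLUSTER rewrites the whole table, so it is not free on 50M
rows, and the table should be re-ANALYZEd afterwards):

    -- 8.4+ syntax; on 8.3 the spelling is:
    --   CLUSTER i_transactionid ON metarelcloud_transactionlog;
    CLUSTER metarelcloud_transactionlog USING i_transactionid;
    ANALYZE metarelcloud_transactionlog;
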
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

 Just wondering, are these on the same exact machine?


Yes, on the same disk.
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:41 PM, Frank Heikens frankheik...@mac.com wrote:

 Op 22 feb 2010, om 20:28 heeft Yang Zhang het volgende geschreven:




 If your work-mem is too low there's a good chance that Postgres has to
 use your disks for sorting, which will obviously be quite slow.

 Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
 much faster (on the order of several minutes).

 Make sure your index does fit into memory, what's the size of the index?

 How might I find out the size and whether it's being fit in memory?

 SELECT pg_size_pretty(pg_relation_size('i_transactionid'));

 pg_size_pretty
----------------
 1080 MB
(1 row)
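
For anyone following along, a few related checks, as a sketch -- note
that shared_buffers only shows how much cache the server is configured
with, not whether this particular index actually stays resident:

    -- heap (table) size, for comparison with the index
    SELECT pg_size_pretty(pg_relation_size('metarelcloud_transactionlog'));
    -- table plus all of its indexes
    SELECT pg_size_pretty(pg_total_relation_size('metarelcloud_transactionlog'));
    -- configured shared cache
    SHOW shared_buffers;
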
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 3:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yang Zhang yanghates...@gmail.com writes:
 On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 the speed depends on setting of working_memory. Try to increase a 
 working_memory

 It's already at
  20000kB

 According to your original posting, you're trying to sort something like
 a gigabyte of data.  20MB is peanuts.  I wouldn't recommend increasing
 the value across-the-board, but setting it to several hundred meg for
 this particular query might help.  How much RAM in your machine anyway?

We have 16GB of RAM, but again, Unix sort (and even our own
hand-rolled merge-sort) can operate zippily while avoiding consuming
additional memory.

All the same, we increased work_mem to 1GB, and still the query is not
completing.
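
(For reference, a bump like that can be scoped to a single session
rather than the whole server -- a sketch, with an illustrative value
rather than what was actually used:)

    SET work_mem = '512MB';  -- session-local; pick a value the box can spare
    EXPLAIN ANALYZE
    SELECT * FROM metarelcloud_transactionlog ORDER BY transactionid;
    RESET work_mem;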


 Also, the fact that mysql is faster suggests that having an index does help.
 Possibly the data is nearly ordered by transactionid, in which case an
 indexscan would not have random-access problems and would be much faster
 than an explicit sort.

Note that earlier in the thread I tried running this query with an
index scan, but it's still much slower.
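
A quick way to check the nearly-ordered hypothesis is the planner's own
correlation statistic -- a sketch; values near 1.0 mean the column is
almost perfectly ordered on disk, values near 0 mean essentially random:

    SELECT attname, correlation
    FROM pg_stats
    WHERE tablename = 'metarelcloud_transactionlog'
      AND attname = 'transactionid';
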
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 5:31 PM, Igor Neyman iney...@perceptron.com wrote:
 When in doubt - test.
 Why not remove index in MySQL (or create index in PostgreSQL) and see
 what happens.
 Why try to compare apples and oranges?

Continue reading this thread -- I also tried using an index in Postgresql.
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 3:45 PM, Scott Marlowe
scott.marl...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 12:53 PM, Yang Zhang yanghates...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

 Just wondering, are these on the same exact machine?


 Yes, on the same disk.

 I'm wondering how much of this could be caching effects.  Is the MySQL
 database warmed up before you started, and the pgsql database is
 cold and no caching has taken place?

 What do things like vmstat 10 say while the query is running on each
 db?  First time, second time, things like that.

Awesome -- this actually led me to discover the problem.

When running the query in MySQL InnoDB:

$ vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0 13 13733604  83020   5648 219388433   936   16821
4  2 89  5  0
 1 12 13749952  80164   5600 21780320 4354   908  4379 3586 2638
0  1 38 60  0
 0 19 13762228  80576   5556 2145220  208 3527  1280  3690 3668 2635
1  1 39 59  0
 0 19 13778632  79420   5560 2135228   52 4186  1046  4191 3682 2418
0  1 37 62  0
 0 19 13792964  77336   5592 2082520   41 3731  1698  3804 4102 2686
1  1 53 45  0
 0 14 13810356  84036   5556 2049836   36 4241   797  4246 3913 2603
0  1 68 31  0
 1 14 13825640  81336   5520 20019200 4212   958  4220 3848 2736
1  1 73 25  0
 0 17 13844952  78036   5476 19769568 4685   923  4689 3832 2547
0  1 69 29  0
 2 13 13863828  79812   5448 19549523 4627   692  4634 3744 2505
0  1 70 28  0
 0 15 13883828  77764   5440 1920528  249 4544   972  4548 4345 2506
0  1 70 28  0
 1 20 13898900  79132   5456 1890192   28 4341   723  4438 4982 3030
0  3 64 33  0
 0 11 13915252  85184   5624 1865260   79 3668   752  3764 4472 2765
0  3 57 40  0
 0 12 13933964  78448   5700 1832640  120 4327  1066  4434 4484 2777
1  3 52 45  0
 0 19 13951748  77640   5816 1795720   94 4005  1159  4091 4580 2762
1  3 48 49  0
 0 16 13972748  79884   5780 17536760 4737   787  4746 4385 2766
1  3 51 45  0
 0 25 13988108  78936   5884 1726068  547 3954  1468  4116 4976 3502
0  4 44 52  0
 1 20 14011500  77676   5868 1689136  161 4980   843  5506 5218 3131
0  3 34 62  0
 0 22 14032472  81348   5816 1647884  270 4198   943  4369 4521 2826
1  3 40 56  0
 0 23 14055220  81712   5804 1626872  193 4774  1408  4856 4617 2754
1  3 38 58  0
 0 21 14075848  81844   5696 15768360 4738   974  4742 4528 2704
1  3 40 56  0
 0 25 14097260  79788   5628 1536336  213 4512   922  4639 4726 2929
1  3 27 69  0
 0 24 14123900  80820   5616 1488460  319 5033  1059  5128 4895 2780
2  3 17 78  0
 1 26 14142632  77276   5660 1445592  445 4605  1434  4727 5401 3364
1  3 16 79  0
 0 31 14165668  83736   5976 1387048  362 4288  1444  4428 4739 2963
2  3 17 78  0
 1 28 14180104  77564   6324 1369232  387 4526    4677 5748 3559
1  3 16 80  0

I'm guessing the swap numbers are because MySQL uses mmap?

Anyway, when running the query in Postgresql:

$ vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  2 13866336 1574540  25024 787898033   936   16821
4  2 89  5  0
 1  3 13861520 1163596  25328 8128360 10460 25429   433 4368 4175
4  2 80 14  0
 0  3 13856484 803024  25600 8343220 11170 22062   688 4492 4590
4  2 73 20  0
 0  6 13855304 678868  26052 8435540  1600  9239   598 5195 7141
1  5 70 24  0
 0  6 13853644 513568  26332 8563680  4010 12480  7100 4775 4248
3  3 68 26  0
 2  2 13851804 166228  26624 8775304  6340 21466  1497 4680 4550
6  2 64 28  0
 0  5 13861556  81896  26740 8825360  860 3547  6100  3847 5142 3386
6  2 57 35  0
 0  6 13867688  91368  26808 8832712  653 3326  1835  3604 4738 2762
5  2 61 32  0
 0  5 13870676  88524  26872 8849392  638 3272  2578  3517 4864 2909
4  2 55 39  0
 0  5 13872748  79512  27004 8864456  629 1788  2086  2949 4337 2921
1  3 55 41  0
 0  7 13876760  83124  27136 8867272 1018 2253  1713  2409 4321 2889
0  3 63 33  0
 0  6 13878964  82876  27240 8874540  792 2119  1854  2314 4288 2813
2  2 72 24  0
 3  4 13883204  81224  27280 8887068  661 3067  2995  3385 4558 2899
4  2 72 22  0
 0  6 13886636  82036  27352 8905628  594 3726  2628  4013 4744 2765
4  2 69 25  0
 0  8 13899852  85604  27400 8925800  638 4423  2689  4658 4903 2808
4  2 55 40  0
 1  4 13905184  80544  27484 8940040  676 3501  3006  3799 4805 2932
4  2 66 28  0
 0  9 13908480  80100  27516 8948476  668 2996  1720  3192 4594 2799
4  2 60 35  0

vmstat showed no swapping-out for a while, and then suddenly it
started spilling a lot. Checking psql's memory stats showed that it
was huge -- apparently, it's trying to store its full result set in
memory. As soon as I added a LIMIT 1, everything worked
beautifully and finished in 4m (I verified that the planner was still
issuing a Sort).

I'm relieved that Postgresql itself does not, in fact, suck, but
slightly disappointed in the behavior of psql. I suppose it needs to
buffer everything in memory to properly format its tabular output,
among other possible reasons I could imagine.

Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 9:30 PM, Alex Hunsaker bada...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 11:10, Yang Zhang yanghates...@gmail.com wrote:
 I have the exact same table of data in both MySQL and Postgresql. In
 Postgresql:

 FWIW on a stock (unchanged postgresql.conf) 8.3.9 I get (best of 3
 runs) 79 seconds, 26 using an index and 27 seconds with it clustered.
 Now, yes, it goes a lot faster because I'm skipping all the overhead of
 sending the data to the client...

Last sentence also contributed to my realizing the problem (the client
I was using was psql), but there's one oddity:

 # select count(1) from (SELECT * from metarelcloud_transactionlog
 order by transactionid) as foo;

Does it strike anyone else that the query optimizer/rewriter should be
able to toss out the sort from such a query altogether?
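
That is, since the outer count doesn't depend on row order, in
principle it could be planned the same as the sort-free form (table
name as used elsewhere in this thread):

    SELECT count(1) FROM metarelcloud_transactionlog;
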
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Tue, Feb 23, 2010 at 1:48 AM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 10:51 PM, Yang Zhang yanghates...@gmail.com wrote:
 nOn Mon, Feb 22, 2010 at 3:45 PM, Scott Marlowe
 scott.marl...@gmail.com wrote:

 What do things like vmstat 10 say while the query is running on each
 db?  First time, second time, things like that.

 Awesome -- this actually led me to discover the problem.

 vmstat showed no swapping-out for a while, and then suddenly it
 started spilling a lot. Checking psql's memory stats showed that it
 was huge -- apparently, it's trying to store its full result set in
 memory. As soon as I added a LIMIT 1, everything worked
 beautifully and finished in 4m (I verified that the planner was still
 issuing a Sort).

 I'm relieved that Postgresql itself does not, in fact, suck, but
 slightly disappointed in the behavior of psql. I suppose it needs to
 buffer everything in memory to properly format its tabular output,
 among other possible reasons I could imagine.

 It's best when working with big sets to do so with a cursor and fetch
 a few thousand rows at a time.  It's how we handle really big sets at
 work and it works like a charm in keeping the client from bogging down
 with a huge memory footprint.
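
For reference, the cursor approach looks roughly like this -- a sketch;
the cursor name is made up, and DECLARE has to run inside a transaction
block:

    BEGIN;
    DECLARE txlog_cur CURSOR FOR
        SELECT * FROM metarelcloud_transactionlog ORDER BY transactionid;
    FETCH 5000 FROM txlog_cur;  -- repeat until it returns no rows
    -- ...process each batch client-side, then FETCH again...
    CLOSE txlog_cur;
    COMMIT;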


Thing is, this is how I got here:

- ran a complex query that does SELECT INTO.
- that never terminated, so I killed it and tried a simpler SELECT (the
subject of this thread) from psql to see how long that would take.

I.e., my original application doesn't receive the entire dataset.
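
One more client-side knob worth noting for the psql case specifically:
psql's FETCH_COUNT variable (available since 8.2, if I recall
correctly) makes it retrieve the result through a cursor in batches
instead of buffering the whole set -- a sketch:

    \set FETCH_COUNT 10000
    SELECT * FROM metarelcloud_transactionlog ORDER BY transactionid;
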
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general