Re: [GENERAL] Sorting performance vs. MySQL

2010-02-23 Thread Greg Stark
On Tue, Feb 23, 2010 at 6:48 AM, Scott Marlowe scott.marl...@gmail.com wrote:
 I'm relieved that Postgresql itself does not, in fact, suck, but
 slightly disappointed in the behavior of psql. I suppose it needs to
 buffer everything in memory to properly format its tabular output,
 among other possible reasons I could imagine.

 It's best when working with big sets to do so with a cursor and fetch
 a few thousand rows at a time.  It's how we handle really big sets at
 work and it works like a charm in keeping the client from bogging down
 with a huge memory footprint.


You can do \set FETCH_COUNT to have psql use a cursor automatically.


-- 
greg

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-23 Thread John Gage
I am under the impression that MySQL does not have anything resembling  
Postgres' support for regular expressions.  Though some might think  
that regular expressions are a sort of poor man's SQL, in any  
application which manages large amounts of text they are crucial.   
Postgres definitely does not suck.


Is this the authoratative webpage for Snowball (which I never  
realized was a play on Snobol)?


http://snowball.tartarus.org/

Thanks,

John


On Feb 23, 2010, at 6:51 AM, Yang Zhang wrote:


I'm relieved that Postgresql itself does not, in fact, suck,



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-23 Thread Yeb Havinga

Greg Stark wrote:


You can do \set FETCH_COUNT to have psql use a cursor automatically.
  
It seems like a big win in this case. What would be the downside of 
having a fetch_count set default in psql?


regards
Yeb Havinga





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-23 Thread Greg Stark
On Tue, Feb 23, 2010 at 9:48 AM, Yeb Havinga yebhavi...@gmail.com wrote:
 Greg Stark wrote:

 You can do \set FETCH_COUNT to have psql use a cursor automatically.


 It seems like a big win in this case. What would be the downside of having a
 fetch_count set default in psql?

They were mentioned previously in this thread:

1) If an error occurs partway through the execution of the query you
might receive part of the result set.

2) psql won't be able to align the columns properly

-- 
greg

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-23 Thread Alex Hunsaker
On Tue, Feb 23, 2010 at 00:02, Yang Zhang yanghates...@gmail.com wrote:
 Thing is, this is how I got here:

 - ran complex query that does SELECT INTO.
 - that never terminated, so killed it and tried a simpler SELECT (the
 subject of this thread) from psql to see how long that would take.

You might have better luck if you paste the EXPLAIN ANALYZE of the
SELECT INTO here (Maybe a new thread? Maybe on -performance? use your
judgement...).  But I bet if its doing something with transactionid
like your straight select was, an index would help.  If you are just
using SELECT INTO to copy all of the data into a new table... COPY
might be faster or CREATE TABLE AS.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-23 Thread Baron Schwartz
Hi,

On Tue, Feb 23, 2010 at 12:51 AM, Yang Zhang yanghates...@gmail.com wrote:
 When running the query in MySQL InnoDB:

 $ vmstat 10
 procs ---memory-- ---swap-- -io --system--
 -cpu--
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa 
 st
  0 13 13733604  83020   5648 2193884    3    3   936   168    2    1
 4  2 89  5  0
  1 12 13749952  80164   5600 2178032    0 4354   908  4379 3586 2638
 0  1 38 60  0
  0 19 13762228  80576   5556 2145220  208 3527  1280  3690 3668 2635
 1  1 39 59  0
  0 19 13778632  79420   5560 2135228   52 4186  1046  4191 3682 2418
 0  1 37 62  0
[snip]
 I'm guessing the swap numbers are because MySQL uses mmap?

InnoDB doesn't use mmap.

Baron

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
I have the exact same table of data in both MySQL and Postgresql. In Postgresql:

tpcc=# \d metarelcloud_transactionlog
   Table
public.metarelcloud_transactionlog
   Column| Type  |
   Modifiers
-+---+--
 id  | integer   | not null default
nextval('metarelcloud_transactionlog_id_seq'::regclass)
 transactionid   | integer   | not null
 queryid | smallint  | not null
 tableid | character varying(30) | not null
 tupleid | integer   | not null
 querytype   | character varying | not null
 graphpartition  | smallint  |
 replicatedpartition | smallint  |
 justifiedpartition  | smallint  |
 hashpartition   | smallint  |
 nodeid  | integer   |
 manualpartition | smallint  |
Indexes:
metarelcloud_transactionlog_pkey PRIMARY KEY, btree (id)
Check constraints:
metarelcloud_transactionlog_graphpartition_check CHECK
(graphpartition = 0)
metarelcloud_transactionlog_hashpartition_check CHECK (hashpartition = 0)
metarelcloud_transactionlog_justifiedpartition_check CHECK
(justifiedpartition = 0)
metarelcloud_transactionlog_manualpartition_check CHECK
(manualpartition = 0)
metarelcloud_transactionlog_querytype_check CHECK
(querytype::text = ANY (ARRAY['select'::character varying,
'insert'::character varying, 'delete'::character varying,
'update'::character varying]::text[]))
metarelcloud_transactionlog_replicatedpartition_check CHECK
(replicatedpartition = 0)

In MySQL:

CREATE TABLE `metarelcloud_transactionlog` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `transactionid` int(11) NOT NULL,
  `queryid` tinyint(4) NOT NULL,
  `tableid` varchar(30) NOT NULL,
  `tupleid` int(11) NOT NULL,
  `querytype` enum('select','insert','delete','update') NOT NULL,
  `graphpartition` tinyint(3) unsigned DEFAULT NULL,
  `replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
  `justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
  `hashpartition` tinyint(3) unsigned DEFAULT NULL,
  `nodeid` int(11) DEFAULT NULL,
  `manualpartition` tinyint(3) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `transactionid` (`transactionid`),
  KEY `tableid` (`tableid`,`tupleid`),
  KEY `nodeid` (`nodeid`)
) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

I'm running:

  select * from metarelcloud_transactionlog order by transactionid;

It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Pavel Stehule
hello

the speed depends on setting of working_memory. Try to increase a working_memory

set working_memory to '10MB';

Regards
Pavel Stehule

2010/2/22 Yang Zhang yanghates...@gmail.com:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

 tpcc=# \d metarelcloud_transactionlog
                                       Table
 public.metarelcloud_transactionlog
       Column        |         Type          |
       Modifiers
 -+---+--
  id                  | integer               | not null default
 nextval('metarelcloud_transactionlog_id_seq'::regclass)
  transactionid       | integer               | not null
  queryid             | smallint              | not null
  tableid             | character varying(30) | not null
  tupleid             | integer               | not null
  querytype           | character varying     | not null
  graphpartition      | smallint              |
  replicatedpartition | smallint              |
  justifiedpartition  | smallint              |
  hashpartition       | smallint              |
  nodeid              | integer               |
  manualpartition     | smallint              |
 Indexes:
    metarelcloud_transactionlog_pkey PRIMARY KEY, btree (id)
 Check constraints:
    metarelcloud_transactionlog_graphpartition_check CHECK
 (graphpartition = 0)
    metarelcloud_transactionlog_hashpartition_check CHECK (hashpartition = 
 0)
    metarelcloud_transactionlog_justifiedpartition_check CHECK
 (justifiedpartition = 0)
    metarelcloud_transactionlog_manualpartition_check CHECK
 (manualpartition = 0)
    metarelcloud_transactionlog_querytype_check CHECK
 (querytype::text = ANY (ARRAY['select'::character varying,
 'insert'::character varying, 'delete'::character varying,
 'update'::character varying]::text[]))
    metarelcloud_transactionlog_replicatedpartition_check CHECK
 (replicatedpartition = 0)

 In MySQL:

 CREATE TABLE `metarelcloud_transactionlog` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `transactionid` int(11) NOT NULL,
  `queryid` tinyint(4) NOT NULL,
  `tableid` varchar(30) NOT NULL,
  `tupleid` int(11) NOT NULL,
  `querytype` enum('select','insert','delete','update') NOT NULL,
  `graphpartition` tinyint(3) unsigned DEFAULT NULL,
  `replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
  `justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
  `hashpartition` tinyint(3) unsigned DEFAULT NULL,
  `nodeid` int(11) DEFAULT NULL,
  `manualpartition` tinyint(3) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `transactionid` (`transactionid`),
  KEY `tableid` (`tableid`,`tupleid`),
  KEY `nodeid` (`nodeid`)
 ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

 I'm running:

  select * from metarelcloud_transactionlog order by transactionid;

 It takes MySQL 6 minutes, but Postgresql is still running after 70
 minutes. Is there something like a glaring misconfiguration that I'm
 overlooking? Thanks in advance.
 --
 Yang Zhang
 http://www.mit.edu/~y_z/

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Frank Heikens
There is no index on the column transactionid in your PostgreSQL- 
table, as there is in your MySQL-table. This explains the difference.


CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog  
(transactionid);



Op 22 feb 2010, om 19:10 heeft Yang Zhang het volgende geschreven:

I have the exact same table of data in both MySQL and Postgresql. In  
Postgresql:


tpcc=# \d metarelcloud_transactionlog
  Table
public.metarelcloud_transactionlog
  Column| Type  |
  Modifiers
-+--- 
+ 
--

id  | integer   | not null default
nextval('metarelcloud_transactionlog_id_seq'::regclass)
transactionid   | integer   | not null
queryid | smallint  | not null
tableid | character varying(30) | not null
tupleid | integer   | not null
querytype   | character varying | not null
graphpartition  | smallint  |
replicatedpartition | smallint  |
justifiedpartition  | smallint  |
hashpartition   | smallint  |
nodeid  | integer   |
manualpartition | smallint  |
Indexes:
   metarelcloud_transactionlog_pkey PRIMARY KEY, btree (id)
Check constraints:
   metarelcloud_transactionlog_graphpartition_check CHECK
(graphpartition = 0)
   metarelcloud_transactionlog_hashpartition_check CHECK  
(hashpartition = 0)

   metarelcloud_transactionlog_justifiedpartition_check CHECK
(justifiedpartition = 0)
   metarelcloud_transactionlog_manualpartition_check CHECK
(manualpartition = 0)
   metarelcloud_transactionlog_querytype_check CHECK
(querytype::text = ANY (ARRAY['select'::character varying,
'insert'::character varying, 'delete'::character varying,
'update'::character varying]::text[]))
   metarelcloud_transactionlog_replicatedpartition_check CHECK
(replicatedpartition = 0)

In MySQL:

CREATE TABLE `metarelcloud_transactionlog` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `transactionid` int(11) NOT NULL,
 `queryid` tinyint(4) NOT NULL,
 `tableid` varchar(30) NOT NULL,
 `tupleid` int(11) NOT NULL,
 `querytype` enum('select','insert','delete','update') NOT NULL,
 `graphpartition` tinyint(3) unsigned DEFAULT NULL,
 `replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
 `justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
 `hashpartition` tinyint(3) unsigned DEFAULT NULL,
 `nodeid` int(11) DEFAULT NULL,
 `manualpartition` tinyint(3) unsigned DEFAULT NULL,
 PRIMARY KEY (`id`),
 KEY `transactionid` (`transactionid`),
 KEY `tableid` (`tableid`,`tupleid`),
 KEY `nodeid` (`nodeid`)
) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

I'm running:

 select * from metarelcloud_transactionlog order by transactionid;

It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.
--
Yang Zhang
http://www.mit.edu/~y_z/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Frank Heikens
frankheik...@mac.com




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Richard Broersma
On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens frankheik...@mac.com wrote:

 There is no index on the column transactionid in your PostgreSQL-table, as
 there is in your MySQL-table. This explains the difference.

 CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
 (transactionid);

Does an index help a sort operation in PostgreSQL?


-- 
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 hello

 the speed depends on setting of working_memory. Try to increase a 
 working_memory

 set working_memory to '10MB';

It's already at

tpcc=# show work_mem;
 work_mem
--
 2kB
(1 row)

I also wouldn't have imagined an external merge-sort as being very
memory-intensive--wouldn't it only enough buffer space to read 2x and
write 1x in big-enough chunks for mostly-sequential access?
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 1:30 PM, Richard Broersma
richard.broer...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens frankheik...@mac.com wrote:

 There is no index on the column transactionid in your PostgreSQL-table, as
 there is in your MySQL-table. This explains the difference.

 CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
 (transactionid);

 Does an index help a sort operation in PostgreSQL?

I also share the same doubt. An external merge-sort needs to make
complete passes over the entire dataset, with no index-directed
accesses.
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Frank Heikens


Op 22 feb 2010, om 19:30 heeft Richard Broersma het volgende geschreven:

On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens  
frankheik...@mac.com wrote:


There is no index on the column transactionid in your PostgreSQL- 
table, as

there is in your MySQL-table. This explains the difference.

CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
(transactionid);


Does an index help a sort operation in PostgreSQL?


Yes it does, see the manual: 
http://www.postgresql.org/docs/8.4/interactive/indexes-ordering.html

Example without index:
Sort  (cost=804.39..829.39 rows=1 width=4) (actual  
time=16.006..17.171 rows=1 loops=1)

  Sort Key: bar
  Sort Method:  quicksort  Memory: 491kB
  -  Seq Scan on bla  (cost=0.00..140.00 rows=1 width=4) (actual  
time=0.015..2.236 rows=1 loops=1)

Total runtime: 18.098 ms

Same query with index (btree):
Index Scan Backward using i_bar on bla  (cost=0.00..406.25 rows=1  
width=4) (actual time=0.093..4.408 rows=1 loops=1)

Total runtime: 5.381 ms




--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug



Regards,
Frank Heikens




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Alban Hertroys
On 22 Feb 2010, at 19:35, Yang Zhang wrote:

 I also wouldn't have imagined an external merge-sort as being very


Where's that external merge-sort coming from? Can you show an explain analyze?

If your work-mem is too low there's a good chance that Postgres has to use your 
disks for sorting, which will obviously be quite slow.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4b82d18510442035320951!



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Alvaro Herrera
Yang Zhang escribió:

 I'm running:
 
   select * from metarelcloud_transactionlog order by transactionid;
 
 It takes MySQL 6 minutes, but Postgresql is still running after 70
 minutes. Is there something like a glaring misconfiguration that I'm
 overlooking? Thanks in advance.

How large is the table, and have you vacuumed it?  Did you analyze it?
What Pg version is this?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
dal...@solfertje.student.utwente.nl wrote:
 On 22 Feb 2010, at 19:35, Yang Zhang wrote:

 I also wouldn't have imagined an external merge-sort as being very


 Where's that external merge-sort coming from? Can you show an explain analyze?

I just assumed that the Sort in the EXPLAIN output meant an external
merge-sort, given that the table has over 50 million tuples and is
over 3GB, *and* there is no index on the sort key:

tpcc=# explain select * from metarelcloud_transactionlog order by transactionid;
   QUERY PLAN
-
 Sort  (cost=8408637.34..8534662.95 rows=50410244 width=17)
   Sort Key: a.transactionid
   -  Seq Scan on metarelcloud_transactionlog a
(cost=0.00..925543.44 rows=50410244 width=17)
(3 rows)

Anyway, I added the INDEX as suggested by Frank, but it's been 20
minutes and it's still running. With the index, EXPLAIN says:

tpcc=# explain select * from metarelcloud_transactionlog order by transactionid;
   QUERY PLAN
-
 Index Scan using i_transactionid on metarelcloud_transactionlog
(cost=0.00..4453076.81 rows=50410164 width=44)
(1 row)

 If your work-mem is too low there's a good chance that Postgres has to use 
 your disks for sorting, which will obviously be quite slow.

Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
much faster (on the order of several minutes).
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:03 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Yang Zhang escribió:

 I'm running:

   select * from metarelcloud_transactionlog order by transactionid;

 It takes MySQL 6 minutes, but Postgresql is still running after 70
 minutes. Is there something like a glaring misconfiguration that I'm
 overlooking? Thanks in advance.

 How large is the table, and have you vacuumed it?  Did you analyze it?
 What Pg version is this?

The table has 50.4M tuples. It's been vacuumed and analyzed. I'm using
PG 8.3.8 on Fedora 10 x86_64.
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Frank Heikens


Op 22 feb 2010, om 20:07 heeft Yang Zhang het volgende geschreven:


On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
dal...@solfertje.student.utwente.nl wrote:

On 22 Feb 2010, at 19:35, Yang Zhang wrote:


I also wouldn't have imagined an external merge-sort as being very



Where's that external merge-sort coming from? Can you show an  
explain analyze?


I just assumed that the Sort in the EXPLAIN output meant an external
merge-sort, given that the table has over 50 million tuples and is
over 3GB, *and* there is no index on the sort key:

tpcc=# explain select * from metarelcloud_transactionlog order by  
transactionid;

  QUERY PLAN
-
Sort  (cost=8408637.34..8534662.95 rows=50410244 width=17)
  Sort Key: a.transactionid
  -  Seq Scan on metarelcloud_transactionlog a
(cost=0.00..925543.44 rows=50410244 width=17)
(3 rows)

Anyway, I added the INDEX as suggested by Frank, but it's been 20
minutes and it's still running. With the index, EXPLAIN says:

tpcc=# explain select * from metarelcloud_transactionlog order by  
transactionid;

  QUERY PLAN
-
Index Scan using i_transactionid on metarelcloud_transactionlog
(cost=0.00..4453076.81 rows=50410164 width=44)
(1 row)



Use EXPLAIN ANALYZE to see how the query is executed, gives you more  
details.



If your work-mem is too low there's a good chance that Postgres has  
to use your disks for sorting, which will obviously be quite slow.


Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
much faster (on the order of several minutes).


Make sure your index does fit into memory, what's the size of the index?



--
Yang Zhang
http://www.mit.edu/~y_z/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Frank Heikens




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:15 PM, Frank Heikens frankheik...@mac.com wrote:

 Op 22 feb 2010, om 20:07 heeft Yang Zhang het volgende geschreven:

 On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
 dal...@solfertje.student.utwente.nl wrote:

 On 22 Feb 2010, at 19:35, Yang Zhang wrote:

 I also wouldn't have imagined an external merge-sort as being very


 Where's that external merge-sort coming from? Can you show an explain
 analyze?

 I just assumed that the Sort in the EXPLAIN output meant an external
 merge-sort, given that the table has over 50 million tuples and is
 over 3GB, *and* there is no index on the sort key:

 tpcc=# explain select * from metarelcloud_transactionlog order by
 transactionid;
                                                  QUERY PLAN

 -
 Sort  (cost=8408637.34..8534662.95 rows=50410244 width=17)
  Sort Key: a.transactionid
  -  Seq Scan on metarelcloud_transactionlog a
 (cost=0.00..925543.44 rows=50410244 width=17)
 (3 rows)

 Anyway, I added the INDEX as suggested by Frank, but it's been 20
 minutes and it's still running. With the index, EXPLAIN says:

 tpcc=# explain select * from metarelcloud_transactionlog order by
 transactionid;
                                                  QUERY PLAN

 -
 Index Scan using i_transactionid on metarelcloud_transactionlog
 (cost=0.00..4453076.81 rows=50410164 width=44)
 (1 row)


 Use EXPLAIN ANALYZE to see how the query is executed, gives you more
 details.

As mentioned, this would take a very long time to complete running --
I have not yet seen one successful query completion yet. I'd need to
let it run overnight.



 If your work-mem is too low there's a good chance that Postgres has to
 use your disks for sorting, which will obviously be quite slow.

 Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
 much faster (on the order of several minutes).

 Make sure your index does fit into memory, what's the size of the index?

How might I find out the size and whether it's being fit in memory?
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Alvaro Herrera
Yang Zhang escribió:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

I just noticed two things:

[snip lots of stuff]

1.

 ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

You're doing a comparison to MyISAM.


2.

   select * from metarelcloud_transactionlog order by transactionid;

You're reading the whole table.

This is unlikely to fly very far.  I suggest you try some query that's
actually going to be used in the real world.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:27 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Yang Zhang escribió:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

 I just noticed two things:

 [snip lots of stuff]

 1.

 ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

 You're doing a comparison to MyISAM.

We've actually been using innodb as well; it exhibits similar
execution times to MyISAM.



 2.

   select * from metarelcloud_transactionlog order by transactionid;

 You're reading the whole table.

 This is unlikely to fly very far.  I suggest you try some query that's
 actually going to be used in the real world.

This isn't some microbenchmark. This is part of our actual analytical
application. We're running large-scale graph partitioning algorithms.
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang yanghates...@gmail.com wrote:
 This isn't some microbenchmark. This is part of our actual analytical
 application. We're running large-scale graph partitioning algorithms.

It's important to see how it runs if you can fit more / most of the
data set into memory by cranking up work_mem to something really big
(like a gigabyte or two) and if the query planner can switch to some
sort of hash algorithm.

Also, can you cluster the table on transactionid ?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Frank Heikens


Op 22 feb 2010, om 20:28 heeft Yang Zhang het volgende geschreven:






If your work-mem is too low there's a good chance that Postgres  
has to

use your disks for sorting, which will obviously be quite slow.


Relative to the non-terminating 80-minute-so-far sort, Unix sort  
runs

much faster (on the order of several minutes).


Make sure your index does fit into memory, what's the size of the  
index?


How might I find out the size and whether it's being fit in memory?


SELECT pg_size_pretty(pg_relation_size('i_transactionid'));



--
Yang Zhang
http://www.mit.edu/~y_z/

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Frank Heikens




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:39 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang yanghates...@gmail.com wrote:
 This isn't some microbenchmark. This is part of our actual analytical
 application. We're running large-scale graph partitioning algorithms.

 It's important to see how it runs if you can fit more / most of the
 data set into memory by cranking up work_mem to something really big
 (like a gigabyte or two) and if the query planner can switch to some
 sort of hash algorithm.

We're actually using a very small dataset right now. Being bounded by
memory capacity is not a scalable approach for our application.


 Also, can you cluster the table on transactionid ?


We can, but that's not really addressing the core issue, which matters
to us since the sort itself is only for performing a self merge join
on transactionid, and the *very next step* is a group by a.tableid,
a.tupleid, b.tableid, b.tupleid (i.e. requiring another sort for the
group-agg).
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

Just wondering, are these on the same exact machine?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

 Just wondering, are these on the same exact machine?


Yes, on the same disk.
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 12:50 PM, Yang Zhang yanghates...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 2:39 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang yanghates...@gmail.com wrote:
 This isn't some microbenchmark. This is part of our actual analytical
 application. We're running large-scale graph partitioning algorithms.

 It's important to see how it runs if you can fit more / most of the
 data set into memory by cranking up work_mem to something really big
 (like a gigabyte or two) and if the query planner can switch to some
 sort of hash algorithm.

 We're actually using a very small dataset right now. Being bounded by
 memory capacity is not a scalable approach for our application.

But the more you can fit into work_mem the faster it will go anyway.
So it's still worth a try.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 2:41 PM, Frank Heikens frankheik...@mac.com wrote:

 Op 22 feb 2010, om 20:28 heeft Yang Zhang het volgende geschreven:




 If your work-mem is too low there's a good chance that Postgres has to
 use your disks for sorting, which will obviously be quite slow.

 Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
 much faster (on the order of several minutes).

 Make sure your index does fit into memory, what's the size of the index?

 How might I find out the size and whether it's being fit in memory?

 SELECT pg_size_pretty(pg_relation_size('i_transactionid'));

 pg_size_pretty

 1080 MB
(1 row)
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yeb Havinga

Scott Marlowe wrote:

On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote:
  

I have the exact same table of data in both MySQL and Postgresql. In Postgresql:



Just wondering, are these on the same exact machine?

  
Just reading up on this interesting thread. WFIW, 2 years ago I and a 
collegue of mine did a hardware comparison of early Intel and AMD 
desktop quadcore processors to run postgres database, with most other 
parts comparable. The intel processor was 20 to 30 % faster in cpu 
operations to the (first generation) Phenom at almost everything, except 
at index creation. The test that the AMD finished in a few minutes, we 
had to stop on the Intel because it simply didn't finish. We double 
checked configuration settings and could not find explainable 
differences. I hesitate to post this information here, because its hard 
to believe that an actual big difference between the processors exists, 
and it more likely was something in our test setup. Still: the 
difference was *only* in index creation, which is kindoff like a qsort yes?


egards
Yeb Havinga





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Tom Lane
Yang Zhang yanghates...@gmail.com writes:
 On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 the speed depends on setting of working_memory. Try to increase a 
 working_memory

 It's already at
  2kB

According to your original posting, you're trying to sort something like
a gigabyte of data.  20MB is peanuts.  I wouldn't recommend increasing
the value across-the-board, but setting it to several hundred meg for
this particular query might help.  How much RAM in your machine anyway?

Also, the fact that mysql is faster suggests that having an index does help.
Possibly the data is nearly ordered by transactionid, in which case an
indexscan would not have random-access problems and would be much faster
than an explicit sort.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 12:53 PM, Yang Zhang yanghates...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

 Just wondering, are these on the same exact machine?


 Yes, on the same disk.

I'm wondering how much of this could be caching effects.  Is the MySQL
database warmed up before you started, and the pgsql database is
cold and no caching has taken place?

What do things like vmstat 10 say while the query is running on each
db?  First time, second time, things like that.

Also, just curios, what's shared_buffers set to on the pgsql instance?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 3:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yang Zhang yanghates...@gmail.com writes:
 On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 the speed depends on setting of working_memory. Try to increase a 
 working_memory

 It's already at
  2kB

 According to your original posting, you're trying to sort something like
 a gigabyte of data.  20MB is peanuts.  I wouldn't recommend increasing
 the value across-the-board, but setting it to several hundred meg for
 this particular query might help.  How much RAM in your machine anyway?

We have 16GB of RAM, but again, Unix sort (and even our own
hand-rolled merge-sort) can operate zippily while avoiding consuming
additional memory.

All the same, we increased work_mem to 1GB, and still the query is not
completing.


 Also, the fact that mysql is faster suggests that having an index does help.
 Possibly the data is nearly ordered by transactionid, in which case an
 indexscan would not have random-access problems and would be much faster
 than an explicit sort.

Note that earlier in the thread I tried running this query with an
index scan, but it's still much slower.
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Tom Lane
Yeb Havinga yebhavi...@gmail.com writes:
 Just reading up on this interesting thread. WFIW, 2 years ago I and a 
 collegue of mine did a hardware comparison of early Intel and AMD 
 desktop quadcore processors to run postgres database, with most other 
 parts comparable. The intel processor was 20 to 30 % faster in cpu 
 operations to the (first generation) Phenom at almost everything, except 
 at index creation. The test that the AMD finished in a few minutes, we 
 had to stop on the Intel because it simply didn't finish. We double 
 checked configuration settings and could not find explainable 
 differences. I hesitate to post this information here, because its hard 
 to believe that an actual big difference between the processors exists, 
 and it more likely was something in our test setup. Still: the 
 difference was *only* in index creation, which is kindoff like a qsort yes?

Interesting.  Yes, btree index creation is essentially a sort ...

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Igor Neyman
When in doubt - test.
Why not remove index in MySQL (or create index in PostgreSQL) and see
what happens.
Why trying compare apples and oranges?

Igor Neyman 

 -Original Message-
 From: Yang Zhang [mailto:yanghates...@gmail.com] 
 Sent: Monday, February 22, 2010 1:37 PM
 To: Richard Broersma
 Cc: Frank Heikens; pgsql-general@postgresql.org
 Subject: Re: Sorting performance vs. MySQL
 
 On Mon, Feb 22, 2010 at 1:30 PM, Richard Broersma 
 richard.broer...@gmail.com wrote:
  On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens 
 frankheik...@mac.com wrote:
 
  There is no index on the column transactionid in your 
  PostgreSQL-table, as there is in your MySQL-table. This 
 explains the difference.
 
  CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
  (transactionid);
 
  Does an index help a sort operation in PostgreSQL?
 
 I also share the same doubt. An external merge-sort needs to 
 make complete passes over the entire dataset, with no 
 index-directed accesses.
 --
 Yang Zhang
 http://www.mit.edu/~y_z/
 

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 5:31 PM, Igor Neyman iney...@perceptron.com wrote:
 When in doubt - test.
 Why not remove index in MySQL (or create index in PostgreSQL) and see
 what happens.
 Why trying compare apples and oranges?

Continue reading this thread -- I also tried using an index in Postgresql.
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 3:33 PM, Yang Zhang yanghates...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 5:31 PM, Igor Neyman iney...@perceptron.com wrote:
 When in doubt - test.
 Why not remove index in MySQL (or create index in PostgreSQL) and see
 what happens.
 Why trying compare apples and oranges?

 Continue reading this thread -- I also tried using an index in Postgresql.

But have you tried cranking up work_mem to say 1G?

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 3:33 PM, Yang Zhang yanghates...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 5:31 PM, Igor Neyman iney...@perceptron.com wrote:
 When in doubt - test.
 Why not remove index in MySQL (or create index in PostgreSQL) and see
 what happens.
 Why trying compare apples and oranges?

 Continue reading this thread -- I also tried using an index in Postgresql.

And oh yeah, what was shared_buffers set to?  I'm not sure we ever got
an answer to that.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Alex Hunsaker
On Mon, Feb 22, 2010 at 11:10, Yang Zhang yanghates...@gmail.com wrote:
 I have the exact same table of data in both MySQL and Postgresql. In ? 
 Postgresql:

FWIW on a stock (unchanged postgresql.conf) 8.3.9 I get (best of 3
runs) 79 seconds, 26 using an index and 27 seconds with it clustered.
Now yes it goes a lot faster because im skipping all the overhead of
sending the data to the client...  But still that means it has almost
nothing with the sort or indexs.  Leaves pretty much your cpu, disk,
filesystem and network...  BTW the first time I did it it had to write
out the hint bits so that took (a bit) longer... Is this freshly
loaded data?

---

# create table metarelcould_transactionlog (
id serial primary key,
transactionid integer not null,
queryid smallint not null,
tableid varchar(30) not null,
tupleid integer not null,
querytype varchar not null,
graphpartition smallint,
replicatedpartition smallint,
justifiedpartition smallint,
hashpartition smallint,
modeid integer,
manualpartition smallint
);

# insert into metarelcould_transactionlog (transactionid, queryid,
tableid, tupleid, querytype, graphpartition, replicatedpartition,
justifiedpartition, hashpartition, modeid, manualpartition) select
foo,  1, 'sometable', 1, 's', 1, 1, 1, 1, 1, 1 from generate_series(1,
5000) as foo;

# select count(1) from (SELECT * from metarelcould_transactionlog
order by transactionid) as foo;
  count
--
 5000
(1 row)

Time: 79017.186 ms

# create index idx on metarelcould_transactionlog (transactionid);
# select count(1) from (SELECT * from metarelcould_transactionlog
order by transactionid) as foo;
  count
--
 5000
(1 row)

Time: 26230.534 ms

# cluster metarelcould_transactionlog USING  idx;
CLUSTER
Time: 342381.535 ms

# select count(1) from (SELECT * from metarelcould_transactionlog
order by transactionid) as foo;
  count
--
 5000
(1 row)

Time: 27704.794 ms

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
nOn Mon, Feb 22, 2010 at 3:45 PM, Scott Marlowe
scott.marl...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 12:53 PM, Yang Zhang yanghates...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe scott.marl...@gmail.com 
 wrote:
 On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang yanghates...@gmail.com wrote:
 I have the exact same table of data in both MySQL and Postgresql. In 
 Postgresql:

 Just wondering, are these on the same exact machine?


 Yes, on the same disk.

 I'm wondering how much of this could be caching effects.  Is the MySQL
 database warmed up before you started, and the pgsql database is
 cold and no caching has taken place?

 What do things like vmstat 10 say while the query is running on each
 db?  First time, second time, things like that.

Awesome -- this actually led me to discover the problem.

When running the query in MySQL InnoDB:

$ vmstat 10
procs ---memory-- ---swap-- -io --system--
-cpu--
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa st
 0 13 13733604  83020   5648 219388433   936   16821
4  2 89  5  0
 1 12 13749952  80164   5600 21780320 4354   908  4379 3586 2638
0  1 38 60  0
 0 19 13762228  80576   5556 2145220  208 3527  1280  3690 3668 2635
1  1 39 59  0
 0 19 13778632  79420   5560 2135228   52 4186  1046  4191 3682 2418
0  1 37 62  0
 0 19 13792964  77336   5592 2082520   41 3731  1698  3804 4102 2686
1  1 53 45  0
 0 14 13810356  84036   5556 2049836   36 4241   797  4246 3913 2603
0  1 68 31  0
 1 14 13825640  81336   5520 20019200 4212   958  4220 3848 2736
1  1 73 25  0
 0 17 13844952  78036   5476 19769568 4685   923  4689 3832 2547
0  1 69 29  0
 2 13 13863828  79812   5448 19549523 4627   692  4634 3744 2505
0  1 70 28  0
 0 15 13883828  77764   5440 1920528  249 4544   972  4548 4345 2506
0  1 70 28  0
 1 20 13898900  79132   5456 1890192   28 4341   723  4438 4982 3030
0  3 64 33  0
 0 11 13915252  85184   5624 1865260   79 3668   752  3764 4472 2765
0  3 57 40  0
 0 12 13933964  78448   5700 1832640  120 4327  1066  4434 4484 2777
1  3 52 45  0
 0 19 13951748  77640   5816 1795720   94 4005  1159  4091 4580 2762
1  3 48 49  0
 0 16 13972748  79884   5780 17536760 4737   787  4746 4385 2766
1  3 51 45  0
 0 25 13988108  78936   5884 1726068  547 3954  1468  4116 4976 3502
0  4 44 52  0
 1 20 14011500  77676   5868 1689136  161 4980   843  5506 5218 3131
0  3 34 62  0
 0 22 14032472  81348   5816 1647884  270 4198   943  4369 4521 2826
1  3 40 56  0
 0 23 14055220  81712   5804 1626872  193 4774  1408  4856 4617 2754
1  3 38 58  0
 0 21 14075848  81844   5696 15768360 4738   974  4742 4528 2704
1  3 40 56  0
 0 25 14097260  79788   5628 1536336  213 4512   922  4639 4726 2929
1  3 27 69  0
 0 24 14123900  80820   5616 1488460  319 5033  1059  5128 4895 2780
2  3 17 78  0
 1 26 14142632  77276   5660 1445592  445 4605  1434  4727 5401 3364
1  3 16 79  0
 0 31 14165668  83736   5976 1387048  362 4288  1444  4428 4739 2963
2  3 17 78  0
 1 28 14180104  77564   6324 1369232  387 4526    4677 5748 3559
1  3 16 80  0

I'm guessing the swap numbers are because MySQL uses mmap?

Anyway, when running the query in Postgresql:

$ vmstat 10
procs ---memory-- ---swap-- -io --system--
-cpu--
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa st
 1  2 13866336 1574540  25024 787898033   936   16821
4  2 89  5  0
 1  3 13861520 1163596  25328 8128360 10460 25429   433 4368 4175
4  2 80 14  0
 0  3 13856484 803024  25600 8343220 11170 22062   688 4492 4590
4  2 73 20  0
 0  6 13855304 678868  26052 8435540  1600  9239   598 5195 7141
1  5 70 24  0
 0  6 13853644 513568  26332 8563680  4010 12480  7100 4775 4248
3  3 68 26  0
 2  2 13851804 166228  26624 8775304  6340 21466  1497 4680 4550
6  2 64 28  0
 0  5 13861556  81896  26740 8825360  860 3547  6100  3847 5142 3386
6  2 57 35  0
 0  6 13867688  91368  26808 8832712  653 3326  1835  3604 4738 2762
5  2 61 32  0
 0  5 13870676  88524  26872 8849392  638 3272  2578  3517 4864 2909
4  2 55 39  0
 0  5 13872748  79512  27004 8864456  629 1788  2086  2949 4337 2921
1  3 55 41  0
 0  7 13876760  83124  27136 8867272 1018 2253  1713  2409 4321 2889
0  3 63 33  0
 0  6 13878964  82876  27240 8874540  792 2119  1854  2314 4288 2813
2  2 72 24  0
 3  4 13883204  81224  27280 8887068  661 3067  2995  3385 4558 2899
4  2 72 22  0
 0  6 13886636  82036  27352 8905628  594 3726  2628  4013 4744 2765
4  2 69 25  0
 0  8 13899852  85604  27400 8925800  638 4423  2689  4658 4903 2808
4  2 55 40  0
 1  4 13905184  80544  27484 8940040  676 3501  3006  3799 4805 2932
4  2 66 28  0
 0  9 13908480  80100  27516 8948476  668 2996  1720  3192 4594 2799
4  2 60 35  0

vmstat showed no swapping-out for a while, and then suddenly it
started spilling a lot. Checking psql's memory stats showed that it
was huge -- apparently, it's trying to 

Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Mon, Feb 22, 2010 at 9:30 PM, Alex Hunsaker bada...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 11:10, Yang Zhang yanghates...@gmail.com wrote:
 I have the exact same table of data in both MySQL and Postgresql. In ? 
 Postgresql:

 FWIW on a stock (unchanged postgresql.conf) 8.3.9 I get (best of 3
 runs) 79 seconds, 26 using an index and 27 seconds with it clustered.
 Now yes it goes a lot faster because im skipping all the overhead of
 sending the data to the client...

Last sentence also contributed to my realizing the problem (the client
I was using was psql), but there's one oddity

 # select count(1) from (SELECT * from metarelcould_transactionlog
 order by transactionid) as foo;

Does it strike anyone else that the query optimizer/rewriter should be
able to toss out the sort from such a query altogether?
--
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Tom Lane
Yang Zhang yanghates...@gmail.com writes:
 # select count(1) from (SELECT * from metarelcould_transactionlog
 order by transactionid) as foo;

 Does it strike anyone else that the query optimizer/rewriter should be
 able to toss out the sort from such a query altogether?

It could, if it knew that the aggregate function didn't care about input
row order.  We don't have that knowledge about aggregates ATM.

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Tom Lane
Yang Zhang yanghates...@gmail.com writes:
 I'm relieved that Postgresql itself does not, in fact, suck, but
 slightly disappointed in the behavior of psql. I suppose it needs to
 buffer everything in memory to properly format its tabular output,
 among other possible reasons I could imagine.

That's half of it, and the other half is not wanting to present a
portion of query output if the query fails partway through.  You could
certainly write a client that disregarded these issues (as I suppose
mysql must be doing).

regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Scott Marlowe
On Mon, Feb 22, 2010 at 10:51 PM, Yang Zhang yanghates...@gmail.com wrote:
 nOn Mon, Feb 22, 2010 at 3:45 PM, Scott Marlowe
 scott.marl...@gmail.com wrote:

 What do things like vmstat 10 say while the query is running on each
 db?  First time, second time, things like that.

 Awesome -- this actually led me to discover the problem.

 vmstat showed no swapping-out for a while, and then suddenly it
 started spilling a lot. Checking psql's memory stats showed that it
 was huge -- apparently, it's trying to store its full result set in
 memory. As soon as I added a LIMIT 1, everything worked
 beautifully and finished in 4m (I verified that the planner was still
 issuing a Sort).

 I'm relieved that Postgresql itself does not, in fact, suck, but
 slightly disappointed in the behavior of psql. I suppose it needs to
 buffer everything in memory to properly format its tabular output,
 among other possible reasons I could imagine.

It's best when working with big sets to do so with a cursor and fetch
a few thousand rows at a time.  It's how we handle really big sets at
work and it works like a charm in keeping the client from bogging down
with a huge memory footprint.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Alex Hunsaker
On Mon, Feb 22, 2010 at 22:51, Yang Zhang yanghates...@gmail.com wrote:
 vmstat showed no swapping-out for a while, and then suddenly it
 started spilling a lot. Checking psql's memory stats showed that it
 was huge -- apparently, it's trying to store its full result set in
 memory. As soon as I added a LIMIT 1, everything worked
 beautifully and finished in 4m (I verified that the planner was still
 issuing a Sort).

Well im half surprised no one has recommend using a cursor.  Have you
looked in to that?  I bet that would fix most of your problems here.

 I'm relieved that Postgresql itself does not, in fact, suck, but
 slightly disappointed in the behavior of psql. I suppose it needs to
 buffer everything in memory to properly format its tabular output,
 among other possible reasons I could imagine.

Well AFAIK it will dump everything you asked for.  So if you said
select * from 1G table;  It should take at least 1G and potentially
quite a bit more formatting and overhead.

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sorting performance vs. MySQL

2010-02-22 Thread Yang Zhang
On Tue, Feb 23, 2010 at 1:48 AM, Scott Marlowe scott.marl...@gmail.com wrote:
 On Mon, Feb 22, 2010 at 10:51 PM, Yang Zhang yanghates...@gmail.com wrote:
 nOn Mon, Feb 22, 2010 at 3:45 PM, Scott Marlowe
 scott.marl...@gmail.com wrote:

 What do things like vmstat 10 say while the query is running on each
 db?  First time, second time, things like that.

 Awesome -- this actually led me to discover the problem.

 vmstat showed no swapping-out for a while, and then suddenly it
 started spilling a lot. Checking psql's memory stats showed that it
 was huge -- apparently, it's trying to store its full result set in
 memory. As soon as I added a LIMIT 1, everything worked
 beautifully and finished in 4m (I verified that the planner was still
 issuing a Sort).

 I'm relieved that Postgresql itself does not, in fact, suck, but
 slightly disappointed in the behavior of psql. I suppose it needs to
 buffer everything in memory to properly format its tabular output,
 among other possible reasons I could imagine.

 It's best when working with big sets to do so with a cursor and fetch
 a few thousand rows at a time.  It's how we handle really big sets at
 work and it works like a charm in keeping the client from bogging down
 with a huge memory footprint.


Thing is, this is how I got here:

- ran complex query that does SELECT INTO.
- that never terminated, so killed it and tried a simpler SELECT (the
subject of this thread) from psql to see how long that would take.

I.e., my original application doesn't receive the entire dataset.
-- 
Yang Zhang
http://www.mit.edu/~y_z/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general