[PERFORM] 8.2.13 commit is taking too much time

2011-05-10 Thread AI Rumman
I am using PostgreSQL 8.2.13 and I found that most of the commit, insert,
and update statements are taking more than 4 s in the db, and the app
performance is slow because of that.
My db settings are as follows:
 bgwriter_all_maxpages | 300 |
 bgwriter_all_percent  | 15  |
 bgwriter_delay        | 300 | ms
 bgwriter_lru_maxpages | 50  |
 bgwriter_lru_percent  | 10  |

SHOW checkpoint_segments ;
 checkpoint_segments
---------------------
 300
(1 row)

show work_mem ;
 work_mem
----------
 16MB
(1 row)

show checkpoint_timeout ;
 checkpoint_timeout
--------------------
 5min
(1 row)

show checkpoint_warning ;
 checkpoint_warning
--------------------
 30s
(1 row)

show shared_buffers ;
 shared_buffers
----------------
 4GB
(1 row)


I have 32 GB RAM and it's a 4*2 = 8-core processor box.
Any idea how to improve the performance?


Re: [PERFORM] Postgres 9.0.4 + Hot Standby + FusionIO Drive + Performance => Query failed ERROR: catalog is missing 1 attribute(s) for relid 172226

2011-05-10 Thread Sethu Prasad
Yes, it has something to do with Hot Standby: if you omit some parts of the
archive, then the standby instance will not have the necessary files and
will complain like this.

I kept the FusionIO drive on my checklist while attending to this issue,
since we tried it looking for performance combined with a read-only hot
standby, and in doubt I thought that recovery might not have completed
safely on this drive. That is why I pointed at the Fio drive here.

To put it plainly: I had missed the pg_clog directory in the archive.

seq_page_cost = 1.0
random_page_cost = 1.0

Are the above settings fine when we deal with Fio and performance? I was
advised earlier that reads and writes are treated the same on Fio drives.

Any suggestions on configuration changes to make a read-only hot standby
faster on reads?
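For reference, the flash-oriented starting point I am considering looks
like this - the first two lines are the settings above, while
effective_io_concurrency (available since 8.4) is an extra knob to test,
not something already validated here:

seq_page_cost = 1.0
random_page_cost = 1.0         # equal costs: random reads cost about the same as sequential on flash
effective_io_concurrency = 4   # illustrative value; affects bitmap heap scan prefetching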

- Sethu


On Sun, May 8, 2011 at 11:08 AM, Simon Riggs  wrote:

> On Tue, May 3, 2011 at 10:02 AM, Sethu Prasad wrote:
>
> > I tried with the PostgreSQL 9.0.4 + Hot Standby and running the database
> > from Fusion IO Drive to understand the PG Performance.
> >
> > While doing so I got the "Query failed ERROR: catalog is missing 1
> > attribute(s) for relid 172226". Any idea on this error? Is that
> > combination PG + HotSB + Fusion IO Drive is not advisable?!
>
> Why I wonder do you think this might have anything to do with Hot
> Standby and/or FusionIO drives?
>
> This indicates either catalog or catalog index corruption of some kind.
>
> Did you only get this error once?
>
> --
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [PERFORM] Benchmarking a large server

2011-05-10 Thread Claudio Freire
On Mon, May 9, 2011 at 10:32 PM, Chris Hoover  wrote:
> So, does anyone have any suggestions/experiences in benchmarking storage
> when the storage is smaller then 2x memory?

Try writing a small Python script (or C program) that mmaps a large chunk
of memory with MAP_LOCKED; this will keep it in RAM and prevent that RAM
from being used for caching. The script should touch the memory at least
once to keep overcommit from getting smart on you.

I think only root can lock memory, so that small program would have to
run as root.
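
A minimal sketch of such a pinning script (Python; Linux assumed, run it
as root, and the 24 GiB figure is just an example for a 32 GiB box):

import mmap, time

GiB = 1024 ** 3
PIN = 24 * GiB   # illustrative: pin 24 GiB, leave the rest for the benchmark

# MAP_LOCKED is Linux-specific; if this Python build does not expose it,
# the pages are still touched below, which forces them to be materialized.
flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | getattr(mmap, "MAP_LOCKED", 0)
buf = mmap.mmap(-1, PIN, flags=flags)

# Touch every page once so overcommit can't hand us unbacked memory.
for off in range(0, PIN, mmap.PAGESIZE):
    buf[off] = 1

print("holding %d GiB; run the benchmark, Ctrl-C to release" % (PIN // GiB))
while True:
    time.sleep(3600)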



Re: [PERFORM] indexes ignored when querying the master table

2011-05-10 Thread Thomas Hägi

hi florian

sorry for the late reply - it took almost a day to dump & reload the 
data into 9.1b1.



>> how can i get postgres to use the indexes when querying the master
>> table?
>
> I believe that this is a new feature in PostgreSQL 9.1 ("Allow
> inheritance table queries to return meaningfully-sorted results").


you are right, pgsql 9.1 indeed makes use of the indexes now:

EXPLAIN ANALYZE SELECT * FROM data.logs
ORDER BY re_timestamp DESC LIMIT 100;

Limit  (cost=11.63..36.45 rows=100 width=1390) (actual time=0.169..0.639 rows=100 loops=1)
  ->  Result  (cost=11.63..6421619.07 rows=25870141 width=1390) (actual time=0.154..0.610 rows=100 loops=1)
        ->  Merge Append  (cost=11.63..6421619.07 rows=25870141 width=1390) (actual time=0.150..0.429 rows=100 loops=1)
              Sort Key: data.logs.re_timestamp
              ->  Sort  (cost=11.46..11.56 rows=40 width=1776) (actual time=0.014..0.014 rows=0 loops=1)
                    Sort Key: data.logs.re_timestamp
                    Sort Method: quicksort  Memory: 25kB
                    ->  Seq Scan on logs  (cost=0.00..10.40 rows=40 width=1776) (actual time=0.003..0.003 rows=0 loops=1)
              ->  Index Scan Backward using logs_2003_timestamp_idx on logs_2003 logs  (cost=0.00..373508.47 rows=1825026 width=1327) (actual time=0.026..0.026 rows=1 loops=1)
              ->  Index Scan Backward using logs_2004_timestamp_idx on logs_2004 logs  (cost=0.00..417220.55 rows=2034041 width=1327) (actual time=0.012..0.012 rows=1 loops=1)
              ->  Index Scan Backward using logs_2005_timestamp_idx on logs_2005 logs  (cost=0.00..502664.57 rows=2438968 width=1345) (actual time=0.015..0.015 rows=1 loops=1)
              ->  Index Scan Backward using logs_2006_timestamp_idx on logs_2006 logs  (cost=0.00..640419.01 rows=3091214 width=1354) (actual time=0.015..0.015 rows=1 loops=1)
              ->  Index Scan Backward using logs_2007_timestamp_idx on logs_2007 logs  (cost=0.00..752875.00 rows=3603739 width=1369) (actual time=0.009..0.009 rows=1 loops=1)
              ->  Index Scan Backward using logs_2008_timestamp_idx on logs_2008 logs  (cost=0.00..969357.51 rows=4406653 width=1440) (actual time=0.007..0.007 rows=1 loops=1)
              ->  Index Scan Backward using logs_2009_timestamp_idx on logs_2009 logs  (cost=0.00..862716.39 rows=3986473 width=1422) (actual time=0.016..0.016 rows=1 loops=1)
              ->  Index Scan Backward using logs_2010_timestamp_idx on logs_2010 logs  (cost=0.00..778529.29 rows=3579586 width=1426) (actual time=0.009..0.009 rows=1 loops=1)
              ->  Index Scan Backward using logs_2011_timestamp_idx on logs_2011 logs  (cost=0.00..200253.71 rows=904401 width=1453) (actual time=0.006..0.089 rows=100 loops=1)
Total runtime: 1.765 ms


thanks for your help,
thomas



Re: [PERFORM] 8.2.13 commit is taking too much time

2011-05-10 Thread tv

> Any idea how to improve the performance?

Hmmm, I guess we'll need more info about resource usage (CPU, I/O, locks)
when the commit happens. Run these two commands

$ iostat -x 1
$ vmstat 1

and then execute the commit, and see what's causing problems. Is the drive
utilization close to 100%? Then you've got problems with the disks (I'd
bet this is the cause). Etc.

There's a very nice chapter about this in Greg's book.

BTW, what filesystem are you using? Ext3, ext4, reiserfs, xfs? I do
remember there were some problems with sync: some filesystems are unable
to sync individual files and always sync everything (which is going to
hurt if you want to sync just the WAL).

regards
Tomas




Re: [PERFORM] Postgres refusing to use >1 core

2011-05-10 Thread Merlin Moncure
On Mon, May 9, 2011 at 9:40 PM, Aren Cambre  wrote:
>> how are you reading through the table? if you are using OFFSET, you
>> owe me a steak dinner.
>>
>
> Nope. :-)
> Below is my exact code for the main thread. The C# PLINQ statement is
> highlighted. Let me know if I can help to explain this.
>
>             NpgsqlConnection arrestsConnection = new
> NpgsqlConnection(Properties.Settings.Default.dbConnectionString);
>
>             arrestsConnection.Open();
>
>             /// First clear out the geocoding table
>             NpgsqlCommand geocodingTableClear = new NpgsqlCommand("TRUNCATE
> raw.\"TxDPS geocoding\"", arrestsConnection);
>             geocodingTableClear.ExecuteNonQuery();
>
>             NpgsqlDataReader arrests = new NpgsqlCommand("SELECT * FROM
> \"raw\".\"TxDPS all arrests\"", arrestsConnection).ExecuteReader();
>
>             /// Based on the pattern defined at
>             /// http://social.msdn.microsoft.com/Forums/en-US/parallelextensions/thread/2f5ce226-c500-4899-a923-99285ace42ae.
>             foreach (IDataRecord arrest in
>                      from row in arrests.AsParallel().Cast<IDataRecord>()
>                      select row)
>             {
>                 Geocoder geocodeThis = new Geocoder(arrest);
>                 geocodeThis.Geocode();
>             }
>
>             arrestsConnection.Close();


hm. I'm not exactly sure. how about turning on statement-level logging on
the server for a bit and seeing if any unexpected queries are being
generated and sent to the server?
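
the usual knobs for that are standard postgresql.conf settings (either one
works; both are noisy, so revert once you've captured enough log):

log_statement = 'all'              # log every statement
log_min_duration_statement = 0     # alternative: log each statement with its duration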

merlin



Re: [PERFORM] Benchmarking a large server

2011-05-10 Thread Jeff


On May 9, 2011, at 4:50 PM, Merlin Moncure wrote:


> hm, if it was me, I'd write a small C program that just jumped around
> directly on the device and did random writes, assuming it isn't
> formatted.  For sequential reads, just flush caches and dd the device
> to /dev/null.  Probably someone will suggest better tools though.
>
> merlin




http://pgfoundry.org/projects/pgiosim

It is a small program we use to beat the [bad word] out of I/O systems.
It randomly seeks, does an 8kB read, optionally writes it out (and
optionally fsyncs), and reports how fast it is going (you need to watch
iostat output as well so you can see the actual physical tps without the
OS cache interfering).

It goes through regular read & write calls like PG (I didn't want to
bother with junk like O_DIRECT & friends).

It is also now multithreaded, so you can fire up a bunch of random read
threads (rather than firing up a bunch of pgiosims in parallel) and see
how things scale up.



--
Jeff Trout 
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/






Re: [PERFORM] Benchmarking a large server

2011-05-10 Thread Cédric Villemain
2011/5/10 Greg Smith :
> On 05/09/2011 11:13 PM, Shaun Thomas wrote:
>>
>> Take a look at /proc/sys/vm/dirty_ratio and
>> /proc/sys/vm/dirty_background_ratio if you have an older Linux system, or
>> /proc/sys/vm/dirty_bytes, and /proc/sys/vm/dirty_background_bytes with a
>> newer one.
>> On older systems for instance, those are set to 40 and 20 respectively
>> (recent kernels cut these in half).
>
> 1/4 actually; 10% and 5% starting in kernel 2.6.22.  The main sources of
> this on otherwise new servers I see are RedHat Linux RHEL5 systems  running
> 2.6.18.  But as you say, even the lower defaults of the newer kernels can be
> way too much on a system with lots of RAM.

One can experiment with writeback storms using this program from Chris
Mason, under GPLv2:
http://oss.oracle.com/~mason/fsync-tester.c

You need to tweak it a bit; AFAIR the #define SIZE (32768*32) must be
reduced to 8kB blocks if you want a write pattern similar to PG's.

The program writes a big file while doing many small fsyncs on another,
writing to both. Please see
http://www.spinics.net/lists/linux-ext4/msg24308.html

It is used as a torture program by some Linux fs hackers and may be
useful for the OP on his large server to validate hardware and kernel.


>
> The main downside I've seen of addressing this by using a kernel with
> dirty_bytes and dirty_background_bytes is that VACUUM can slow down
> considerably.  It really relies on the filesystem having a lot of write
> cache to perform well.  In many cases people are happy with VACUUM
> throttling if it means nasty I/O spikes go away, but the trade-offs here are
> still painful at times.
>
> --
> Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
> "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
>
>



-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [PERFORM] 8.2.13 commit is taking too much time

2011-05-10 Thread Greg Smith

On 05/10/2011 03:01 AM, AI Rumman wrote:
> I am using PostgreSQL 8.2.13 and I found that most of the commit, insert,
> and update statements are taking more than 4 s in the db, and the app
> performance is slow because of that.
>
> My db settings are as follows:
>  bgwriter_all_maxpages | 300 |
>  bgwriter_all_percent  | 15  |
>  bgwriter_delay        | 300 | ms
>  bgwriter_lru_maxpages | 50  |
>  bgwriter_lru_percent  | 10  |


Reduce bgwriter_all_maxpages to 0, definitely, and you might drop 
bgwriter_lru_maxpages to 0 too.  Making the background writer in 
PostgreSQL 8.2 do more work as you've tried here increases the amount of 
repeated I/O done by a lot, without actually getting rid of any pauses.  
It wastes a lot of I/O capacity instead, making the problems you're 
seeing worse.



>  shared_buffers
> ----------------
>  4GB



On 8.2, shared_buffers should be no more than 128MB if you want to avoid 
long checkpoint pauses.  You might even find best performance at the 
default of 32MB.
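
In postgresql.conf terms, the advice above boils down to:

bgwriter_all_maxpages = 0
bgwriter_lru_maxpages = 0     # optional, as noted above
shared_buffers = 128MB        # or experiment down toward the 32MB default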




> I have 32 GB RAM and it's a 4*2 = 8-core processor box.
> Any idea how to improve the performance?


There's nothing you can do here that will work better than upgrading to 
8.3.  See 
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm for 
more information.  PostgreSQL 8.2 had serious problems with the sort of 
pauses you're seeing back when systems had only 4GB of memory; you'll 
never get rid of them on a server with 32GB of RAM on that version.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books




Re: [PERFORM] partition query on multiple cores

2011-05-10 Thread Maciek Sakrejda
> I have 8-core server, I wanted to ask whether a query can be divided for
> multiple processors or cores, if it could be what to do in postgresql

No, at this time (and for the foreseeable future), a single query will
run on a single core.
---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
(650) 242-3500 Main
www.truviso.com



Re: [PERFORM] partition query on multiple cores

2011-05-10 Thread Shaun Thomas

On 05/10/2011 10:06 AM, Maciek Sakrejda wrote:

>> I have 8-core server, I wanted to ask whether a query can be divided for
>> multiple processors or cores, if it could be what to do in postgresql
>
> No, at this time (and for the foreseeable future), a single query will
> run on a single core.


It can *kinda* be done. Take a look at GridSQL. It's really good for 
splitting up reporting-like queries that benefit from parallel access of 
large tables. It's not exactly Hadoop, but I ran a test on a single 
system with two separate instances of PostgreSQL, and a single query 
over those two nodes cut execution time in half.


It's meant for server parallelism, so I wouldn't necessarily recommend 
splitting your data up across nodes on the same server. But it seems to 
deliver as promised when used in the right circumstances.
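
To make the split-and-merge idea concrete, here is a toy Python/psycopg2
sketch - emphatically not GridSQL itself, and the DSNs, table, and query
are made up - where one aggregate is farmed out to two nodes and combined:

import concurrent.futures
import psycopg2

# Hypothetical setup: the same table split across two PostgreSQL instances.
DSNS = ["dbname=reports host=node0", "dbname=reports host=node1"]
QUERY = "SELECT count(*), sum(amount) FROM big_table"

def run_on(dsn):
    # Each node scans its own half of the data in parallel with the other.
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(QUERY)
            return cur.fetchone()

with concurrent.futures.ThreadPoolExecutor(len(DSNS)) as pool:
    partials = list(pool.map(run_on, DSNS))

# Merge step: count and sum combine trivially; averages etc. need more care.
print(sum(p[0] for p in partials), sum(p[1] for p in partials))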


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com




[PERFORM] Question processor speed differences.

2011-05-10 Thread Tory M Blue
AMD Opteron(tm) Processor 4174 HE  vs  Intel(R) Xeon(R) CPU E5345 @ 2.33GHz

I'm wondering if there is a performance difference running Postgres on
Fedora on AMD vs Intel (the two listed above).

I have an 8-way Intel Xeon box and a 12-way AMD box and was thinking
about migrating to the new AMD box from the 4-year-old Intel box. But I
wasn't sure if there are performance stats on AMD multi-core procs vs
the Intels for DB applications?

Thanks

Tory



Re: [PERFORM] help speeding up a query in postgres 8.4.5

2011-05-10 Thread Robert Haas
On Tue, Apr 5, 2011 at 3:25 PM, Maria L. Wilson wrote:
> Would really appreciate someone taking a look at the query below  Thanks
> in advance!
>
>
> this is on a linux box...
> Linux dsrvr201.larc.nasa.gov 2.6.18-164.9.1.el5 #1 SMP Wed Dec 9 03:27:37
> EST 2009 x86_64 x86_64 x86_64 GNU/Linux
>
> explain analyze
> select MIN(IV.STRTDATE), MAX(IV.ENDDATE)
> from GRAN_VER GV
> left outer join INVENTORY IV on GV.GRANULE_ID = IV.GRANULE_ID, INVSENSOR
> INVS
> where IV.INV_ID='65' and GV.GRANULE_ID = INVS.granule_id and
> INVS.sensor_id='13'
>
>
> "Aggregate  (cost=736364.52..736364.53 rows=1 width=8) (actual
> time=17532.930..17532.930 rows=1 loops=1)"
> "  ->  Hash Join  (cost=690287.33..734679.77 rows=336949 width=8) (actual
> time=13791.593..17323.080 rows=924675 loops=1)"
> "    Hash Cond: (invs.granule_id = gv.granule_id)"
> "    ->  Seq Scan on invsensor invs  (cost=0.00..36189.41 rows=1288943
> width=4) (actual time=0.297..735.375 rows=1277121 loops=1)"
> "  Filter: (sensor_id = 13)"
> "    ->  Hash  (cost=674401.52..674401.52 rows=1270865 width=16) (actual
> time=13787.698..13787.698 rows=1270750 loops=1)"
> "  ->  Hash Join  (cost=513545.62..674401.52 rows=1270865
> width=16) (actual time=1998.702..13105.578 rows=1270750 loops=1)"
> "    Hash Cond: (gv.granule_id = iv.granule_id)"
> "    ->  Seq Scan on gran_ver gv  (cost=0.00..75224.90
> rows=4861490 width=4) (actual time=0.008..1034.885 rows=4867542 loops=1)"
> "    ->  Hash  (cost=497659.81..497659.81 rows=1270865
> width=12) (actual time=1968.918..1968.918 rows=1270750 loops=1)"
> "  ->  Bitmap Heap Scan on inventory iv
> (cost=24050.00..497659.81 rows=1270865 width=12) (actual
> time=253.542..1387.957 rows=1270750 loops=1)"
> "    Recheck Cond: (inv_id = 65)"
> "    ->  Bitmap Index Scan on inven_idx1
> (cost=0.00..23732.28 rows=1270865 width=0) (actual time=214.364..214.364
> rows=1270977 loops=1)"
> "  Index Cond: (inv_id = 65)"
> "Total runtime: 17533.100 ms"
>
> some additional info.
> the table inventory is about 4481 MB and also has postgis types.
> the table gran_ver is about 523 MB
> the table INVSENSOR is about 217 MB
>
> the server itself has 32G RAM with the following set in the postgres conf
> shared_buffers = 3GB
> work_mem = 64MB
> maintenance_work_mem = 512MB
> wal_buffers = 6MB
>
> let me know if I've forgotten anything!  thanks a bunch!!

Late response here, but...

Is there an index on invsensor (sensor_id, granule_id)?  If not, that
might be something to try.  If so, you might want to try to figure out
why it's not being used.

Likewise, is there an index on gran_ver (granule_id)?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] Benchmarking a large server

2011-05-10 Thread Greg Smith

Greg Smith wrote:
> On 05/09/2011 11:13 PM, Shaun Thomas wrote:
>> Take a look at /proc/sys/vm/dirty_ratio and
>> /proc/sys/vm/dirty_background_ratio if you have an older Linux
>> system, or /proc/sys/vm/dirty_bytes, and
>> /proc/sys/vm/dirty_background_bytes with a newer one.
>> On older systems for instance, those are set to 40 and 20
>> respectively (recent kernels cut these in half).
>
> 1/4 actually; 10% and 5% starting in kernel 2.6.22.  The main sources
> of this on otherwise new servers I see are RedHat Linux RHEL5 systems
> running 2.6.18.  But as you say, even the lower defaults of the newer
> kernels can be way too much on a system with lots of RAM.


Ugh... we're both right, sort of.  2.6.22 dropped them to 5/10:
http://kernelnewbies.org/Linux_2_6_22 as I said.  But on the new
Scientific Linux 6 box I installed yesterday, they're at 10/20, as you
suggested.

Can't believe I'm going to need a table by kernel version and possibly
distribution to keep this all straight now; what a mess.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books




Re: [PERFORM] help speeding up a query in postgres 8.4.5

2011-05-10 Thread Robert Haas
[ woops, accidentally replied off-list, trying again ]

On Tue, May 10, 2011 at 1:47 PM, Maria L. Wilson wrote:
> thanks for taking a look at this - and it's never too late!!
>
> I've tried bumping up work_mem and did not see any improvements.
> All the indexes you asked about do exist - see below.
> Any other ideas?
>
> CREATE INDEX invsnsr_idx1
>  ON invsensor
>  USING btree
>  (granule_id);
>
> CREATE INDEX invsnsr_idx2
>  ON invsensor
>  USING btree
>  (sensor_id);

What about a composite index on both columns?

> CREATE UNIQUE INDEX granver_idx1
>  ON gran_ver
>  USING btree
>  (granule_id);

It's a bit surprising to me that this isn't getting used.  How big are
these tables, and how much memory do you have, and what values are you
using for seq_page_cost/random_page_cost/effective_cache_size?

...Robert



Re: [PERFORM] partition query on multiple cores

2011-05-10 Thread Tomas Vondra
On 10.5.2011 18:22, Shaun Thomas wrote:
> On 05/10/2011 10:06 AM, Maciek Sakrejda wrote:
> 
>>> I have 8-core server, I wanted to ask whether a query can be divided for
>>> multiple processors or cores, if it could be what to do in postgresql
>>
>> No, at this time (and for the foreseeable future), a single query will
>> run on a single core.
> 
> It can *kinda* be done. Take a look at GridSQL.

Or pgpool-II, which can give you something similar.

http://pgpool.projects.postgresql.org/

regards
Tomas



[PERFORM] Postgres NoSQL emulation

2011-05-10 Thread Pierre C


While reading about NoSQL:

> MongoDB lets you store and search JSON objects. In that case, you don't
> need to have the same "columns" in each "row".

The following ensued. Isn't it cute?

CREATE TABLE mongo ( id SERIAL PRIMARY KEY, obj hstore NOT NULL );
INSERT INTO mongo (obj) SELECT ('a=>'||n||',key'||(n%10)||'=>'||n)::hstore  
FROM generate_series(1,10) n;


SELECT * FROM mongo LIMIT 10;
 id |           obj
----+--------------------------
  1 | "a"=>"1", "key1"=>"1"
  2 | "a"=>"2", "key2"=>"2"
  3 | "a"=>"3", "key3"=>"3"
  4 | "a"=>"4", "key4"=>"4"
  5 | "a"=>"5", "key5"=>"5"
  6 | "a"=>"6", "key6"=>"6"
  7 | "a"=>"7", "key7"=>"7"
  8 | "a"=>"8", "key8"=>"8"
  9 | "a"=>"9", "key9"=>"9"
 10 | "a"=>"10", "key0"=>"10"

CREATE INDEX mongo_a ON mongo((obj->'a')) WHERE (obj->'a') IS NOT NULL;
CREATE INDEX mongo_k1 ON mongo((obj->'key1')) WHERE (obj->'key1') IS NOT NULL;
CREATE INDEX mongo_k2 ON mongo((obj->'key2')) WHERE (obj->'key2') IS NOT NULL;

VACUUM ANALYZE mongo;

SELECT * FROM mongo WHERE (obj->'key1')='271';
 id  |            obj
-----+----------------------------
 271 | "a"=>"271", "key1"=>"271"
(1 row)

EXPLAIN ANALYZE SELECT * FROM mongo WHERE (obj->'key1')='271';
                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Index Scan using mongo_k1 on mongo  (cost=0.00..567.05 rows=513 width=36) (actual time=0.024..0.025 rows=1 loops=1)
   Index Cond: ((obj -> 'key1'::text) = '271'::text)
 Total runtime: 0.048 ms



Re: [PERFORM] Question processor speed differences.

2011-05-10 Thread Greg Smith

On 05/10/2011 01:28 PM, Tory M Blue wrote:
> AMD Opteron(tm) Processor 4174 HE  vs  Intel(R) Xeon(R) CPU E5345 @ 2.33GHz
>
> I'm wondering if there is a performance difference running Postgres on
> Fedora on AMD vs Intel (the two listed above).
>
> I have an 8-way Intel Xeon box and a 12-way AMD box and was thinking
> about migrating to the new AMD box from the 4-year-old Intel box. But I
> wasn't sure if there are performance stats on AMD multi-core procs vs
> the Intels for DB applications?


The real limiting factor on CPU performance on your E5345 is how fast
the server can shuffle things back and forth to memory.  The FB-DIMM
DDR2-667MHz memory on that server will be hard pressed to clear 5GB/s of
memory access, probably less.  That matters a lot when running in-memory
database tasks, where the server is constantly shuffling 8K pages of
data around.


The new AMD box will have DDR3-1333 Mhz and a much better memory 
architecture to go with it.  I'd expect 6 to 7GB/s out of a single core, 
and across multiple cores you might hit as much as 20GB/s if you have 4 
channels of memory in there.  Rough guess, new server is at least twice 
as fast, and might even hit four times as fast.


If you have access to both boxes and can find a quiet period, you could
try running stream-scaling (https://github.com/gregs1104/stream-scaling)
to quantify for yourself just how big the speed difference in this area
is.  That has correlated extremely well for me with PostgreSQL
performance on SELECT statements.  If you're going to disk instead of
being limited by the CPU, none of this matters though.  Make sure you
really are waiting for the CPUs most of the time.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books




Re: [PERFORM] Postgres NoSQL emulation

2011-05-10 Thread Merlin Moncure
On Tue, May 10, 2011 at 12:56 PM, Pierre C  wrote:
>
> While reading about NoSQL,
>
>> MongoDB lets you store and search JSON objects. In that case, you don't
>> need to have the same "columns" in each "row".
>
> The following ensued. Isn't it cute?
>
> CREATE TABLE mongo ( id SERIAL PRIMARY KEY, obj hstore NOT NULL );
> INSERT INTO mongo (obj) SELECT ('a=>'||n||',key'||(n%10)||'=>'||n)::hstore
> FROM generate_series(1,10) n;
>
> SELECT * FROM mongo LIMIT 10;
>  id |           obj
> ----+--------------------------
>   1 | "a"=>"1", "key1"=>"1"
>   2 | "a"=>"2", "key2"=>"2"
>   3 | "a"=>"3", "key3"=>"3"
>   4 | "a"=>"4", "key4"=>"4"
>   5 | "a"=>"5", "key5"=>"5"
>   6 | "a"=>"6", "key6"=>"6"
>   7 | "a"=>"7", "key7"=>"7"
>   8 | "a"=>"8", "key8"=>"8"
>   9 | "a"=>"9", "key9"=>"9"
>  10 | "a"=>"10", "key0"=>"10"
>
> CREATE INDEX mongo_a ON mongo((obj->'a')) WHERE (obj->'a') IS NOT NULL;
> CREATE INDEX mongo_k1 ON mongo((obj->'key1')) WHERE (obj->'key1') IS NOT NULL;
> CREATE INDEX mongo_k2 ON mongo((obj->'key2')) WHERE (obj->'key2') IS NOT NULL;
> VACUUM ANALYZE mongo;
>
> SELECT * FROM mongo WHERE (obj->'key1')='271';
>  id  |            obj
> -----+----------------------------
>  271 | "a"=>"271", "key1"=>"271"
> (1 row)
>
> EXPLAIN ANALYZE SELECT * FROM mongo WHERE (obj->'key1')='271';
>                                                      QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------
>  Index Scan using mongo_k1 on mongo  (cost=0.00..567.05 rows=513 width=36) (actual time=0.024..0.025 rows=1 loops=1)
>    Index Cond: ((obj -> 'key1'::text) = '271'::text)
>  Total runtime: 0.048 ms

why even have multiple rows? just jam it all in there! :-D

merlin



Re: [PERFORM] Postgres refusing to use >1 core

2011-05-10 Thread Craig Ringer
On 05/11/2011 05:34 AM, Aren Cambre wrote:

> Using one thread, the app can do about 111 rows per second, and it's
> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
> 111 rows per second ~= 30 hours.
>
> I hoped to speed things up with some parallel processing.
>
> When the app is multithreaded, the app itself consumes about 3% CPU time
> waiting for Postgres, which is only hammering 1 core and barely
> exercising disk I/O (per two programs and HDD light).

OK, so before looking at parallelism, you might want to look at why
you're not getting much out of Pg and your app with even one thread. You
should be able to put a high load on the disk - or on one CPU core -
without needing to split out work into multiple threads and parallel
workers.

I suspect your app is doing lots of tiny single-row queries instead of
efficiently batching things. It'll be wasting huge amounts of time
waiting for results. Even if every query is individually incredibly
fast, with the number of them you seem to be doing you'll lose a LOT of
time if you loop over lots of little SELECTs.

The usual cause of the kind of slow performance you describe is an app
that "chats" with the database continuously, so its pattern is:

loop:
  ask for row from database using SELECT
  retrieve result
  do a tiny bit of processing
  continue loop

This is incredibly inefficient, because Pg is always waiting for the app
to ask for something or the app is waiting for Pg to return something.
During each switch there are delays and inefficiencies. It's actually:


loop:
  ask for a single row from database using SELECT
  [twiddle thumbs while database plans and executes the query]
  retrieve result
  do a tiny bit of processing   [Pg twiddles its thumbs]
  continue loop

What you want is your app and Pg working at the same time.

Assuming that CPU is the limitation rather than database speed and disk
I/O I'd use something like this:

Thread 1:
  get cursor for selecting all rows from database
  loop:
 get 100 rows from cursor
 add rows to processing queue
 if queue contains over 1000 rows:
 wait until queue contains less than 1000 rows

Thread 2:
  until there are no more rows:
ask Thread 1 for 100 rows
for each row:
   do a tiny bit of processing


By using a producer/consumer model like that you can ensure that thread
1 is always talking to the database, keeping Pg busy, and thread 2 is
always working the CPUs. The two threads should share NOTHING except the
queue to keep the whole thing simple and clean. You must make sure that
the "get 100 rows" operation of the producer can happen even while the
producer is in the middle of getting some more rows from Pg (though not
necessarily in the middle of actually appending them to the queue data
structure) so you don't accidentally serialize on access to the producer
thread.

If the single producer thread can't keep up, try reading in bigger
batches or adding more producer threads with a shared queue. If the
single consumer thread can't keep up with the producer, add more
consumers to use more CPU cores.

[producer 1] [producer 2] [...] [producer n]
      |            |        |        |
      +------------+--------+--------+
                   |
                 queue
                   |
      +------------+--------+--------+
      |            |        |        |
 [worker 1]  [worker 2]   [...] [worker n]

... or you can have each worker fetch its own chunks of rows (getting
rid of the producer/consumer split) using its own connection and just
have lots more workers to handle all the wasted idle time. A
producer/consumer approach will probably be faster, though.

If the consumer threads produce a result that must be sent back to the
database, you can either have each thread write it to the database using
its own connection when it's done, or you can have them delegate that
work to another thread that's dedicated to INSERTing the results. If the
INSERTer can't keep up, guess what, you spawn more of them working off a
shared queue.

If the consumer threads require additional information from the database
to do their work, make sure they avoid the:

loop:
  fetch one row
  do work on row

pattern, instead fetching sets of rows from the database in batches. Use
joins if necessary, or the IN() criterion.
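
Here is a minimal Python/psycopg2 sketch of that producer/consumer shape
(the original app is C#/Npgsql, so treat this purely as an illustration of
the pattern; the DSN, table, and geocode step are placeholders):

import multiprocessing as mp
import psycopg2

BATCH, QUEUE_MAX, N_WORKERS = 100, 1000, 4

def geocode(row):
    pass                          # stand-in for the real CPU-bound work

def worker(q):
    while True:
        row = q.get()
        if row is None:           # sentinel: producer is done
            return
        geocode(row)

if __name__ == "__main__":
    q = mp.Queue(maxsize=QUEUE_MAX)   # bounded, so the producer blocks when workers lag
    workers = [mp.Process(target=worker, args=(q,)) for _ in range(N_WORKERS)]
    for w in workers:
        w.start()

    conn = psycopg2.connect("dbname=arrests")   # placeholder DSN
    cur = conn.cursor(name="feed")              # server-side cursor streams rows
    cur.execute('SELECT * FROM "raw"."TxDPS all arrests"')
    while True:
        batch = cur.fetchmany(BATCH)            # keep asking Pg while workers chew
        if not batch:
            break
        for row in batch:
            q.put(row)
    for _ in range(N_WORKERS):
        q.put(None)
    for w in workers:
        w.join()
    conn.close()

Processes rather than threads sidestep the interpreter lock in CPython; in
C# plain threads are fine, as in the diagram above.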

--
Craig Ringer



Re: [PERFORM] Postgres refusing to use >1 core

2011-05-10 Thread Craig Ringer
On 11/05/11 05:34, Aren Cambre wrote:

> Using one thread, the app can do about 111 rows per second, and it's
> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
> 111 rows per second ~= 30 hours.

I don't know how I missed that. You ARE maxing out one cpu core, so
you're quite right that you need more threads unless you can make your
single worker more efficient.

Why not just spawn more copies of your program and have them work on
ranges of the data, though? Might that not be simpler than juggling
threading schemes?
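
For instance, a Python/psycopg2 sketch of that (the integer id column and
the four-way modulo split are assumptions about the schema):

import multiprocessing as mp
import psycopg2

N = 4   # one worker per range / CPU core

def process(row):
    pass                                         # stand-in for the real work

def work_range(part):
    conn = psycopg2.connect("dbname=arrests")    # placeholder DSN
    cur = conn.cursor()
    # Modulo partitioning; %% is a literal % under psycopg2 parameter binding.
    cur.execute('SELECT * FROM "raw"."TxDPS all arrests" WHERE id %% %s = %s',
                (N, part))
    for row in cur:
        process(row)
    conn.close()

if __name__ == "__main__":
    procs = [mp.Process(target=work_range, args=(i,)) for i in range(N)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()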

--
Craig Ringer



Re: [PERFORM] Postgres NoSQL emulation

2011-05-10 Thread Scott Marlowe
On Tue, May 10, 2011 at 3:32 PM, Merlin Moncure  wrote:
> why even have multiple rows? just jam it all in there! :-D

Exactly, serialize the object and stuff it into a simple key->value
table.  Way more efficient than EAV.
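
A sketch of that approach (Python/psycopg2 with json; the table and key
scheme are illustrative):

import json
import psycopg2

conn = psycopg2.connect("dbname=test")   # placeholder DSN
cur = conn.cursor()
cur.execute("CREATE TABLE kv (k text PRIMARY KEY, v text)")

obj = {"a": 271, "key1": 271}            # the whole 'document'
cur.execute("INSERT INTO kv (k, v) VALUES (%s, %s)",
            ("doc:271", json.dumps(obj)))   # one row per object, blob in v
conn.commit()

cur.execute("SELECT v FROM kv WHERE k = %s", ("doc:271",))
print(json.loads(cur.fetchone()[0]))     # deserialize on the way out

The trade-off versus the hstore version upthread is that you lose the
per-key expression indexes; lookups go through the primary key only.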



Re: [PERFORM] Postgres refusing to use >1 core

2011-05-10 Thread Scott Marlowe
On Tue, May 10, 2011 at 7:35 PM, Craig Ringer wrote:
> On 11/05/11 05:34, Aren Cambre wrote:
>
>> Using one thread, the app can do about 111 rows per second, and it's
>> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
>> 111 rows per second ~= 30 hours.
>
> I don't know how I missed that. You ARE maxing out one cpu core, so
> you're quite right that you need more threads unless you can make your
> single worker more efficient.
>
> Why not just spawn more copies of your program and have them work on
> ranges of the data, though? Might that not be simpler than juggling
> threading schemes?

I suggested that earlier.  But now I'm wondering if there are
efficiencies to be gained by moving all the heavy lifting to the db, as
well as splitting things into multiple partitions to work on.  I.e.,
don't grab 1,000 rows, work on them on the client side, and then insert
the data; do the data mangling in the query in the database.  My
experience has been that moving things like this into the database can
result in performance gains of several factors, taking hour-long
processes and making them run in minutes.



Re: [PERFORM] help speeding up a query in postgres 8.4.5

2011-05-10 Thread Maria L. Wilson

thanks for taking a look at this - and it's never too late!!

I've tried bumping up work_mem and did not see any improvements.
All the indexes you asked about do exist - see below.
Any other ideas?

CREATE INDEX invsnsr_idx1
  ON invsensor
  USING btree
  (granule_id);

CREATE INDEX invsnsr_idx2
  ON invsensor
  USING btree
  (sensor_id);

CREATE UNIQUE INDEX granver_idx1
  ON gran_ver
  USING btree
  (granule_id);

thanks for the look -
Maria Wilson
NASA/Langley Research Center
Hampton, Virginia 23681
m.l.wil...@nasa.gov

On 5/10/11 1:38 PM, Robert Haas wrote:

> On Tue, Apr 5, 2011 at 3:25 PM, Maria L. Wilson wrote:
>> Would really appreciate someone taking a look at the query below  Thanks
>> in advance!
>>
>> this is on a linux box...
>> Linux dsrvr201.larc.nasa.gov 2.6.18-164.9.1.el5 #1 SMP Wed Dec 9 03:27:37
>> EST 2009 x86_64 x86_64 x86_64 GNU/Linux
>>
>> explain analyze
>> select MIN(IV.STRTDATE), MAX(IV.ENDDATE)
>> from GRAN_VER GV
>> left outer join INVENTORY IV on GV.GRANULE_ID = IV.GRANULE_ID, INVSENSOR
>> INVS
>> where IV.INV_ID='65' and GV.GRANULE_ID = INVS.granule_id and
>> INVS.sensor_id='13'
>>
>> "Aggregate  (cost=736364.52..736364.53 rows=1 width=8) (actual time=17532.930..17532.930 rows=1 loops=1)"
>> "  ->  Hash Join  (cost=690287.33..734679.77 rows=336949 width=8) (actual time=13791.593..17323.080 rows=924675 loops=1)"
>> "        Hash Cond: (invs.granule_id = gv.granule_id)"
>> "        ->  Seq Scan on invsensor invs  (cost=0.00..36189.41 rows=1288943 width=4) (actual time=0.297..735.375 rows=1277121 loops=1)"
>> "              Filter: (sensor_id = 13)"
>> "        ->  Hash  (cost=674401.52..674401.52 rows=1270865 width=16) (actual time=13787.698..13787.698 rows=1270750 loops=1)"
>> "              ->  Hash Join  (cost=513545.62..674401.52 rows=1270865 width=16) (actual time=1998.702..13105.578 rows=1270750 loops=1)"
>> "                    Hash Cond: (gv.granule_id = iv.granule_id)"
>> "                    ->  Seq Scan on gran_ver gv  (cost=0.00..75224.90 rows=4861490 width=4) (actual time=0.008..1034.885 rows=4867542 loops=1)"
>> "                    ->  Hash  (cost=497659.81..497659.81 rows=1270865 width=12) (actual time=1968.918..1968.918 rows=1270750 loops=1)"
>> "                          ->  Bitmap Heap Scan on inventory iv  (cost=24050.00..497659.81 rows=1270865 width=12) (actual time=253.542..1387.957 rows=1270750 loops=1)"
>> "                                Recheck Cond: (inv_id = 65)"
>> "                                ->  Bitmap Index Scan on inven_idx1  (cost=0.00..23732.28 rows=1270865 width=0) (actual time=214.364..214.364 rows=1270977 loops=1)"
>> "                                      Index Cond: (inv_id = 65)"
>> "Total runtime: 17533.100 ms"
>>
>> some additional info.
>> the table inventory is about 4481 MB and also has postgis types.
>> the table gran_ver is about 523 MB
>> the table INVSENSOR is about 217 MB
>>
>> the server itself has 32G RAM with the following set in the postgres conf
>> shared_buffers = 3GB
>> work_mem = 64MB
>> maintenance_work_mem = 512MB
>> wal_buffers = 6MB
>>
>> let me know if I've forgotten anything!  thanks a bunch!!
>
> Late response here, but...
>
> Is there an index on invsensor (sensor_id, granule_id)?  If not, that
> might be something to try.  If so, you might want to try to figure out
> why it's not being used.
>
> Likewise, is there an index on gran_ver (granule_id)?





Re: [PERFORM] help speeding up a query in postgres 8.4.5

2011-05-10 Thread Maria L. Wilson

haven't tested a composite index

invsensor is 2,003,980 rows and 219MB
granver is 5,138,730 rows and 556MB
the machine has 32G memory
seq_page_cost, random_page_cost & effective_cache_size are set to the
defaults (1, 4, and 128MB) - looks like they could be bumped up.

Got any recommendations?

Maria

On 5/10/11 1:59 PM, Robert Haas wrote:

> [ woops, accidentally replied off-list, trying again ]
>
> On Tue, May 10, 2011 at 1:47 PM, Maria L. Wilson wrote:
>> thanks for taking a look at this - and it's never too late!!
>>
>> I've tried bumping up work_mem and did not see any improvements.
>> All the indexes you asked about do exist - see below.
>> Any other ideas?
>>
>> CREATE INDEX invsnsr_idx1
>>   ON invsensor
>>   USING btree
>>   (granule_id);
>>
>> CREATE INDEX invsnsr_idx2
>>   ON invsensor
>>   USING btree
>>   (sensor_id);
>
> What about a composite index on both columns?
>
>> CREATE UNIQUE INDEX granver_idx1
>>   ON gran_ver
>>   USING btree
>>   (granule_id);
>
> It's a bit surprising to me that this isn't getting used.  How big are
> these tables, and how much memory do you have, and what values are you
> using for seq_page_cost/random_page_cost/effective_cache_size?
>
> ...Robert

