Re: [PERFORM] More cores or higher frequency?

2017-05-23 Thread Steve Crawford
On Tue, May 23, 2017 at 1:49 PM, Sven R. Kunze  wrote:

> On 23.05.2017 22:14, Jarek wrote:
>
>> I have a pool of clients (~30) inserting into the database about 50 records per
>> second (in total from all clients) and a small number (<10) of clients
>> querying the database for those records once per 10s.
>> Other queries are rare and irregular.
>> The biggest table has ~100 million records (older records are purged
>> nightly). Database size is ~13GB.
>> In the near future I'm expecting ~150 clients and 250 inserts per second and
>> more clients querying the database.
>> The server is also handling Apache with a simple web application written in
>> Python.
>> For the same price, I can get 8C/3.2GHz or 14C/2.6GHz. Which one will be
>> better?
>>
>>
I would start by trying a few things on your existing equipment.

If your inserts are coming from individual connections, say, via the
web-app in a connect-insert-disconnect fashion then pooling can be a huge
win. Connection overhead is a bigger factor than you might imagine and I've
seen as much as a 10x improvement in small queries when pooling was added.
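
If you want to experiment, a minimal pgbouncer setup is enough to test the
idea - the values below are illustrative guesses, not recommendations:

; pgbouncer.ini (sketch)
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 300
default_pool_size = 20

Point the web app at port 6432 instead of 5432 and compare insert latency.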

If the every-10-second queries are running on the recently inserted data
then partitioning by time range could substantially improve the speed of
inserts, queries, and purging. It's pretty easy to do now with pg_partman
or similar, and built-in declarative partitioning is coming in version 10.
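
As a rough sketch of what the version 10 declarative syntax looks like
(table and column names are invented here; pg_partman automates similar
child-table management on current releases):

CREATE TABLE records (
    id         bigserial,
    client_id  int,
    created_at timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

CREATE TABLE records_2017_05 PARTITION OF records
    FOR VALUES FROM ('2017-05-01') TO ('2017-06-01');
CREATE TABLE records_2017_06 PARTITION OF records
    FOR VALUES FROM ('2017-06-01') TO ('2017-07-01');

-- the nightly purge becomes dropping an old partition instead of a bulk DELETE:
DROP TABLE records_2017_05;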

Fast commit to disk is a win - think SSD or RAID with a BBU cache - and with a
relatively modest 13GB database you should be able to spec enough RAM to
keep everything in memory.

Cheers,
Steve


Re: [PERFORM] More cores or higher frequency?

2017-05-23 Thread Steve Crawford
The answer, as always, is "it depends."

Can you give us an overview of your setup? The appropriate setup for small
numbers of long-running analytical queries (typically faster CPUs) will be
different than a setup for handling numerous simultaneous connections
(typically more cores).

But CPU is often not the limiting factor. With a better understanding of
your needs, people here can offer suggestions for memory, storage, pooling,
network, etc.

Cheers,
Steve


On Tue, May 23, 2017 at 11:29 AM, Jarek  wrote:

> Hello!
>
> I have a heavily loaded PostgreSQL server which I want to upgrade so it will
> handle more traffic. Can I estimate what is better: more cores or
> higher frequency? I expect that pg_stat should give some tips, but I
> don't know where to start...
>
> best regards
> Jarek
>
>
>


Re: [PERFORM] Performance problems with 9.2.15

2016-05-27 Thread Steve Crawford
>
> ...(BTW, I wonder why you are moving only to 9.2 and not something more
>> recent.)
>>
>
> Well, 9.2.15 is what comes bundled with RHEL 7, so I decided to go with
> that to avoid dependency issues. But I could install a fresher version
> from scratch if that would solve my problem.
>

Generally my first step is to get the latest stable directly from the
PostgreSQL Development Group, i.e.:
yum install
https://download.postgresql.org/pub/repos/yum/9.5/redhat/rhel-7-x86_64/pgdg-redhat95-9.5-2.noarch.rpm

Then I know I'm starting with the latest and greatest and will get critical
updates without worrying about any distribution packager delays.

Cheers,
Steve


Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-06 Thread Steve Crawford

On 07/02/2015 07:01 AM, Wes Vaske (wvaske) wrote:


What about a RAID controller? Are RAID controllers even available for 
PCI-Express SSD drives, or do we have to stick with SATA if we need a 
battery-backed RAID controller? Or is software RAID sufficient for SSD 
drives?


Quite a few of the benefits of using a hardware RAID controller are 
irrelevant when using modern SSDs. The great random write performance 
of the drives means the cache on the controller is less useful and the 
drives you’re considering (Intel’s enterprise grade) will have full 
power protection for inflight data.




For what it's worth, in my most recent iteration I decided to go with 
the Intel Enterprise NVMe drives and no RAID. My reasoning was thus:


1. Modern SSDs are so fast that even if you had an infinitely fast RAID 
card you would still be severely constrained by the limits of SAS/SATA. 
To get the full speed advantages you have to connect directly into the bus.


2. We don't typically have redundant electronic components in our 
servers. Sure, we have dual power supplies and dual NICs (though 
generally to handle external failures) and ECC-RAM but no hot-backup CPU 
or redundant RAM banks and...no backup RAID card. Intel Enterprise SSD 
already have power-fail protection so I don't need a RAID card to give 
me BBU. Given the MTBF of good enterprise SSD I'm left to wonder if 
placing a RAID card in front merely adds a new point of failure and 
scheduled-downtime-inducing hands-on maintenance (I'm looking at you, 
RAID backup battery).


3. I'm streaming to an entire redundant server and doing regular backups 
anyway so I'm covered for availability and recovery should the SSD (or 
anything else in the server) fail.


BTW, here's an article worth reading: 
https://blog.algolia.com/when-solid-state-drives-are-not-that-solid/


Cheers,
Steve



Re: [PERFORM] Query running slow for only one specific id. (Postgres 9.3) version

2015-06-05 Thread Steve Crawford

On 06/05/2015 10:54 AM, Sheena, Prabhjot wrote:


Postgresql 9.3 Version

Guys

  Here is the issue that I've been facing for a couple of weeks now. 
I have a table (size 7GB).


*If I run this query with this specific registration id it uses 
the wrong execution plan and takes more than a minute to complete. 
The total number of rows for this registration_id is only 414 in this table.*


explain analyze SELECT max(last_update_date) AS last_update_date FROM 
btdt_responses WHERE registration_id = 8718704208 AND response != 4;


QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------

 Result  (cost=2902.98..2903.01 rows=1 width=0) (actual time=86910.730..86910.731 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.57..2902.98 rows=1 width=8) (actual time=86910.725..86910.725 rows=1 loops=1)
           ->  Index Scan Backward using btdt_responses_n5 on btdt_responses  (cost=0.57..6425932.41 rows=2214 width=8) (actual time=86910.723..86910.723 rows=1 loops=1)
                 Index Cond: (last_update_date IS NOT NULL)
                 Filter: ((response <> 4) AND (registration_id = 8718704208::bigint))
                 Rows Removed by Filter: 52145434
 Total runtime: 86910.766 ms

*The same query with any other registration id comes back in 
milliseconds.*


explain analyze SELECT max(last_update_date) AS last_update_date FROM 
btdt_responses WHERE registration_id = 8688546267 AND response != 4;


QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------

 Aggregate  (cost=529.75..529.78 rows=1 width=8) (actual time=19.723..19.723 rows=1 loops=1)
   ->  Index Scan using btdt_responses_u2 on btdt_responses  (cost=0.57..529.45 rows=119 width=8) (actual time=0.097..19.689 rows=72 loops=1)
         Index Cond: (registration_id = 8688546267::bigint)
         Filter: (response <> 4)
         Rows Removed by Filter: 22
 Total runtime: 19.769 ms


A couple initial questions:

1. Does the result change if you analyze the table and rerun the query?

2. Are there any non-default settings for statistics collection on your 
database?
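
For reference, a quick way to check both in one psql session might look
like this (using the table and query from your example):

ANALYZE btdt_responses;

EXPLAIN ANALYZE
SELECT max(last_update_date) AS last_update_date
  FROM btdt_responses
 WHERE registration_id = 8718704208 AND response != 4;

-- statistics-related settings and where they were set
SELECT name, setting, source
  FROM pg_settings
 WHERE name ~ 'statistics';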


-Steve



Re: [PERFORM] Re: [GENERAL] Re: Query running slow for only one specific id. (Postgres 9.3) version

2015-06-05 Thread Steve Crawford

On 06/05/2015 12:28 PM, Steve Crawford wrote:

On 06/05/2015 11:38 AM, Sheena, Prabhjot wrote:


When I run vacuum analyze it fixes the problem but after 1 or 2 days 
the problem comes back




Is autovacuum running and using what settings?

(select name, setting from pg_settings where name ~ 'autovacuum' 
or name ~ 'statistics';)


Cheers,
Steve

P.S. The convention on the PostgreSQL mailing lists is to bottom-post, 
not top-post, replies.


And just to confirm, are there any table-specific overrides to the 
system-wide settings?
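
Something like this will show any per-table storage-parameter overrides
(autovacuum settings included):

SELECT relname, reloptions
  FROM pg_class
 WHERE reloptions IS NOT NULL;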


Cheers,
Steve


[PERFORM] Re: [GENERAL] Re: Query running slow for only one specific id. (Postgres 9.3) version

2015-06-05 Thread Steve Crawford

On 06/05/2015 11:38 AM, Sheena, Prabhjot wrote:


When I run vacuum analyze it fixes the problem but after 1 or 2 days 
the problem comes back




Is autovacuum running and using what settings?

(select name, setting from pg_settings where name ~ 'autovacuum' 
or name ~ 'statistics';)


Cheers,
Steve

P.S. The convention on the PostgreSQL mailing lists is to bottom-post, 
not top-post, replies.


[PERFORM] New server optimization advice

2015-01-09 Thread Steve Crawford
I will soon be migrating to some recently acquired hardware and seek 
input from those who have gone before.


A quick overview: the dataset size is ~100GB, (~250-million tuples) with 
a workload that consists of about 2/3 writes, mostly single record 
inserts into various indexed tables, and 1/3 reads. Queries per second 
peak around 2,000 and our application typically demands fast response - 
for many of these queries the timeout is set to 2-seconds and the 
application moves forward and recovers later if that is exceeded.


Although by count they are minimal, every hour there are dozens both of 
import and of analysis queries involving multiple tables and tens of 
thousands of records. These queries may take up to a few minutes on our 
current hardware.


Old hardware is 4-core, 24GB RAM, battery-backed RAID-10 with four 15k 
drives.


New hardware is quite different. 2x10-core E5-2660v3 @2.6GHz, 128GB 
DDR4-2133 RAM and 800GB Intel DC P3700 NVMe PCIe SSD. In essence, the 
dataset will fit in RAM and will be backed by exceedingly fast storage.


This new machine is very different than any we've had before so any 
current thinking on optimization would be appreciated. Do I leave 
indexes as is and evaluate which ones to drop later? Any recommendations 
on distribution and/or kernels (and kernel tuning)? PostgreSQL tuning 
starting points? Whatever comes to mind.


Thanks,
Steve





Re: [PERFORM] Turn off Hyperthreading! WAS: 60 core performance with 9.3

2014-08-21 Thread Steve Crawford

On 08/21/2014 03:51 PM, Josh Berkus wrote:

On 08/21/2014 02:26 PM, Scott Marlowe wrote:

I'm running almost the exact same setup in production as a spare. It
has 4 of those CPUs, 256G RAM, and is currently set to use HT. Since
it's a spare node I might be able to do some testing on it as well.
It's running a 3.2 kernel right now. I could probably get a later
model kernel on it even.

You know about the IO performance issues with 3.2, yes?

Were those 3.2 only and since fixed or are there issues persisting in 
3.2+? The 12.04 LTS release of Ubuntu Server was 3.2 but the 14.04 is 3.13.


Cheers,
Steve





Re: [PERFORM] DB sessions 100 times of DB connections

2014-07-08 Thread Steve Crawford

On 07/03/2014 06:59 PM, Huang, Suya wrote:


BTW, I'm using the pgbadger report to check for peak connections/sessions.


Hi,

We experienced a DB issue yesterday and, after checking the log, found 
that the peak number of sessions was 3000 while the peak number of DB 
connections was only around 30. The application was having problems 
pulling data but there were no warnings in the DB log as it didn't 
exceed max_connections.


How could this happen? How do sessions/connections work in Postgres?


As handy as pgbadger is, I have found that its max-connections values 
don't pass the sniff test, as it generally shows peak values that 
exceed the configured number of connections. I haven't dug in to find 
out why, but one conjecture is that log timestamps being truncated to 
the nearest second could cause this sort of thing.


Unexpected connection buildup is often a side-effect of something else 
like a large resource-intensive query, a query holding locks that 
prevent the other connections' queries from completing or a variety of 
other things.
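
A rough way to see what the connections are actually doing when they pile
up (the waiting column is the 9.x spelling; 9.6 and later expose
wait_event columns instead):

SELECT pid, state, waiting,
       now() - query_start AS runtime,
       left(query, 60)     AS query
  FROM pg_stat_activity
 WHERE state <> 'idle'
 ORDER BY runtime DESC;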


If you are looking to solve/prevent the undescribed issue, please 
provide more detail.


-Steve



Re: [PERFORM] Regarding Hardware Tuning

2013-12-19 Thread Steve Crawford

On 12/18/2013 12:12 PM, prashant Pandey wrote:
Could you tell me each and every hardware parameter and OS parameter 
that performance depends on?
I need the complete list of all the required parameters and how to 
extract them on Linux through system calls and files.

I would be very grateful if you could do so.
Thank you and regards.


This is not even possible given the variety of hardware, OS variants, 
work-loads, database sizes, etc. The answer is really "all of them." 
Those who specialize in tuning constantly run test cases to tease out 
clues to performance tuning and even they most likely couldn't answer this.


The closest you are likely to come is to read and reread "PostgreSQL 
High Performance," which is an invaluable resource.


Cheers,
Steve




Re: [PERFORM] Performance of complicated query

2013-05-28 Thread Steve Crawford

On 05/23/2013 05:21 PM, Jonathan Morra wrote:
Sorry for the messy query, I'm very new to writing these complex 
queries.  I'll try and make it easier to read by using WITH clauses. 
 However, just to clarify, the WITH clauses only increase readability 
and not performance in any way, right?


It depends. The planner is a tricky beast and sometimes rewriting a 
seemingly identical query will result in a much more (or less) efficient 
plan. A classic case was the difference between "where foo in (select 
bar from ...)" vs. "where exists (select 1 from bar where ...)". In an 
ideal world the planner would figure out that both are the same and 
optimize accordingly, but there was a point where one was typically more 
efficient; then it switched to the other being better for the planner. I 
don't recall the current state.
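
For reference, the two spellings look like this (table and column names
are made up):

-- membership test written with IN
SELECT o.*
  FROM orders o
 WHERE o.customer_id IN (SELECT c.id FROM customers c WHERE c.region = 'EU');

-- the same test written with EXISTS
SELECT o.*
  FROM orders o
 WHERE EXISTS (SELECT 1 FROM customers c
                WHERE c.id = o.customer_id AND c.region = 'EU');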


Casting can be important - sometimes the planner needs a nudge to use 
an index on, say, a varchar column being compared to, perhaps, a text 
value or column, in which case casting to the exact data type being 
indexed can be a big win.


Cheers,
Steve





Re: [PERFORM] Performance of complicated query

2013-05-23 Thread Steve Crawford

On 05/23/2013 10:19 AM, Jonathan Morra wrote:
I am fairly new to squeezing performance out of Postgres, but I hope 
this mailing list can help me.  I have read the instructions found at 
http://wiki.postgresql.org/wiki/Slow_Query_Questions and have tried to 
abide by them the best that I can.  I am running PostgreSQL 9.1.7, 
compiled by Visual C++ build 1500, 64-bit on an x64 Windows 7 
Professional Service Pack 1 machine with 8 GB of RAM.


I'm not sure under what constraints you are operating but you will find 
most people on the list will recommend running live systems on 
Linux/Unix for a variety of reasons.



CREATE TABLE reads
...
ALTER TABLE reads
  OWNER TO postgres;


To avoid future grief you should set up a user (see CREATE ROLE...) for 
your database that is not the cluster superuser (postgres). I assume you 
set up a database (see CREATE DATABASE...) for your app. The base 
databases (postgres, template*) should be used for administrative 
purposes only.
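
A minimal sketch of that separation (names are placeholders):

CREATE ROLE app_user LOGIN PASSWORD 'change_me';
CREATE DATABASE app_db OWNER app_user;
-- then connect to app_db as app_user and create the application tables there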




...
Ultimately what I want to do is to find a sum of values for each 
patient.  The scenario is that each patient is assigned a device and 
they get incremental values on their device.  Since these values are 
incremental if a patient never switches devices, the reported value 
should be the last value for a patient.  However, if a patient 
switches devices then the reported value should be the sum of the last 
value for each device that the patient was assigned.


I'm afraid I'm a bit confused about what you are after due to switching 
between sum and last.


It sounds like a patient is issued a device which takes a number of 
readings. Do you want the sum of those readings for a given patient 
across all devices they have been issued, the sum of readings for a 
specific device, the most recent reading for a specific patient 
regardless of which device was in use for that reading, or the sum of 
the most recent readings on each device issued to a specific patient?


Are you looking to generate a report across all patients/devices or 
lookup information on a specific patient or device?


Cheers,
Steve







Re: [PERFORM] Performance of complicated query

2013-05-23 Thread Steve Crawford

On 05/23/2013 10:57 AM, Jonathan Morra wrote:
Ultimately I'm going to deploy this to Heroku on a Linux machine (my 
tests have so far indicated that Heroku is MUCH slower than my 
machine), but I wanted to get it fast on my local machine first.  I 
agree with your role partitioning, however, this is only a dev machine.


For the sum vs. last, the idea is that each patient is issued a device 
and reads are recorded.  The nature of the reads are that they are 
incremental, so if a patient never changes devices there is no need 
for a sum.  However, patients will be changing devices, and the 
patient_device table records when each patient had a given device. 
 What I want to sum up is the total value for a patient regardless of 
how many times they changed devices


If the reads are always incremented - that is, the read you want is 
always the largest read - then something along these lines might work 
well and be more readable (untested code):


-- distill out max value for each device
with device_maxreads as (
    select
        device_id,
        max(value) as max_read
    from
        reads
    group by
        device_id),

-- then sum into a total for each patient
patient_value as (
    select
        p.patient_id,
        sum(d.max_read) as patient_value
    from
        device_maxreads d
        join patient_devices p on p.device_id = d.device_id
    group by
        p.patient_id
)

select
    ...whatever...
from
    ...your tables...
    join patient_value p on p.patient_id = ...
;


If the values increment and decrement, or patients are issued devices at 
overlapping times (i.e. using two devices at once), then the query 
gets more complicated, but WITH ... is still a likely usable construct.


Cheers,
Steve




Re: [PERFORM] performance database for backup/restore

2013-05-21 Thread Steve Crawford

On 05/21/2013 06:18 AM, Jeison Bedoya wrote:
Hi people, I have a 400GB database running on a server with 128GB 
RAM, 32 cores, and storage over a SAN with fibre channel. The problem 
is that when I do a backup with pg_dumpall it takes about 5 hours, and 
then a restore takes about 17 hours. Is that a normal time for that 
process on that machine, or can I do something to optimize the 
backup/restore process?
It would help to know what you wish to solve. I.e. setting up a test/dev 
server, testing disaster-recovery, deploying to a new server, etc. Also, 
are you dumping to a file then restoring from a file or dumping to a 
pipe into the restore?


If you use the custom format in pg_dump *and* are dumping to a file 
*and* restoring via pg_restore, you can set the -j flag to somewhat 
fewer than the number of cores (though with 32 cores I can't say where 
the sweet spot might be) to allow pg_restore to run things like index 
recreation in parallel to help your restore speed.
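
A sketch of that workflow (database name and job count are placeholders
to adjust for your system):

pg_dumpall --globals-only > globals.sql   # roles/tablespaces still come from pg_dumpall
pg_dump -Fc -f mydb.dump mydb             # custom-format dump of the database
pg_restore -j 8 -d mydb mydb.dump         # parallel restore with 8 jobs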


You can also *temporarily* disable fsync while rebuilding the database - 
just be sure to turn it back on afterward.


Copying the files is not the recommended method for backups but may work 
for certain cases. One is when you can shut down the database so the 
whole directory is quiescent while you copy the files. Also, depending 
on your SAN features, you *might* be able to do a snapshot of the 
running PostgreSQL data directory and use that.






Postgres version 9.2.2 ...

...has a nasty security issue. Upgrade. Now.

Cheers,
Steve





Re: [PERFORM] New server setup

2013-03-13 Thread Steve Crawford

On 03/13/2013 09:15 AM, John Lister wrote:

On 13/03/2013 15:50, Greg Jaskiewicz wrote:
SSDs have a much shorter life than spinning drives, so what do you do 
when one inevitably fails in your system?
Define much shorter? I accept they have a limited number of writes, but 
that depends on load. You can actively monitor the drive's health 
level...


What concerns me more than wear is this:

InfoWorld Article:
http://www.infoworld.com/t/solid-state-drives/test-your-ssds-or-risk-massive-data-loss-researchers-warn-213715

Referenced research paper:
https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault

Kind of messes with the D in ACID.

Cheers,
Steve





Re: [PERFORM] Slow concurrent processing

2013-03-12 Thread Steve Crawford

On 03/11/2013 08:55 PM, Misa Simic wrote:

Hi all,

We have one table with a list of records for processing...

We loop through that table and call one long-running function:

 do_the_math_for_record(record_id)...snip...

but - if we replace do_the_math_and_save_results with pg_sleep(1) to 
simulate a long-running function so that processing each row lasts 1 sec...


Sequential processing lasts, as expected, 300 seconds!

Concurrent processing finishes faster with a higher number of 
concurrent connections - about 30 seconds with 30 connections (much 
faster - and expected...)


however, if we put back our do_the_math_and_save_results - we can't get 
better results with concurrent processing...


Sleep will not have any significant impact on CPU, memory or disk use 
and thus is not a simulation of actual processing.


All you have really shown us so far is a black box. Please provide an 
overview of your schemas and the type of processing you are attempting 
on them.


Cheers,
Steve





Re: [PERFORM] Slow concurrent processing

2013-03-12 Thread Steve Crawford

On 03/12/2013 08:06 AM, Misa Simic wrote:

Thanks Steve

Well, the full story is too complex - but the point was - whatever the 
blackbox does, it takes 0.5 to 2 secs per processed record (maybe I 
was wrong, but I thought the reason why it takes the time it 
needs to actually do the task - CPU/IO/memory, whatever - is not that 
important) - so I really don't see the difference between: call a web 
service, insert a row in the table (takes 3 secs), and sleep 3 seconds - 
insert the result in the table...


if we do the above task for two things sequentially - it will take 6 
secs... but if we do it concurrently - it should take 3 secs... (in 
theory :) )


Not at all - even in theory. Sleep involves little, if any, contention 
for resources. Real processing does. So if a process requires 100% of 
available CPU then one process gets it all while many running 
simultaneously will have to share the available CPU resource and thus 
each will take longer to complete. Or, if you prefer, think of a file 
download. If it takes an hour to download a 1GB file it doesn't mean 
that you can download two 1GB files concurrently in one hour even if 
simulating the process by a sleep(3600) suggests it is possible.


I should note, however, that depending on the resource that is limiting 
your speed there is often room for optimization through simultaneous 
processing - especially when processes are CPU bound. Since PostgreSQL 
associates each back-end with one CPU *core*, you can have a situation 
where one core is spinning and the others are more-or-less idle. In 
those cases you may see an improvement by increasing the number of 
simultaneous processes to somewhere shy of the number of cores.




I guessed there was a lock somewhere - but it wasn't clear where/why when 
there are no updates - just inserts...


But I hadn't known that during INSERT a row lock is taken on referenced 
tables as well - from the FK columns...


So I guess now that is the cause of the problem...

We will see how it goes with inserts into unlogged tables with no FK...



It will almost certainly go faster as you have eliminated integrity and 
data-safety. This may be acceptable to you (non-real-time crunching of 
data that can be reloaded from external sources or temporary processing 
that is ultimately written back to durable storage) but it doesn't mean 
you have identified the actual cause.


One thing you didn't state. Is all this processing taking place in 
PostgreSQL? (i.e. update foo set bar = do_the_math(baz, zap, boom)) 
where do_the_math is a PL/pgSQL, PL/Python, ...  or are external 
processes involved?


Cheers,
Steve





Re: [PERFORM] Query with limit goes from few ms to hours

2012-10-23 Thread Steve Crawford

On 10/23/2012 11:33 AM, Kevin Grittner wrote:

henk de wit wrote:


Well, what do you know! That did work indeed. Immediately after the
ANALYZE on that parent table (taking only a few seconds) a fast
plan was created and the query executed in ms again. Silly me, I
should have tried that earlier.

Of course, if your autovacuum settings are aggressive enough, you
should generally not need to run ANALYZE explicitly. You should
double-check that autovacuum is turned on and configured at least as
aggressively as the default settings, or you will probably get little
surprises like this when you least expect them.


The exception I'd make to Kevin's good advice is for cases when a 
process makes substantial statistics-altering changes to your data (bulk 
insert/delete/update) immediately followed by a query against the 
updated table(s). In those cases there is a good possibility that the 
statistics will not have been automatically updated before the 
subsequent query is planned so an explicit ANALYZE between the update 
and the query can be of value.
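
In other words, something along these lines (table and file names are
invented for illustration):

BEGIN;
COPY event_log FROM '/tmp/events.csv' WITH (FORMAT csv);  -- big bulk change
COMMIT;

ANALYZE event_log;  -- refresh statistics before the follow-up query is planned

SELECT count(*) FROM event_log WHERE event_date = current_date;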


Cheers,
Steve





Re: [PERFORM] Scaling 10 million records in PostgreSQL table

2012-10-08 Thread Steve Crawford

On 10/08/2012 08:26 AM, Navaneethan R wrote:

Hi all,

   I have 10 million records in my postgres table. I am running the database 
on an amazon ec2 medium instance. I need to access the last week's data from the 
table.
It takes a huge amount of time to process the simple query, so it throws a timeout 
exception error.

The query is:
  select count(*) from dealer_vehicle_details where modified_on between 
'2012-10-01' and '2012-10-08' and dealer_id=270001;

After a lot of time it responds with 1184 as the count.

What are the ways I have to follow to increase the performance of this query?
  
The insertion is also going on in parallel, since there is daily realtime updating.


What could be the reason exactly for this lack of performance?


What version of PostgreSQL? You can use "select version();" and note 
that 9.2 has index-only scans which can result in a substantial 
performance boost for queries of this type.


What is the structure of your table? You can use "\d+ 
dealer_vehicle_details" in psql.


Have you tuned PostgreSQL in any way? If so, what?

Cheers,
Steve




Re: [PERFORM] Two identical systems, radically different performance

2012-10-08 Thread Steve Crawford

On 10/08/2012 02:45 PM, Craig James wrote:
This is driving me crazy.  A new server, virtually identical to an old 
one, has 50% of the performance with pgbench.  I've checked everything 
I can think of.


The setups (call the servers old and new):

old: 2 x 4-core Intel Xeon E5620
new: 4 x 4-core Intel Xeon E5606

both:

  memory: 12 GB DDR ECC
  Disks: 12x500GB disks (Western Digital 7200RPM SATA)
    2 disks, RAID1: OS (ext4) and postgres xlog (ext2)
    8 disks, RAID10: $PGDATA
Exact same model of disk, same on-board cache, same RAID-card RAM size, 
same RAID stripe-size, etc.?


Cheers,
Steve



Re: [PERFORM] [ADMIN] Messed up time zones

2012-08-03 Thread Steve Crawford

On 08/03/2012 10:21 AM, Laszlo Nagy wrote:


All the above are the exact same point in time merely stated as 
relevant to each location. Note that given a timestamp with time zone 
and a zone, PostgreSQL returns a timestamp without time zone (you 
know the zone since you specified it). 

Yes, I know the zone. But I don't know the offset from UTC.

Example:

template1=# set timezone to 'UTC';
SET
template1=# select ('2011-10-30 01:00:00'::timestamptz) at time zone 
'Europe/Budapest';

      timezone
---------------------
 2011-10-30 02:00:00    -- Is it winter or summer time?
(1 row)

template1=# select ('2011-10-30 00:00:00'::timestamptz) at time zone 
'Europe/Budapest';

      timezone
---------------------
 2011-10-30 02:00:00    -- Is it winter or summer time? What is the 
offset from UTC here? Can you tell me when it was in UTC?

(1 row)

template1=#



I can not from the given information. Can you? The given information is 
ambiguous as are all times during the hour of fall-back everywhere. That 
leaves developers with a choice: choose an interpretation or throw an 
error. PostgreSQL chooses to use an interpretation.


It would be nice if there were a specification as to how such ambiguous 
data should be interpreted. Perhaps someone can point me to one and to 
any relevant documentation detailing how PostgreSQL handles such data. 
As it is, you need to be aware of how each part of your system deals 
with such. For example (using my local time zone) using the date command 
on Linux I see that

date -d '2012-11-04 0130'
returns
Sun Nov  4 01:30:00 PDT 2012 (Still in Daylight Saving Time)

But given the same input, PostgreSQL interprets it as standard time 
(offset -08):

select '2012-11-04 0130'::timestamptz;
      timestamptz
------------------------
 2012-11-04 01:30:00-08


What is more:

template1=# select (('2011-10-30 00:00:00'::timestamptz) at time zone 
'Europe/Budapest') is distinct from (('2011-10-30 
01:00:00'::timestamptz) at time zone 'Europe/Budapest');

 ?column?
----------
 f
(1 row)

template1=#

Yeah, we know what time zone it is in, but we don't know when it was, 
thanks a lot. :-( It would be unambiguous to store the UTC offset 
along with the value. But it is not how it was implemented.



So you took two distinct points in time, threw away some critical 
information, and are surprised why they are now equal? Then don't do 
that. It's the equivalent of being surprised that www.microsoft.com is 
the same as www.apple.com when comparing them on the short hostname 
only. If you want to know if two points in time differ, just compare them.
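
For example, sticking with the session from your example (timezone set to
UTC), comparing the two instants directly gives the expected answer:

select '2011-10-30 00:00:00'::timestamptz
       is distinct from
       '2011-10-30 01:00:00'::timestamptz;
-- t: the instants differ, even though both display as 02:00:00
-- Budapest local time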


Spending a couple hours reading 
http://www.postgresql.org/docs/current/static/datatype-datetime.html 
will be time well spent.


Cheers,
Steve




Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Steve Crawford

On 07/03/2012 08:22 AM, Stanislaw Pankevich wrote:


 PostgreSQL db, 30 tables with number of rows < 100 (not huge) - 
the fastest way to clean each non-empty table and reset unique 
identifier column of empty ones 


I wonder, what is the fastest way to accomplish this kind of task in 
PostgreSQL. I am interested in the fastest solutions ever possible.


It would help if we really understood your use-case. If you want to 
fully reset your database to a known starting state for test runs, why 
not just have a base database initialized exactly as you wish, say 
test_base, then just drop your test database and create the new 
database from your template:

drop database test;
create database test template test_base;

This should be very fast but it won't allow you to exclude individual 
tables.


Are you interested in absolute fastest as a mind-game or is there a 
specific use requirement, i.e. how fast is fast enough? This is the 
basic starting point for tuning, hardware selection, etc.


Truncate should be extremely fast but on tables that are as tiny as 
yours the difference may not be visible to an end-user. I just tried a 
delete from to empty a 10,000 record table and it took 14 milliseconds 
so you could do your maximum of 100 tables each containing 10-times your 
max number of records in less than two seconds.
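
For the "reset unique identifier" part of your question, TRUNCATE can do
that in the same statement (table names are placeholders):

TRUNCATE TABLE table1, table2, table3 RESTART IDENTITY;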


Regardless of the method you choose, you need to be sure that nobody is 
accessing the database when you reset it. The drop/create database 
method will, of course, require and enforce that. Truncate requires an 
exclusive lock so it may appear to be very slow if it is waiting to get 
that lock. And even if you don't have locking issues, your reluctance to 
wrap your reset code in transactions means that a client could be 
updating some table or tables whenever the reset script isn't actively 
working on that same table leading to unexplained weird test results.


Cheers,
Steve




Re: [PERFORM] pgbouncer - massive overhead?

2012-06-20 Thread Steve Crawford

On 06/19/2012 09:00 AM, Strange, John W wrote:

Given a baseline postgresql.conf config and a couple DL580 40 core/256GB memory 
I noticed a large over head for pgbouncer, has  anyone seen this before?


$ pgbench -h `hostname -i` -j 32 -p 4320 -U asgprod -s 500 -c 32 -S -T 60 
pgbench_500
Scale option ignored, using pgbench_branches table count = 500
starting vacuum...end.
transaction type: SELECT only
scaling factor: 500
query mode: simple
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 1743073
tps = 29049.88 (including connections establishing)
tps = 29050.308194 (excluding connections establishing)

$ pgbench -h `hostname -i` -j 32 -p 4310 -U asgprod -s 500 -c 32 -S -T 60 
pgbench_500
Scale option ignored, using pgbench_branches table count = 500
starting vacuum...end.
transaction type: SELECT only
scaling factor: 500
query mode: simple
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 8692204
tps = 144857.505107 (including connections establishing)
tps = 144880.181341 (excluding connections establishing)

processor   : 39
vendor_id   : GenuineIntel
cpu family  : 6
model   : 47
model name  :Intel(R) Xeon(R) CPU E7- 4860  @ 2.27GHz


I'm very dubious that the stats are meaningful as run. Were the above 
stats generated on consecutive runs on the same machine or was the test 
database fully returned to baseline between runs and the machine 
restarted to clear cache?


I doubt anyone here would trust the results of a 60-second pgbench run - 
especially a select-only test on a server that will likely end up with 
virtually everything ultimately in cache. Make sure each run is started 
from the same state and run for 30-60 minutes.


Still, you *are* adding a layer between the client and the server. 
Running the simplest of read-only queries against a fully-cached 
database on a fast many-core machine is likely to emphasize any latency 
introduced by pgbouncer. But it's also not a use-case for which 
pgbouncer is intended. If you were to add -C so each query required a 
new client connection a different picture would emerge. Same thing if 
you had 2000 client connections of which only a handful were running 
queries at any moment.
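
For example, a run closer to pgbouncer's intended use case would look
something like this (a sketch - substitute whichever port pgbouncer is
listening on and a realistic duration):

pgbench -h `hostname -i` -p PGBOUNCER_PORT -U asgprod -j 32 -c 32 -C -S -T 1800 pgbench_500

The -C option opens a new client connection for every transaction, which
is exactly the overhead a pooler is designed to absorb.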


Cheers,
Steve




Re: [PERFORM] High load average in 64-core server , no I/O wait and CPU is idle

2012-05-24 Thread Steve Crawford

On 05/24/2012 05:58 AM, Rajesh Kumar. Mallah wrote:

Dear Andy ,

Following the discussion on load average we are now investigating some
other parts of the stack (other than the db).

Essentially we are bumping up the limits (on the appserver) so that more requests
go to the DB server.

Which leads to the question: what, other than the db, runs on this machine?

Cheers,
Steve



Re: [PERFORM] auto-vacuum vs. full table update

2012-04-26 Thread Steve Crawford

On 04/26/2012 12:49 PM, Craig James wrote:

An update to our system means I'm going to be rewriting every row of
some large tables (20 million rows by 15 columns).  In a situation
like this, can auto-vacuum take care of it, or should I plan on
vacuum-full/reindex to clean up?

If you want to reclaim the space, a vacuum-full/reindex will do it. But 
you are probably better off using cluster. Way faster and you get new 
indexes as a by-product. Both methods require an exclusive lock on the 
table. If you can't afford the downtime, check out pg_reorg 
(http://pgfoundry.org/projects/reorg/)


Cheers,
Steve




Re: [PERFORM] scale up (postgresql vs mssql)

2012-04-13 Thread Steve Crawford

On 04/13/2012 08:04 AM, Claudio Freire wrote:

...You might want to try unlogged temp tables, which more
closely resemble mssql temp tables.

If they are permanent tables used for temporary storage then making them 
unlogged may be beneficial. But actual temporary tables *are* unlogged 
and attempting to create an unlogged temporary table will raise an error.
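
A quick illustration (table names invented):

-- a permanent table used as scratch space can be made unlogged:
CREATE UNLOGGED TABLE scratch_results (id int, val numeric);

-- a temporary table is already unlogged, and the grammar treats
-- TEMPORARY and UNLOGGED as alternatives, so this raises an error:
CREATE UNLOGGED TEMPORARY TABLE scratch_tmp (id int, val numeric);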


Cheers,
Steve




Re: [PERFORM] scale up (postgresql vs mssql)

2012-04-13 Thread Steve Crawford

On 04/13/2012 09:43 AM, Claudio Freire wrote:

On Fri, Apr 13, 2012 at 1:36 PM, Steve Crawford
scrawf...@pinpointresearch.com  wrote:

If they are permanent tables used for temporary storage then making them
unlogged may be beneficial. But actual temporary tables *are* unlogged and
attempting to create an unlogged temporary table will raise an error.

Interesting, yes, I was wondering why PG didn't make temp tables
unlogged by default.

Then, I guess, the docs[0] have to mention it. Especially due to the
error condition. Right?

[0] http://www.postgresql.org/docs/9.1/static/sql-createtable.html

Well, the fact that temporary and unlogged cannot be simultaneously 
specified *is* documented:


CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF 
NOT EXISTS ] table_name


But it would probably be worth adding a note under the description of 
temporary tables that they are, in fact, unlogged.


Cheers,
Steve




Re: [PERFORM] Linux machine aggressively clearing cache

2012-04-12 Thread Steve Crawford

On 03/30/2012 05:51 PM, Josh Berkus wrote:


So this turned out to be a Linux kernel issue.  Will document it on
www.databasesoup.com.
Anytime soon? About to build two PostgreSQL servers and wondering if you 
have uncovered a kernel version or similar issue to avoid.


Cheers,
Steve




Re: [PERFORM] Advice sought : new database server

2012-03-06 Thread Steve Crawford

On 03/04/2012 03:50 AM, Michael Friedl wrote:

Hey!

On 04.03.2012 10:58, Rory Campbell-Lange wrote:

 1U chassis with 8 2.5" disk bays
 2x Intel Xeon E5630 Quad-Core / 4x 2.53GHz / 12MB cache
 8 channel Areca ARC-1880i (PCI Express x8 card)
   presumably with BBU (can't see it listed at present)
 2 x 300GB SAS 2.5" disks for operating system
   (Possibly also 300GB SATA VelociRaptor/10K RPM/32MB cache)
   RAID 1
 4 x 300GB SAS 2.5" storage disks
   RAID 10
 48.0GB DDR3 1333MHz registered ECC (12x 4.0GB modules)


Sorry, no answer for your question and a bit offtopic.


Why do you take SAS disks for the OS and not much cheaper SATA ones?





Here's Intel's (very general) take. Your OS disks may not justify SAS on 
performance alone but other aspects may sway you.

http://www.intel.com/support/motherboards/server/sb/CS-031831.htm

Cheers,
Steve



Re: [PERFORM] Very long deletion time on a 200 GB database

2012-02-24 Thread Steve Crawford

On 02/23/2012 12:39 AM, Reuven M. Lerner wrote:

Hi, everyone. ...
...at one installation, we're quickly running out of disk space.  The 
database is already taking up about 200 GB of space, and is growing by 
1 GB or so a day.


I've been following the discussion of approaches and tuning for bulk 
deletes and suddenly wondered if you have checked a couple other basics.


Do you know the source of the increases in DB size? Is it due strictly 
to inserted data or are there lots of updates as well?


Is autovacuum running properly?

Could you, due to bulk deletes and other issues, be suffering from 
table- or index-bloat? Heavily bloated tables/indexes will exacerbate 
both your disk-usage and performance problems.


If possible you might try clustering your tables and see what happens to 
disk usage and bulk-delete performance. Clusters are generally 
reasonably fast - way faster than VACUUM FULL, though they could still 
take a while on your very large tables.


As a bonus, cluster gives you shiny, new non-bloated indexes. They do 
require an exclusive lock and they do require sufficient disk-space to 
build the new, albeit smaller, table/indexes so it may not be an option 
if you are short on disk-space.  You may be able to start by clustering 
your smaller tables and move toward the larger ones as you free 
disk-space. Be sure to run ANALYZE on any table that you have CLUSTERed.


You might find it useful to make CLUSTER part of your regular maintenance.
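
A sketch of such a maintenance step (table and index names are placeholders):

-- rewrites the table in index order and rebuilds its indexes;
-- requires an exclusive lock for the duration
CLUSTER big_table USING big_table_created_idx;
ANALYZE big_table;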

Cheers,
Steve




Re: [PERFORM] Very long deletion time on a 200 GB database

2012-02-23 Thread Steve Crawford

On 02/23/2012 12:39 AM, Reuven M. Lerner wrote:

Hi, everyone...
This is basically what I'm trying to execute:

DELETE FROM B
WHERE r_id IN (SELECT R.id
                 FROM R, B
                WHERE r.end_date < (NOW() - (interval '1 day' * 30))
                  AND r.id = b.r_id)


I don't recall which versions like which approach, but have you tried 
...WHERE EXISTS (SELECT... instead of WHERE IN? Depending on the version 
of PostgreSQL, one or the other may yield a superior result.
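
An EXISTS version of your delete might look roughly like this (untested,
and assuming end_date is what drives the purge):

DELETE FROM b
 WHERE EXISTS (SELECT 1
                 FROM r
                WHERE r.id = b.r_id
                  AND r.end_date < (now() - interval '30 days'));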



(2) I tried to grab the rows that *do* interest me, put them into a 
temporary table, TRUNCATE the existing table, and then copy the rows 
back.   I only tested that with a 1 GB subset of the data, but that 
took longer than other options.




Was the 1GB subset the part you were keeping or the part you were 
deleting? Which part was slow (creating the temp table or copying it back)?


Try running EXPLAIN on the SELECT query that creates the temporary table 
and try to optimize that. Also, when copying the data back, you are 
probably having to deal with index and foreign keys maintenance. It will 
probably be faster to drop those, copy the data back then recreate them.


I know you are a *nix-guy in a Windows org so your options are limited, 
but word-on-the-street is that for high-performance production use, 
install PostgreSQL on *nix.


Cheers,
Steve




Re: [PERFORM] set autovacuum=off

2012-02-23 Thread Steve Crawford

On 02/23/2012 09:35 AM, Alessandro Gagliardi wrote:
I should have been more clear. I virtually never delete or do updates, 
but I insert /a lot/. So the table does change quite a bit, but only 
in one direction.


I was unable to disable autovacuum universally (due to the 
cant_change_runtime_param error) but I was able to disable it on 
individual tables. Still, I know this is heavy handed and sub-optimal. 
I tried set autovacuum_naptime='6min' but got the same 55P02 error. 
Should/can I set that per table?


I did look at autovacuum_vacuum_threshold 
and autovacuum_vacuum_scale_factor but couldn't make sense out of 
them. (Besides, I'd probably get the same 55P02 error if I tried to 
change them.)


See:
http://www.postgresql.org/docs/current/static/runtime-config-autovacuum.html

The documentation has information like "This parameter can only be set 
in the postgresql.conf file or on the server command line." that will 
tell you in advance which settings will fail when you attempt to set 
them through SQL statements.


But autovacuum is pretty smart about not vacuuming tables until 
reasonably necessary. And beware that autovacuum is also controlling 
when to analyze a table. Mass inserts are probably changing the 
characteristics of your table such that it needs to be analyzed to allow 
the planner to properly optimize your queries.
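
If you do find you need to tune it, the per-table knobs are set as storage
parameters rather than through SET (table name and values below are only
illustrative):

ALTER TABLE busy_inserts
  SET (autovacuum_vacuum_scale_factor = 0.2,
       autovacuum_analyze_scale_factor = 0.05);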


Have you identified that vacuum is actually causing a problem? If not, 
I'd leave it alone. The system tables have a lot of information on table 
vacuuming and analyzing:


select
    relname,
    last_vacuum,
    last_autovacuum,
    last_analyze,
    last_autoanalyze,
    vacuum_count,
    autovacuum_count,
    analyze_count,
    autoanalyze_count
from
    pg_stat_user_tables;

Cheers,
Steve



Re: [PERFORM] set autovacuum=off

2012-02-23 Thread Steve Crawford

On 02/23/2012 10:38 AM, Alessandro Gagliardi wrote:
On Thu, Feb 23, 2012 at 10:01 AM, Steve Crawford 
scrawf...@pinpointresearch.com 
mailto:scrawf...@pinpointresearch.com wrote:


The documentation has information like This parameter can only be
set in the postgresql.conf file or on the server command line.
that will tell you in advance which settings will fail when you
attempt to set them through SQL statements.

Ah. I missed that. Sorry for asking stupid questions.
No problem and not stupid. With the manual running to hundreds of pages 
plus information on wikis and mailing-list histories spanning hundreds 
of thousands of messages sometimes knowing where to look is 90% of the 
battle.


But autovacuum is pretty smart about not vacuuming tables until
reasonably necessary. And beware that autovacuum is also
controlling when to analyze a table. Mass inserts are probably
changing the characteristics of your table such that it needs to
be analyzed to allow the planner to properly optimize your queries.

Okay, that makes more sense to me; because the stats would be changing 
quickly and so while vacuuming may not be necessary, analyzing would 
be. At the same time, I can't afford to analyze if it's causing my 
inserts to take over 50 ms. Something else I should add: if my selects 
are slow, that's annoying; but if my inserts are slow, that could 
be disastrous...


You need to rethink things a bit. Databases can fail in all sorts of 
ways and can slow down during bursts of activity, data dumps, etc. You 
may need to investigate some form of intermediate buffering.


...Apparently the last four columns don't exist in my database. As for 
the first four, that is somewhat illuminating
Then you are not running a current version of PostgreSQL so the first 
step to performance enhancement is to upgrade. (As a general rule - 
there are occasionally specific cases where performance decreases.)
So if it's not auto-vacuuming that's making my inserts so slow, what 
is it? I'm batching my inserts (that didn't seem to help at all 
actually, but maybe cause I had already turned off synchronous_commit 
anyway).
How are you batching them? Into a temp table that is copied to the main 
table? As a bunch of insert statements within a single connection (saves 
probably considerable time due to eliminating multiple connection 
setups)? With one PREPARE and multiple EXECUTE (to save repeated 
planning time - I'm not sure this will buy you much for simple inserts, 
though)? With COPY (much faster as many records are inserted in a single 
statement but if one fails, all fail)?
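
For example, the PREPARE/EXECUTE flavor on a single connection might look
like this (table and columns are hypothetical):

PREPARE ins (int, timestamptz, numeric) AS
    INSERT INTO readings (sensor_id, created, value) VALUES ($1, $2, $3);

EXECUTE ins(1, now(), 42.0);
EXECUTE ins(2, now(), 17.5);
-- ...many more...
DEALLOCATE ins;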


And what is the 50ms limit? Is that an average? Since you are batching, 
it doesn't sound like you need every statement to complete in 50ms. 
There is always a tradeoff between overall maximum throughput and 
maximum allowed latency.


I've gotten rid of a bunch of indices (especially those with low 
cardinality–that I did around the same time as disabling auto-vacuum, 
so that could account for the coincidental speed up).
Yes, inserts require the indexes to be updated so they can slow down 
inserts and updates.


I'm not sure what else I could be doing wrong. It's definitely better 
than it was a few days ago, but I still see LOG: duration: 77.315 ms 
statement: COMMIT every minute or two.


That's a huge topic ranging from hardware (CPU speed, RAM, 
spindle-count, disk-type, battery-backed write caching), OS (you *are* 
running on some sort of *nix, right?), OS tuning, PG tuning, etc. 
Fortunately the biggest benefit comes from some basic tuning.


I recommend you abandon this thread as it presupposes a now seemingly 
incorrect cause of the problem and start a new one titled something like 
"Tuning for high insert rate" where you describe the problem you want to 
solve. See http://wiki.postgresql.org/wiki/Guide_to_reporting_problems 
for a good guide to the information that will be helpful in diagnosis.


Cheers,
Steve



Re: [PERFORM] set autovacuum=off

2012-02-23 Thread Steve Crawford

On 02/23/2012 01:07 PM, Alessandro Gagliardi wrote:

The second one (a bunch of insert statements within a single 
connection). As I mentioned above, I was going to try the temp table 
thing, but that wasn't fast enough. COPY might be my next attempt.

insert into...;
insert into...;
insert into...;
... is really (ignoring statement preparation time):
begin;
insert into...;
commit;
begin;
insert into...;
commit;
begin;
insert into...;
commit;

It's possible that you might get a nice boost by wrapping the inserts 
into a transaction:

begin;
insert into...;
insert into...;
insert into...;
...
commit;

This only requires all that disk-intensive stuff that protects your data 
once, at the end, instead of 1000 times for your batch of 1000.


COPY is even better. I just ran a quick test by restoring a table on my 
desktop hacking db (untuned, few years old PC, single SATA disk, modest 
RAM and lots of resource competition). The 22+ million rows restored in 
282 seconds which is a rate somewhat north of 78,000 records/second or 
about 0.13ms/record.


You may want to eliminate that trigger, which only seems to exist to 
silence errors from uniqueness violations, and copy the incoming data 
into a temp table, then move the data with a variant of:
INSERT INTO main_table SELECT ... FROM incoming_table WHERE NOT EXISTS 
(SELECT 1 FROM main_table WHERE ...);
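
A fuller sketch of that approach, with hypothetical names and a single
id column standing in for your real uniqueness test:

CREATE TEMP TABLE incoming_table (LIKE main_table INCLUDING DEFAULTS);

COPY incoming_table FROM '/tmp/batch.csv' WITH (FORMAT csv);

INSERT INTO main_table
SELECT i.*
  FROM incoming_table i
 WHERE NOT EXISTS (SELECT 1 FROM main_table m WHERE m.id = i.id);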


Cheers,
Steve



Re: [PERFORM] Why so slow?

2012-02-17 Thread Steve Crawford

On 02/17/2012 10:34 AM, Alessandro Gagliardi wrote:

Comparing
SELECT DISTINCT(user_id) FROM blocks JOIN seen_its USING (user_id) 
WHERE seen_its.created BETWEEN (now()::date - interval '8 
days')::timestamp AND now()::date::timestamp

to
SELECT DISTINCT(user_id) FROM seen_its WHERE created BETWEEN 
(now()::date - interval '8 days')::timestamp AND now()::date::timestamp

the difference is 100x.
...
Though I could figure it out, it would be helpful to actually specify 
which query is faster and to post the explain of *both* queries.


But in general, it is not terribly unusual to find that rewriting a 
query can lead the planner to generate a superior plan. Trying and 
testing different ways of writing a query is a standard tuning technique.


There are also version-specific issues with some versions of PostgreSQL 
preferring ...where foo in (select... and others preferring ...where 
exists (select...


If you are planning to ramp up to high volumes it is also *very* 
important to test and tune using the size of database you plan to have 
on the hardware you will use in production. You cannot extrapolate from 
a dev database on an i486 (?!?) machine to a production server with more 
spindles, different RAID setup, different CPU, more cores, vastly more 
memory, etc.


In the case of your queries, the second one eliminates a join and gives 
the planner an easy way to optimize using the available indexes so I'm 
not surprised it's faster.


Note: I am guessing that your seen_its table just grows and grows but is 
rarely, if ever, modified. If it is basically a log-type table it will 
be a prime candidate for partitioning on date and queries like this will 
only need to access a couple relatively small child tables instead of 
one massive one.


Cheers,
Steve




Re: [PERFORM] table size is bigger than expected

2011-08-04 Thread Steve Crawford

On 08/04/2011 11:56 AM, Jian Shi wrote:


Hey,

  I'm a new user of PostgreSQL. I found one of my tables is taking 
unexpectedly large space:...


I did vacuum and reindex; the size is still the same. Is there anything 
else that I can do?



Did you try CLUSTER? A basic vacuum only identifies space as reusable, 
it doesn't actually shrink on-disk size.


If you have workloads that update or delete a small number of tuples per 
transaction, the autovacuum process should keep things reasonably under 
control. But if you run transactions that do bulk updates or deletes, 
you may need to intervene. The CLUSTER statement will completely rewrite 
and reindex your table (and will physically reorder the table based on 
the selected index). Note: CLUSTER requires an exclusive lock on the table.


Cheers,
Steve



Re: [PERFORM] Speeding up loops in pl/pgsql function

2011-05-26 Thread Steve Crawford

On 05/26/2011 05:36 AM, Merlin Moncure wrote:

...
got it:
select decode(regexp_replace('141142143', '([0-9][0-9][0-9])',
$q$\\\1$q$ , 'g'), 'escape');
 decode
--------
 abc
(1 row)

merlin


Nice. A word of warning, in 9.0 this returns a hex string:

select decode(regexp_replace('141142143', '([0-9][0-9][0-9])', 
$q$\\\1$q$ , 'g'), 'escape');

  decode
----------
 \x616263

See http://www.postgresql.org/docs/9.0/static/release-9-0.html:

E.5.2.3. Data Types
bytea output now appears in hex format by default (Peter Eisentraut)
The server parameter bytea_output can be used to select the 
traditional output format if needed for compatibility.


Another wrinkle: the function I wrote sort of ignored the bytea issue by 
using text. But text is subject to character encoding (for both good and 
bad) while bytea is not, so the ultimate solution will depend on whether 
the input string is the octal representation of an un-encoded sequence 
of bytes or represents a string of ASCII/UTF-8/whatever-encoded text.


Cheers,
Steve




Re: [PERFORM] Speeding up loops in pl/pgsql function

2011-05-25 Thread Steve Crawford

On 05/25/2011 11:45 AM, Reuven M. Lerner wrote:

Hi, Alex.  You wrote:

Have you tried something like:
SELECT  encode(regexp_replace('141142143', '(\d{3})', '\\\1',
'g')::bytea, 'escape');
Hmm, forgot about regexp_replace.  It might do the trick, but without 
a full-blown eval that I can run on the replacement side, it'll be a 
bit more challenging.  But that's a good direction to consider, for sure.


The function given didn't work exactly as written for me but it is on 
the right track. See if this works for you (input validation is left as 
an exercise for the reader...:)):


create or replace function octal_string_to_text(someoctal text) returns 
text as $$
declare
    binstring text;
begin
    -- build a literal such as E'\141\142\143' and let the backend
    -- evaluate it, turning the octal escapes into characters
    execute 'select E''' || regexp_replace($1, E'(\\d{3})', E'\\\\\\1', 
        'g') || '''' into binstring;
    return binstring;
end
$$ language plpgsql;

Cheers,
Steve




Re: [PERFORM] pg_dumpall affecting performance

2011-02-15 Thread Steve Crawford

On 02/15/2011 10:41 AM, Mark Mikulec wrote:

Hello,

I was under the impression that pg_dumpall didn't affect database
performance when dumping while the db is live. However I have evidence to
the contrary now - queries that are run during the pg_dumpall time take 10
to 100 times longer to execute than normal while pg_dumpall is running.
The strange thing is that this started after my database grew by about 25%
after a large influx of data due to user load. I'm wondering if there is a
tipping point or a config setting I need to change now that the db is
larger that is causing all this to happen.

Don't know where that impression came from. It is true that you can 
continue to *use* your database normally while running a dump, but you are 
reading the entire database and either transmitting it over the network 
or writing it to a local drive, so it shouldn't be surprising 
that performance is impacted.


There are tipping points - one big one is when you move from having all 
your data in RAM to needing to read disk. And it can be a whopper. If 
all your information, through PG or OS caching is in RAM then your dumps 
may run very quickly. The moment you cross the point that things don't 
quite fit you can see a sharp decline.


Consider a least-recently-used algorithm and a very simplistic scenario. 
You read the start data. It isn't cached so you go to disk *and* you 
put those blocks into cache, pushing blocks that you will need later out 
of cache. This continues and you potentially end up having to read 
everything from disk plus incur the overhead of checking and updating 
the cache. Meanwhile, the data you needed for your query may have been 
pushed out of cache so there is more contention for disk.


Admittedly an over-simplified example but you see the problem.

Cheers,
Steve




Re: [PERFORM] BBU Cache vs. spindles

2010-10-21 Thread Steve Crawford

On 10/20/2010 09:45 PM, Scott Marlowe wrote:

On Wed, Oct 20, 2010 at 8:25 PM, Joshua D. Drake j...@commandprompt.com wrote:

On Wed, 2010-10-20 at 22:13 -0400, Bruce Momjian wrote:

Ben Chobot wrote:

On Oct 7, 2010, at 4:38 PM, Steve Crawford wrote:

I'm weighing options for a new server. In addition to PostgreSQL, this machine 
will handle some modest Samba and Rsync load.

I will have enough RAM so that virtually all disk-read activity will be cached. 
The average PostgreSQL read activity will be modest - a mix of single-record 
and fairly large (reporting) result-sets. Writes will be modest as well but 
will come in brief (1-5 second) bursts of individual inserts. The rate of 
insert requests will hit 100-200/second for those brief bursts.

So...

Am I likely to be better off putting $$$ toward battery-backup on the RAID or 
toward adding a second RAID-set and splitting off the WAL traffic? Or something 
else?
   

A BBU is, what, $100 or so? Adding one seems a no-brainer to me.
Dedicated WAL spindles are nice and all, but they're still spinning
media. RAID card cache is way faster, and while it's best at bursty
writes, it sounds like bursty writes are precisely what you have.
 

Totally agree!
   

BBU first, more spindles second.
 

Agreed.  Note that while you can get incredible burst performance from
a battery backed cache, due to both caching and writing out of order,
once the throughput begins to saturate at the speed of the disk array,
the BBU cache is really only re-ordering, as it will eventually
fill up faster than the disks can take the writes, and you'll settle
in at some percentage of the max TPS you get for a short benchmark
run.  It's vitally important that once you put a BBU cache in place,
you run a very long-running transactional test (pgbench is a simple
one to start with) that floods the I/O subsystem so you see what your
average throughput is with the WAL and data store getting flooded.  I
know on my system pgbench runs of a few minutes can be 3 or 4 times
faster than runs that last for the better part of an hour.

   
Thanks for all the replies. This is what I suspected but since I can't 
just buy one of everything to try, I wanted a sanity-check before 
spending the $$$.


I am not too worried about saturating the controller cache as the 
current much lower spec machine can handle the sustained load just fine 
and the bursts are typically only 1-3 seconds long spaced a minute or 
more apart.


Cheers,
Steve






[PERFORM] BBU Cache vs. spindles

2010-10-07 Thread Steve Crawford
I'm weighing options for a new server. In addition to PostgreSQL, this 
machine will handle some modest Samba and Rsync load.


I will have enough RAM so that virtually all disk-read activity will be 
cached. The average PostgreSQL read activity will be modest - a mix of 
single-record and fairly large (reporting) result-sets. Writes will be 
modest as well but will come in brief (1-5 second) bursts of individual 
inserts. The rate of insert requests will hit 100-200/second for those 
brief bursts.


So...

Am I likely to be better off putting $$$ toward battery-backup on the 
RAID or toward adding a second RAID-set and splitting off the WAL 
traffic? Or something else?


Cheers,
Steve




Re: [PERFORM] Occasional giant spikes in CPU load

2010-04-07 Thread Steve Crawford



...Can you
correlate the spikes with anything like that?


Not that I know of.  Just regular web traffic.  On the backup server 
these events happen occasionally even when there is little or no web 
traffic, and nobody logged in doing maintenance.
What, if anything, are you logging in the PostgreSQL logs? Anything 
interesting there?


Cheers,
Steve




Re: [PERFORM] DB is slow until DB is reloaded

2010-01-04 Thread Steve Crawford

Madison Kelly wrote:

Hi all,

  I've got a fairly small DB...

  It slows down over time and I can't seem to find a way to get the
performance to return without doing a dump and reload of the database...


Some questions:

Is autovacuum running? This is the most likely suspect. If not, things 
will bloat and you won't be getting appropriate analyze runs. Speaking 
of which, what happens if you just run analyze?


And as long as you are dumping and reloading anyway, how about upgrading 
the version for bug fixes, performance improvements, and cool new features?


Cheers,
Steve



Re: [PERFORM] DB is slow until DB is reloaded

2010-01-04 Thread Steve Crawford

Madison Kelly wrote:

Steve Crawford wrote:

Madison Kelly wrote:

Hi all,

  I've got a fairly small DB...

  It slows down over time and I can't seem to find a way to get the
performance to return without doing a dump and reload of the 
database...



Yup, I even tried manually running 'VACUUM FULL' and it didn't help.
That's because VACUUM reclaims space (er, actually marks space that is 
available for reuse) while ANALYZE refreshes the statistics that the 
planner uses.



As for upgrading;

a) I am trying to find a way around the dump/reload. I am doing it as 
a last resort only.
Agreed - it is the last resort. But since you were doing it I was just 
suggesting that you could combine with a upgrade and get more benefits.

b) I want to keep the version in CentOS' repo.
Depends on reasoning. If you absolutely require a fully vanilla 
particular version of CentOS for some reason then fine. But telling 
CentOS to use the PostgreSQL Development Group pre-built releases for 
CentOS is a very easy one-time process (it's what I do on my CentOS 
machines). From memory (but read to end for warnings):


Download the setup rpm:
wget http://yum.pgsqlrpms.org/reporpms/8.4/pgdg-centos-8.4-1.noarch.rpm

Install it:
rpm -i pgdg-centos-8.4-1.noarch.rpm

Note: This does not install PostgreSQL - it just updates your repository 
list to add the repository containing PostgreSQL binaries. Now make sure 
that you get your updates from PostgreSQL, not CentOS:


Edit /etc/yum.repos.d/CentOS-Base.repo and add exclude=postgresql* to 
the  [base] and [updates] sections.
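After the edit, those sections look roughly like this (existing lines left as they were, only the exclude added):

[base]
...existing settings...
exclude=postgresql*

[updates]
...existing settings...
exclude=postgresql*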


Now you can use yum as normal and you will get PostgreSQL 8.4 and 
updates thereto rather than using 8.1.


BUT!! I have only done this on new installs. I have not tried it on an 
already running machine. As always, test first on a dev machine and do 
your pre-update dump using the new version of the pg_dump utilities, not 
the old ones.


Cheers,
Steve




I'd not tried simply updating the stats via ANALYZE... I'll keep an 
eye on performance and if it starts to slip again, I will run ANALYZE 
and see if that helps. If there is a way to run ANALYZE against a 
query that I am missing, please let me know.
If you stick with 8.1x, you may want to edit postgresql.conf and change 
default_statistics_target to 100 if it is still at the previous default 
of 10. 100 is the new default setting as testing indicates that it tends 
to yield better query plans with minimal additional overhead.


Cheers,
Steve




Re: [PERFORM] DB is slow until DB is reloaded

2010-01-04 Thread Steve Crawford

Madison Kelly wrote:


  I wanted to get ahead of the problem, hence my question here. :) 
I've set this to run at night ('iwt' being the DB in question):


su postgres -c "psql iwt -c \"VACUUM ANALYZE VERBOSE\""


And why not the vacuumdb command?:

su postgres -c "vacuumdb --analyze --verbose iwt"


But this is duct-tape and bailing-wire. You REALLY need to make sure 
that autovacuum is running - you are likely to have much better results 
with less pain.


Cheers,
Steve






Re: [PERFORM] DB is slow until DB is reloaded

2010-01-04 Thread Steve Crawford

Madison Kelly wrote:


You are right, autovacuum is not running after all. From your comment, 
I am wondering if you'd recommend I turn it on or not?...



I see you are considering an upgrade but FWIW on your 8.1 instance, my 
remaining 8.1 server has been running for years with it on. Read up on 
it at:

http://www.postgresql.org/docs/8.1/static/maintenance.html#AUTOVACUUM

Basically you need to turn on some stats stuff so autovacuum can 
determine when to run (in postgresql.conf):

stats_start_collector = on
stats_row_level = on

And you need to enable autovacuum (in postgresql.conf):
autovacuum = on
autovacuum_naptime = 300    # time between autovacuum runs, in secs


Then you can tune it if you need to but at least it will be looking for 
things that are vacuumworthy every 5 minutes.


Cheers,
Steve




Re: [PERFORM] RAID card recommendation

2009-11-25 Thread Steve Crawford

Greg Smith wrote:

Jochen Erwied wrote:

- Promise Technology Supertrak ES4650 + additional BBU
- Adaptec RAID 5405 SGL/256 SATA/SAS + additional BBU
- Adaptec RAID 5405Z SGL/512 SATA/SAS
  
I've never seen a Promise controller that had a Linux driver you would 
want to rely on under any circumstances...Easier to just buy from a 
company that has always cared about good Linux support, like 3ware.

+1

I haven't tried Promise recently, but last time I did I determined that 
they got the name because they Promise the Linux driver for your card 
will be available real-soon-now. Actually got strung along for a couple 
months before calling my supplier and telling him to swap it out for a 
3ware. The 3ware just works. I currently have a couple dozen Linux 
servers, including some PostgreSQL machines, running the 3ware cards.


Cheers,
Steve



Re: [PERFORM] database size growing continously

2009-10-29 Thread Steve Crawford

Peter Meszaros wrote:

Hi All,

I use postgresql 8.3.7 as a huge queue. There is a very simple table
with six columns and two indices, and about 6 million records are
written into it every day, continuously committed every 10 seconds from
8 clients. The table stores approximately 120 million records, because a
cron job daily deletes those that are older than 20 days. Autovacuum is
on and every setting is the factory default except some unrelated ones
(listen address, authorization). But my database is growing,
characteristically ~600MByte/day, but sometimes much slower (eg. 10MB,
or even 0!!!)...


Can you try running against 8.4.1? I believe there are a number of 
improvements that should help in your case. For one thing, the 
max_fsm_pages and max_fsm_relations knobs are gone - free-space-map 
management now happens automatically. I believe there are some substantial 
improvements in space reuse along with numerous improvements not directly 
related to your question.


Cheers,
Steve




Re: [PERFORM] database size growing continously

2009-10-29 Thread Steve Crawford

Peter Meszaros wrote:

Hi All,

I use postgresql 8.3.7 as a huge queue. There is a very simple table
with six columns and two indices, and about 6 million records are
written into it every day, continuously committed every 10 seconds from
8 clients. The table stores approximately 120 million records, because a
cron job daily deletes those that are older than 20 days.
You may be an ideal candidate for table partitioning - this is 
frequently used for rotating log table maintenance.


Use a parent table and 20 child tables. Create a new child every day and 
drop the 20-day-old table. Dropping a table is far faster and lower-impact 
than deleting from a 120-million-row table. Index bloat is limited to 
one day of inserts and is eliminated within 20 days. No deletes means 
no vacuum requirement on the affected tables. Each child table is limited 
to about 6 million records (one day's worth). A clever backup scheme can 
ignore prior days' static child tables (and you could keep 
historical data dumps off-line for later use if desired).
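A rough sketch of one day's child (the table name and the "created" timestamp column are illustrative - adjust to your schema):

CREATE TABLE queue_20091029 (
    CHECK (created >= DATE '2009-10-29' AND created < DATE '2009-10-30')
) INHERITS (queue);
CREATE INDEX queue_20091029_created_idx ON queue_20091029 (created);

-- route inserts to the current child (via the application or a trigger
-- on the parent); with constraint_exclusion on, queries against the
-- parent skip irrelevant children, and the nightly purge becomes:
DROP TABLE queue_20091009;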


Read up on it here: 
http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html


Cheers,
Steve




Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Steve Crawford

My additional comments:

t...@fuzzy.cz wrote:

...
For future upgrades, what are the basic steps?


0. Create a test database - work out bugs and performance issues before 
going live.

1. create database

...cluster. You only need to create the individual database if the 
options you select for the dump do not create the database(s).

2. dump the data from the old database
  
...using the dump tools from the *new* version. With several cores, you 
might want to consider using the custom-format dump in pg_dump if you 
want to use the new parallel restore feature in pg_restore, with a 
possible dramatic increase in restore speed (benchmarks I've seen 
suggest that with 8 cores you may even see an almost 8x restore speedup, 
so it's worth the effort). The manual suggests that setting --jobs to 
the number of cores on the server is a good first approximation. See the 
-Fc option on pg_dump and the --jobs option in pg_restore for details.
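A minimal sketch (database names and job count are examples only):

pg_dump -Fc -f olddb.dump olddb           # custom format, required for parallel restore
createdb newdb
pg_restore --jobs=8 -d newdb olddb.dump   # roughly one job per core is a good start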


Cheers,
Steve



Re: [PERFORM] PostgreSQL 8.4 performance tuning questions

2009-07-30 Thread Steve Crawford

Kevin Grittner wrote:

Steve Crawford scrawf...@pinpointresearch.com wrote:

benchmarks I've seen suggest that with 8 cores you may even see an
almost 8x restore speedup

I'm curious what sort of data in what environment showed that ratio.

I was going on memory from a presentation I watched. Reports on the web 
have shown anything from a 3x increase using 8 cores to other, 
less-detailed reports of up to 8x improvement. If you have one big 
table, don't expect much if any improvement. If you have lots of smaller 
tables/indexes then parallel restore will probably benefit you. This is 
all based on the not-atypical assumption that your restore will be CPU 
bound. I don't think parallel restore will be much use beyond the point 
you hit IO limits.


Cheers,
Steve



Re: [PERFORM] query slow only after reboot

2009-02-09 Thread Steve Crawford

Wei Yan wrote:

Hi:

Our queries are extremely slow only after a db server reboot, not after 
restarting the postgres db alone. The difference is about 10 mins vs. 8 secs. 
Not acceptable. I have browsed around and set the postgres db parameters 
as suggested. Still the same.


Any suggestion on how to tackle the problem?


What OS, amount of RAM, DB size...?

If the queries get progressively faster as you use the system then slow 
again after a reboot, my initial guess would be that you are getting 
more and more disk-cache hits the longer you use the system.


Cheers,
Steve




Re: [PERFORM] speeding up table creation

2008-10-14 Thread Steve Crawford

Rainer Mager wrote:


I have an interesting performance improvement need. As part of the 
automatic test suite we run in our development environment, we 
re-initialize our test database a number of times in order to ensure 
it is clean before running a test. We currently do this by dropping 
the public schema and then recreating our tables (roughly 30 tables 
total). After that we do normal inserts, etc, but never with very much 
data. My question is, what settings can we tweak to improve 
performance in this scenario? Specifically, if there were a way to tell 
Postgres to keep all operations in memory, that would probably be ideal.




What is the test part? In other words, do you start with a known initial 
database with all empty tables and then run the tests, or is the creation 
of those tables part of the test itself? How much data is in the initial 
database if the tables aren't empty? Creating 30 empty tables should 
take a trivial amount of time. Also, are there other schemas than public?


A couple ideas/comments:

You cannot keep the data in memory (that is, you can't disable writing 
to the disk). But since you don't care about data loss, you could turn 
off fsync in postgresql.conf. From a test perspective you should be fine 
- it will only be an issue in the event of a crash and then you can just 
restart with a fresh load. Remember, however, that any performance 
benchmarks won't translate to production use (of course they don't 
translate if you are using ramdisk anyway).


Note that the system tables are updated whenever you add/delete/modify 
tables. Make sure they are being vacuumed or your performance will 
slowly degrade.


My approach is to create a database exactly as you want it to be at the 
start of your tests (fully vacuumed and all) and then use it as a 
template to be used to create the testdb each time. Then you can just 
(with appropriate connection options) run dropdb thetestdb followed by 
createdb --template thetestdbtemplate thetestdb which is substantially 
faster than deleting and recreating tables - especially if they contain 
much data.
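In other words, something along these lines (names are placeholders, and no other sessions may be connected to either database while it runs):

dropdb thetestdb
createdb --template=thetestdbtemplate thetestdb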


Cheers,
Steve




Re: [PERFORM] 2GB or not 2GB

2008-05-28 Thread Steve Crawford

Josh Berkus wrote:

Folks,

Subsequent to my presentation of the new annotated.conf at pgCon last week,...


Available online yet? At?...

Cheers,
Steve




Re: [PERFORM] Where do a novice do to make it run faster?

2008-04-28 Thread Steve Crawford

A B wrote:

So, it is time to improve performance; it is running too slow.
AFAIK (as a novice) there are a few general areas:

1) hardware
2) rewriting my queries and table structures
3) using more predefined queries
4) tweek parameters in the db conf files

Of these points:
1) is nothing I can do about right now, but in the future perhaps.
2) will be quite hard right now since there is more code than time.
3) almost like 2 but perhaps more do-able with the current constraints.
4) This seems to be the easiest one to start with...

So what should I do/read concerning point 4?
If you have other good suggestions  I'd be very interested in that.
  
Go back to step zero - gather information that would be helpful in 
giving advice. For starters:

- What hardware do you currently have?
- What OS and version of PG?
- How big is the database?
- What is the nature of the workload (small queries or data-mining, how 
many simultaneous clients, transaction rate, etc.)?

- Is PG sharing the machine with other workloads?

Then edit your postgresql.conf file to gather data (see 
http://www.postgresql.org/docs/8.3/interactive/monitoring-stats.html). 
With stat collection enabled, you can often find some low-hanging fruit 
like indexes that aren't used (look in pg_stat_user_indexes) - sometimes 
because the query didn't cast something in the where-clause correctly.
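For example, a quick starting point for spotting never-used indexes (columns as named in pg_stat_user_indexes):

SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan, relname;
-- indexes still showing idx_scan = 0 after the system has run a while are
-- candidates for review (but keep those backing unique constraints).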


Also look at 
http://www.postgresql.org/docs/8.3/interactive/runtime-config-logging.html 
- especially the log_min_duration_statement setting to find long-running 
queries. You will probably need to try different settings and watch the 
log. Logging impacts performance so don't just set to log everything and 
forget. You need to play with it.
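For instance (value in milliseconds; pick something sensible for your workload):

log_min_duration_statement = 500    # log any statement taking longer than 500 ms
                                    # 0 logs everything, -1 disables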


Don't discount step 2 - you may find you can rewrite one inefficient but 
frequent query. Or add a useful index on the server.


Cheers,
Steve






Re: [PERFORM] update 600000 rows

2007-12-14 Thread Steve Crawford

[EMAIL PROTECTED] wrote:

Hello

I have a python script to update 600,000 rows in one table from a csv file in my
postgres database and it takes me 5 hours to do the transaction...

  

Let's see if I guessed correctly.

Your Python script is stepping through a 600,000 row file and updating 
information in a table (of unknown rows/columns) by making 600,000 
individual updates all wrapped in a big transaction. If correct, that 
means you are doing 600,000/(3,600 * 5) = 33 queries/second. If this is 
correct, I'd first investigate simply loading the csv data into a 
temporary table, creating appropriate indexes, and running a single 
query to update your other table.
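A rough sketch of that approach, run through psql (table and column names are invented here since the real schema wasn't posted):

BEGIN;
CREATE TEMP TABLE incoming (id integer, new_value text);
\copy incoming FROM 'data.csv' WITH CSV
CREATE INDEX incoming_id_idx ON incoming (id);
ANALYZE incoming;
UPDATE target_table t
   SET value = i.new_value
  FROM incoming i
 WHERE t.id = i.id;
COMMIT;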



First, when I run htop I see that the memory used is never more than 150 MB.
I don't understand, in this case, why setting the shmall and shmmax kernel
parameters to 16 GB of memory (the server has 32 GB) speeds up the
transaction a lot compared to shmall and shmmax of (only) 2 GB?!
  
Are you saying that you did this and the performance improved or you are 
wondering if it would?



The script is run as only one transaction and pauses now and then to give
postgres time to write data to disk.
  
This doesn't make sense. If the transaction completes successfully then 
PostgreSQL has committed the data to disk (unless you have done 
something non-standard and not recommended like turning off fsync). If 
you are adding pauses between updates, don't do that - it will only slow 
you down. If the full transaction doesn't complete, all updates will be 
thrown away anyway and if it does complete then they were committed.

If the data were written at the end of the transaction, would the performance
be better? I want production data regularly written to disk to prevent loss
of data, but is there any benefit to writing temporary data to disk in the
middle of a transaction???

  
See above. Actual disk IO is handled by the server. PostgreSQL is good 
at the D in ACID. If your transaction completes, the data has been 
written to disk. Guaranteed.


Cheers,
Steve



Re: [PERFORM] Hardware for PostgreSQL

2007-11-01 Thread Steve Crawford
Ketema wrote:
 I am trying to build a very robust DB server that will support 1000+
 concurrent users (already have seen a max of 237 with no pooling being
 used).  I have read so many articles now that I am just saturated.  I
 have a general idea but would like feedback from others.

Describe a bit better. 1,000 users or 1,000 simultaneous connections?
Ie, do you have a front-end where someone logs on, gets a connection,
and keeps it for the duration or is it a web-type app where each request
might connect-query-disconnect? If the latter, are your connections
persistent? How many queries/second do you expect?

How complex are the queries (retrieve single record or data-mining)?
Read-only or lots of updates? Do the read-queries need to be done every
time or are they candidates for caching?

 RAM?  The more the merrier right?

Generally, true. But once you reach the point that everything can fit in
RAM, more is just wasted $$$. And, according to my reading, there are
cases where more RAM can hurt - basically if you manage to create a
situation where your queries are large enough to just flush cache so you
don't benefit from caching but are hurt by spending time checking cache
for the data.

 Who has built the biggest baddest Pg server out there and what do you
 use?

Not me.

Someone just showed me live system monitoring data on one of his several
PG machines. That one was clocking multi-thousand TPS on a server
(Sun??) with 128GB RAM. That much RAM makes top look amusing.

Several of the social-networking sites are using PG - generally
spreading load over several (dozens) of servers. They also make heavy
use of pooling and caching - think dedicated memcached servers offering
a combined pool of several TB RAM.

For pooling, pgbouncer seems to have a good reputation. Tests on my
current production server show it shaving a few ms off every
connect-query-disconnect cycle. Connects are fairly fast in PG but that
delay becomes a significant issue under heavy load.

Test pooling carefully, though. If you blindly run everything through
your pooler instead of just selected apps, you can end up with
unexpected problems when one client changes a backend setting like set
statement_timeout to 5. If the next client assigned to that backend
connection runs a long-duration analysis query, it is likely to fail.

Cheers,
Steve



Re: [PERFORM] Hardware for PostgreSQL

2007-11-01 Thread Steve Crawford
Magnus Hagander wrote:
 Ow Mun Heng wrote:
 You're likely better off (performance-wise) putting it on the same disk
 as the database itself if that one has better RAID, for example.
 I'm thinking along the lines of: since nothing much writes to the OS
 disk, I should (keyword) be safe.
 
 Unless it's *always* in the cache (not so likely), reads will also move
 the heads...

And if the filesystem isn't mounted with noatime, reads will also cause a write.

Cheers,
Steve



Re: [PERFORM] [GENERAL] PostgreSQL Performance Tuning

2007-04-26 Thread Steve Crawford
Shohab Abdullah wrote:
 
 Dear,
 We are facing a performance tuning problem while using a PostgreSQL database
 over the network on a Linux OS.
 Our database consists of more than 500 tables with an average of 10K
 records per table, with an average of 20 users accessing the database
 simultaneously over the network. Each table has indexes and we are
 querying the database using Hibernate.
 The biggest problem is with insertion, updating and fetching of
 records, i.e. the database performance is very slow. It takes a long time
 to respond in the above scenario.
 Please provide me with the tuning of the database. I am attaching my
 *postgresql.conf* file for the reference of our current configuration

Have you changed _anything_ from the defaults? The defaults are set so
PG will run on as many installations as practical. They are not set for
performance - that is specific to your equipment, your data, and how you
need to handle the data. Assuming the record sizes aren't huge, that's
not a very large data set nor number of users.

Look at these for starters:
http://www.varlena.com/GeneralBits/Tidbits/perf.html
http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html

You might try setting the logging parameters to log queries longer than
x (where x is in milliseconds - you will have to decide the
appropriate value for too long) and start looking into those first.

Make sure that you are running analyze if it is not being run by
autovacuum.

Use EXPLAIN on your query to see how the query is being planned - as a
first pass, assume that on any reasonably sized table the words
"sequential scan" mean "fix this". Note that you may have to cast
variables in a query to match the column type in an index in order for the
planner to figure out that it can use the index.
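A hypothetical illustration (the table and column are made up):

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- if customer_id is bigint, older releases may show a sequential scan here;
-- matching the literal's type lets the index be considered:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42::bigint;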

Read the guidelines then take an educated stab at some settings and see
how they work - other than turning off fsync, there's not much in
postgresql.conf that will put your data at risk.

Cheers,
Steve
