Re: [PERFORM] Long Running Update - My Solution

2011-06-28 Thread Harry Mantheakis

Hello Kevin

 If you use EXPLAIN with both statements...

Yes, the plans are indeed very different.

Here is the statement, set to update up to 100,000 records, which took 
about 5 seconds to complete:



UPDATE
  table_A
SET
  field_1 = table_B.field_1
, field_2 = table_B.field_2
FROM
  table_B
WHERE
  table_B.tb_id >= 0
AND
  table_B.tb_id <= 100000
AND
  table_B.tb_id = table_A.ta_id
;


The query plan for the above is:


Nested Loop  (cost=0.00..2127044.47 rows=73620 width=63)
  ->  Index Scan using table_B_pkey on table_B  (cost=0.00..151830.75 rows=73620 width=20)
        Index Cond: ((tb_id >= 0) AND (tb_id <= 100000))
  ->  Index Scan using table_A_pkey on table_A  (cost=0.00..26.82 rows=1 width=47)
        Index Cond: (table_A.ta_id = table_B.tb_id)


Now, if I change the first AND clause to update 1M records, as follows:


table_B.tb_id <= 1000000


I get the following - quite different - query plan:


Hash Join  (cost=537057.49..8041177.88 rows=852150 width=63)
  Hash Cond: (table_A.ta_id = table_B.tb_id)
  ->  Seq Scan on table_A  (cost=0.00..3294347.71 rows=145561171 width=47)
  ->  Hash  (cost=521411.62..521411.62 rows=852150 width=20)
        ->  Bitmap Heap Scan on table_B  (cost=22454.78..521411.62 rows=852150 width=20)
              Recheck Cond: ((tb_id >= 0) AND (tb_id <= 1000000))
              ->  Bitmap Index Scan on table_B_pkey  (cost=0.00..22241.74 rows=852150 width=0)
                    Index Cond: ((tb_id >= 0) AND (tb_id <= 1000000))


Note: When I tried updating 1M records, the command was still running 
after 25 minutes before I killed it.


The sequential scan in the latter plan looks expensive, and (I think) 
supports what others have since mentioned: when the optimizer switches 
to sequential scans (working off the disk) things get a lot slower.


For me, the penny has finally dropped on why I should use EXPLAIN for 
bulk operations.
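
In practice the check is just a matter of prefixing each batch statement 
with EXPLAIN before running it - for example (batch bounds illustrative):

EXPLAIN
UPDATE table_A
SET    field_1 = table_B.field_1
     , field_2 = table_B.field_2
FROM   table_B
WHERE  table_B.tb_id >= 0
AND    table_B.tb_id <= 100000
AND    table_B.tb_id = table_A.ta_id;

If the plan shows the nested-loop index scans, the batch is safe to run; 
if it switches to a sequential scan of table_A, the batch is too big.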


Thanks too, to Greg Smith, Robert Klemme and Thomas for all the feedback.

Kind regards

Harry Mantheakis
London, UK






[PERFORM] Slow performance when querying millions of rows

2011-06-28 Thread Craig McIlwee
Hello,

I have a handful of queries that are performing very slowly.  I realize that I 
will be hitting hardware limits at some point, but want to make sure I've 
squeezed out every bit of performance I can before calling it quits.

Our database is collecting traffic data at the rate of about 3 million rows a 
day.  The most important columns in the table being queried are timestamp, 
volume, occupancy, and speed.  I have also denormalized the table by adding 
road name, direction, mile marker, and lane type values to eliminate joins with 
other tables that contain information about the devices that collect this 
information.  A typical query will involve segments of roadway (i.e. road 
names, directions, and mile marker bounds) over a certain period of time (e.g. 
morning rush hour), and will have filters to exclude questionable data 
(e.g. speed > 100 MPH).  Unfortunately, there are also a few cases in which a 
user will query data for many full days on all roadways, essentially querying 
everything for a large period of time.  One other thing to note is that we only 
ever query readings with lane_type = through_lanes, although we are collecting 
ramp and reversible lane data to facilitate future reporting needs.

Table Metadata:
- Volume typically ranges anywhere from 0 to 20, averages around 4 to 5.  A small 
percentage of the rows have null volume.
- Occupancy ranges from 1 to 10, averages around 1 or 2
- Speed is about what you would expect, ranging from 30 to 70 with an average 
somewhere near the middle
- There are 17 roads
- There are 2 directions per road
- Mile marker ranges vary by roadway, typical ranges are something like 0 to 40 
or 257 to 290
- Most (80 to 90% +) of the readings have lane_type = through_lanes
- Size of a daily table is about 360MB, a half month table is 5 to 6 GB

Full Table and Index Schema:

I've experimented with partitioning using a table per day and 2 tables per month 
(1st through 15th, 16th to end of month).  2 tables/month was the original 
approach to keep the number of tables from growing too rapidly, and shows about 
3x slower performance.  Using daily tables incurs extra planning overhead as 
expected, but isn't all that bad.  I'm OK with taking a 1 second planning hit if 
my overall query time decreases significantly.  Furthermore, we will only be 
storing raw data for about a year and can aggregate old data.  This means that 
I can do daily tables for raw data and larger/fewer tables for older data.  The 
table and index structure is below, which is identical between daily and ½ 
month tables with a couple of exceptions:
- Daily tables have a fill factor of 100, ½ month tables are default
- Only the 4 column indexes were created for the daily tables since the others 
never get used

CREATE TABLE vds_detector_data
(
  reading_timestamp timestamp without time zone,
  vds_id integer,
  detector_id integer,
  status smallint,
  speed numeric(12,9),
  volume numeric(12,9),
  confidence smallint,
  occupancy numeric(12,9),
  loadid bigint,
  road_name character varying(150),
  road_dir character varying(2),
  mile_marker numeric(7,2),
  lane_number integer,
  lane_type character varying(32),
  CONSTRAINT vds_detector_vdsid_fkey FOREIGN KEY (vds_id, detector_id)
  REFERENCES ref_vds_detector_properties (device_id, detector_id) MATCH 
SIMPLE
  ON UPDATE CASCADE ON DELETE RESTRICT
)
WITH (
  OIDS=FALSE
);

CREATE INDEX vds_detector_data_dir_idx
  ON vds_detector_data
  USING btree
  (road_dir);

CREATE INDEX vds_detector_data_lane_idx
  ON vds_detector_data
  USING btree
  (lane_number);

CREATE INDEX vds_detector_data_mm_idx
  ON vds_detector_data
  USING btree
  (mile_marker);

CREATE INDEX vds_detector_data_occupancy_idx
  ON vds_detector_data
  USING btree
  (occupancy);

CREATE INDEX vds_detector_data_road_idx
  ON vds_detector_data
  USING btree
  (road_name);

CREATE INDEX vds_detector_data_road_ts_mm_dir_idx
  ON vds_detector_data
  USING btree
  (road_name, reading_timestamp, mile_marker, road_dir);

CREATE INDEX vds_detector_data_speed_idx
  ON vds_detector_data
  USING btree
  (speed);

CREATE INDEX vds_detector_data_timestamp_idx
  ON vds_detector_data
  USING btree
  (reading_timestamp);

CREATE INDEX vds_detector_data_ts_road_mm_dir_idx
  ON vds_detector_data
  USING btree
  (reading_timestamp, road_name, mile_marker, road_dir);

CREATE INDEX vds_detector_data_volume_idx
  ON vds_detector_data
  USING btree
  (volume);
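
Each daily (or half month) table hangs off this parent as an ordinary 8.4 
inheritance child; a sketch of what one partition looks like (child name, 
bounds and the CHECK style are illustrative, not my exact DDL):

-- One child table per day; the CHECK constraint lets constraint_exclusion
-- skip partitions that fall outside the queried time range.
CREATE TABLE vds_detector_data_20110401 (
  CHECK (reading_timestamp >= DATE '2011-04-01'
     AND reading_timestamp <  DATE '2011-04-02')
) INHERITS (vds_detector_data);

CREATE INDEX vds_detector_data_20110401_ts_road_mm_dir_idx
  ON vds_detector_data_20110401
  USING btree
  (reading_timestamp, road_name, mile_marker, road_dir);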

EXPLAIN ANALYZE:

Query:
select cast(reading_timestamp as Date) as date,
  floor(extract(hour from reading_timestamp) / 1.0) * 1.0 as hour,
  floor(extract(minute from reading_timestamp) / 60) * 60 as min,
  count(*), sum(vdd.volume) as totalVolume,
  sum(vdd.occupancy*vdd.volume)/sum(vdd.volume) as avgOcc,
  sum(vdd.speed*vdd.volume)/sum(vdd.volume) as avgSpeed,
  avg(vdd.confidence) as avgConfidence, min(vdd.detector_id) as detectorId,
  vdd.vds_id as vdsId
from vds_detector_data vdd
where (vdd.reading_timestamp between '2011-4-01 

Re: [PERFORM] Slow performance when querying millions of rows

2011-06-28 Thread Tomas Vondra
On 28.6.2011 23:28, Craig McIlwee wrote:
 Daily table explain analyze: http://explain.depesz.com/s/iLY
 Half month table explain analyze: http://explain.depesz.com/s/Unt

Are you sure those two queries are exactly the same? Because the daily
case output says the width is 50B, while the half-month case says it's
75B. This might be why the sort/aggregate steps are switched, and that
increases the amount of data so that it has to be sorted on disk (which
is why the half-month is so much slower).

Haven't you added some columns to the half-month query?

 Postgres version:
 PostgreSQL 8.4.8, compiled by Visual C++ build 1400, 32-bit
 
 History:
 None, this is a new database and application
 
 Hardware:
 - 2 Intel Xeon 2.13GHz processors with 8 cores each
 - 8GB RAM
 - Disks are in a RAID 5 configuration with a PERC H700 Integrated RAID
 Controller 512MB Cache
 - 5 disks, Seagate 7200 RPM SAS, 500GB each for a total capacity of
 about 2TB
 - Windows Server 2008 R2 64-bit (but 32-bit postgres)
 - Hardware upgrades aren't an option at this point due to budget and time
 constraints

Not much experience with PostgreSQL on Windows, but this looks good to
me. Not sure if RAID5 is a good choice, especially because of write
performance - this is probably one of the reasons why the disk sort is
so slow (in the half-month case).

And it's nice you have 8 cores, but don't forget each query executes on
a single background process, i.e. it may use a single core. So the above
query can't use 8 cores - that's why the in-memory sort takes so long, I
guess.

 Maintenance Setup:
 Autovacuum is disabled for these tables since the data is never
 updated.  The tables that we are testing with at the moment will not
 grow any larger and have been both clustered and analyzed.  They were
 clustered on the vds_detector_data_timestamp_idx index.
 
 GUC Settings:
 effective_cache_size: 2048MB
 work_mem: 512MB
 shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query
 plan and took the same amount of time to execute give or take a few seconds
 
 Summary:
 
 The time to get the raw data (before aggregation and sorting) is
 relatively similar between the daily and half month tables.  It would
 appear that the major difference is the ordering of sort and
 aggregation, the daily tables aggregate first so the amount of data
 sorted is significantly less.

Yes, the ordering is the problem. The amount of data to sort is so huge
(3GB) it does not fit into work_mem and has to be sorted on disk. Not
sure why this happens, the only difference I've noticed is the 'width'
(50B vs. 75B). Are those two queries exactly the same?

 Since the daily tables are only 360MB, I would hope that the entire
 table could be pulled into memory with one large sequential read.  Of
 course this assumes that the file pieces are stored contiguously, but
 auto defrag is enabled and shows low fragmentation so I'm trusting (as
 much as one can) Windows to do the right thing here.  My drives have a
 150MB/s sustained max throughput, and considering that data is spread
 across 5 drives I would hope to at least be able to reach the single
 disk theoretical limit and read an entire table plus the index into
 memory in about 4 to 5 seconds.  Based on the analyze output, each daily
 table averages 6 to 7 seconds, so I'm pretty close there and maybe just
 limited by disk speed?

Well, you have 30 partitions and 7 seconds for each means 210 seconds in
total, which is about the time you get (before the aggregate/sort).

You have to check where the bottleneck is - is it the I/O or CPU? I'd
guess the CPU, but I may be wrong. On unix I'd use something like
iostat/vmstat/top to see what's going on - not sure what to use on
Windows. I guess there is some console or maybe Process Explorer from
sysinternals.

 In both cases, the row estimates vs actual are way off.  I've increased
 statistics on the reading_timestamp and road_name columns to 100 and
 then 1000 with no change.  I ran an ANALYZE after each statistics
 change.  Should I be upping stats on the non-indexed columns as well? 
 I've read documentation that says I should be able to set statistics
 values for an entire table as opposed to per column, but haven't found
 how to do that.  I guess I was either too lazy to update statistics on
 each column or just didn't think it would help much.

The estimates seem pretty good to me - 10x difference is not that much.
Could be better, but I don't think you can get a better plan, it seems
very reasonable to me.

 So, any pointers for performance improvement?

Three ideas that might help:

1) partial indexes

How much do the parameters in the query change? If there are parameters
that are always the same, you may try to create partial indexes. For
example if 'vdd.volume' always has to be != 0, then you can create
the index like this:

 CREATE INDEX vds_detector_data_dir_idx
   ON vds_detector_data
   USING btree
   (road_dir)
   WHERE (volume != 0);

That way only the rows with 

Re: [PERFORM] Slow performance when querying millions of rows

2011-06-28 Thread Greg Smith

On 06/28/2011 05:28 PM, Craig McIlwee wrote:
Autovacuum is disabled for these tables since the data is never 
updated.  The tables that we are testing with at the moment will not 
grow any larger and have been both clustered and analyzed.


Note that any such prep to keep from ever needing to maintain these 
tables in the future should include the FREEZE option, possibly with 
some parameters tweaked first to make it more aggressive.  Autovacuum 
will eventually revisit them in order to prevent transaction ID 
wrap-around, even if it's disabled.  If you're going to the trouble of 
prepping them so they are never touched again, you should do a freeze 
with the right parameters to keep this from happening again.
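
A minimal sketch of that one-time freeze, with an illustrative partition name:

-- FREEZE freezes every tuple it can right now, so autovacuum has no
-- wraparound reason to revisit a table that will never be written again.
VACUUM FREEZE ANALYZE vds_detector_data_20110401;  -- repeat per partition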



work_mem: 512MB
shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query 
plan and took the same amount of time to execute give or take a few 
seconds


shared_buffers doesn't normally impact the query plan; it impacts how 
much churn there is between the database and the operating system cache, 
mainly important for making write-heavy work efficient.  On Windows, 
you'll probably be safe to set this to 512MB and forget about it.  It 
doesn't benefit from large values anyway.


This is a very large work_mem setting however, so be careful that you 
won't have many users connecting at once if you're going to use it.  
Each connection can use a multiple of work_mem, making it quite possible 
you could run out of memory with this configuration.  If that low user 
count is true, you may want to make sure you're enforcing it by lowering 
max_connections, as a safety measure to prevent problems.
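
One way to do that, as a sketch (the role name is made up): keep the 
server-wide work_mem modest in postgresql.conf and raise it only for the 
session or role that runs the big reports:

SET work_mem = '512MB';                          -- per-session override
ALTER ROLE report_user SET work_mem = '512MB';   -- or per-role; name illustrative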


Since the daily tables are only 360MB, I would hope that the entire 
table could be pulled into memory with one large sequential read.  Of 
course this assumes that the file pieces are stored contiguously, but 
auto defrag is enabled and shows low fragmentation so I'm trusting (as 
much as one can) Windows to do the right thing here.  My drives have a 
150MB/s sustained max throughput, and considering that data is spread 
across 5 drives I would hope to at least be able to reach the single 
disk theoretical limit and read an entire table plus the index into 
memory in about 4 to 5 seconds.  Based on the analyze output, each daily 
table averages 6 to 7 seconds, so I'm pretty close there and maybe just 
limited by disk speed?


One thing to note is that your drive speed varies based on what part of 
the disk things are located at; the slower parts of the drive will be 
much less than 150MB/s.


On Linux servers it's impossible to reach something close to the disk's 
raw speed without making the operating system read-ahead feature much 
more aggressive than it is by default.  Because PostgreSQL fetches a 
single block at a time, to keep the drive completely busy something has 
to notice the pattern of access and be reading data ahead of when the 
database even asks for it.  You may find a parameter you can tune in the 
properties for the drives somewhere in the Windows Control Panel.  And 
there's a read-ahead setting on your PERC card that you may not have 
turned on; it's not as good as the Linux one, but it's better than 
nothing.  There are two useful settings there ("on" and "adaptive" 
if I recall correctly) that you can try, to see which works better.


I've read documentation that says I should be able to set statistics 
values for an entire table as opposed to per column, but haven't found 
how to do that.  I guess I was either too lazy to update statistics on 
each column or just didn't think it would help much.


You can adjust the statistics target across the entire database using 
the default_statistics_target setting, or you can tweak them per column 
using ALTER TABLE.  There is no table-level control.  I find it 
difficult to answer questions about whether there is enough stats or not 
without actually looking at pg_stats to see how the database is 
interpreting the data, and comparing it against the real distribution.  
This is an area where flailing about trying things doesn't work very 
well; you need to be very systematic about the analysis and testing 
strategy if you're going to get anywhere useful.  It's not easy to do.
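
For instance, a first look at what the planner currently believes about 
the key columns might be something like this (the table name is just an 
illustrative partition):

SELECT attname, null_frac, n_distinct, most_common_vals, histogram_bounds
  FROM pg_stats
 WHERE tablename = 'vds_detector_data_20110401'
   AND attname IN ('reading_timestamp', 'road_name', 'mile_marker', 'road_dir');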


As a larger commentary on what you're trying to do, applications like 
this often find themselves at a point one day where you just can't allow 
arbitrary user queries to run against them anymore.  What normally 
happens then is that the most common things that people really need end 
up being run once and stored in some summary form, using techniques such 
as materialized views:  http://wiki.postgresql.org/wiki/Materialized_Views
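
Since 8.4 has no built-in materialized views, the usual stand-in is a 
summary table you rebuild on a schedule; a rough sketch with illustrative 
names, pre-aggregating an hourly roll-up so reports hit the small table 
instead of the raw readings:

-- Hand-rolled "materialized view": aggregate once, query many times.
CREATE TABLE vds_hourly_summary AS
SELECT date_trunc('hour', reading_timestamp) AS reading_hour,
       road_name, road_dir,
       count(*) AS readings,
       sum(volume) AS total_volume,
       sum(occupancy * volume) / nullif(sum(volume), 0) AS avg_occ,
       sum(speed * volume)     / nullif(sum(volume), 0) AS avg_speed
  FROM vds_detector_data
 WHERE lane_type = 'through_lanes'
 GROUP BY 1, 2, 3;

-- Refresh from a scheduled job, either by truncating and refilling or by
-- appending only the most recently loaded day.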


In your case, I would start now on trying to find the common patterns in 
the long running reports that people generate, and see if it's possible 
to pre-compute some portion of them and save that summary.  And you may 
find yourself in a continuous battle with business requests regardless.  
It's often key decision makers who feel they 

Re: [PERFORM] Slow performance when querying millions of rows

2011-06-28 Thread Craig McIlwee
 On 28.6.2011 23:28, Craig McIlwee wrote:
  Daily table explain analyze: http://explain.depesz.com/s/iLY
  Half month table explain analyze: http://explain.depesz.com/s/Unt
 
 Are you sure those two queries are exactly the same? Because the daily
 case output says the width is 50B, while the half-month case says it's
 75B. This might be why the sort/aggregate steps are switched, and that
 increases the amount of data so that it has to be sorted on disk (which
 is why the half-month is so much slower).
 
 Haven't you added some columns to the half-month query?

The daily tables were created using CREATE TABLE AS from the half month tables, 
structure is the same with the exception of fill factor.  Queries are identical 
except for the name of the master table that they select from.

 
  Postgres version:
  PostgreSQL 8.4.8, compiled by Visual C++ build 1400, 32-bit
  
  History:
  None, this is a new database and application
  
  Hardware:
  - 2 Intel Xeon 2.13GHz processors with 8 cores each
  - 8GB RAM
  - Disks are in a RAID 5 configuration with a PERC H700 Integrated RAID
  Controller 512MB Cache
  - 5 disks, Seagate 7200 RPM SAS, 500GB each for a total capacity of
  about 2TB
  - Windows Server 2008 R2 64-bit (but 32-bit postgres)
  - Hardware upgrades aren't an option at this point due to budget and time
  constraints
 
 Not much experience with PostgreSQL on Windows, but this looks good to
 me. Not sure if RAID5 is a good choice, especially because of write
 performance - this is probably one of the reasons why the disk sort is
 so slow (in the half-month case).

Yes, the data import is painfully slow but I hope to make up for that with the 
read performance later.

 
 And it's nice you have 8 cores, but don't forget each query executes on
 a single background process, i.e. it may use a single core. So the above
 query can't use 8 cores - that's why the in-memory sort takes so long, I
 guess.
 
  Maintenance Setup:
  Autovacuum is disabled for these tables since the data is never
  updated.  The tables that we are testing with at the moment will not
  grow any larger and have been both clustered and analyzed.  They were
  clustered on the vds_detector_data_timestamp_idx index.
  
  GUC Settings:
  effective_cache_size: 2048MB
  work_mem: 512MB
  shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query
  plan and took the same amount of time to execute give or take a few
 seconds
  
  Summary:
  
  The time to get the raw data (before aggregation and sorting) is
  relatively similar between the daily and half month tables.  It would
  appear that the major difference is the ordering of sort and
  aggregation, the daily tables aggregate first so the amount of data
  sorted is significantly less.
 
 Yes, the ordering is the problem. The amount of data to sort is so huge
 (3GB) it does not fit into work_mem and has to be sorted on disk. Not
 sure why this happens, the only difference I've noticed is the 'width'
 (50B vs. 75B). Are those two queries exactly the same?
 
  Since the daily tables are only 360MB, I would hope that the entire
  table could be pulled into memory with one large sequential read.  Of
  course this assumes that the file pieces are stored contiguously, but
  auto defrag is enabled and shows low fragmentation so I'm trusting (as
  much as one can) Windows to do the right thing here.  My drives have a
  150MB/s sustained max throughput, and considering that data is spread
  across 5 drives I would hope to at least be able to reach the single
  disk theoretical limit and read an entire table plus the index into
  memory in about 4 to 5 seconds.  Based on the analyze output, each daily
  table averages 6 to 7 seconds, so I'm pretty close there and maybe just
  limited by disk speed?
 
 Well, you have 30 partitions and 7 seconds for each means 210 seconds in
 total, which is about the time you get (before the aggregate/sort).
 
 You have to check where the bottleneck is - is it the I/O or CPU? I'd
 guess the CPU, but I may be wrong. On unix I'd use something like
 iostat/vmstat/top to see what's going on - not sure what to use on
 Windows. I guess there is some console or maybe Process Explorer from
 sysinternals.
 
  In both cases, the row estimates vs actual are way off.  I've increased
  statistics on the reading_timestamp and road_name columns to 100 and
  then 1000 with no change.  I ran an ANALYZE after each statistics
  change.  Should I be upping stats on the non-indexed columns as well? 
  I've read documentation that says I should be able to set statistics
  values for an entire table as opposed to per column, but haven't found
  how to do that.  I guess I was either too lazy to update statistics on
  each column or just didn't think it would help much.
 
 The estimates seem pretty good to me - 10x difference is not that much.
 Could be better, but I don't think you can get a better plan, it seems
 very reasonable to me.
 
  So, any pointers for performance improvement?
 

Re: [PERFORM] Slow performance when querying millions of rows

2011-06-28 Thread Craig McIlwee
 On 06/28/2011 05:28 PM, Craig McIlwee wrote:
  Autovacuum is disabled for these tables since the data is never 
  updated.  The tables that we are testing with at the moment will not 
  grow any larger and have been both clustered and analyzed.
 
 Note that any such prep to keep from ever needing to maintain these 
 tables in the future should include the FREEZE option, possibly with 
 some parameters tweaked first to make it more aggressive.  Autovacuum 
 will eventually revisit them in order to prevent transaction ID 
 wrap-around, even if it's disabled.  If you're going to the trouble of 
 prepping them so they are never touched again, you should do a freeze 
 with the right parameters to keep this from happening again.
 
  work_mem: 512MB
  shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query 
  plan and took the same amount of time to execute give or take a few 
  seconds
 
 shared_buffers doesn't normally impact the query plan; it impacts how 
 much churn there is between the database and the operating system cache, 
 mainly important for making write-heavy work efficient.  On Windows, 
 you'll probably be safe to set this to 512MB and forget about it.  It 
 doesn't benefit from large values anyway.

I was thinking that shared buffers controlled the amount of data, primarily 
table and index pages, that the database could store in memory at once.  Based 
on that assumption, I thought that a larger value would enable an entire table 
+ index to be in memory together and speed up the query.  Am I wrong?

 
 This is a very large work_mem setting however, so be careful that you 
 won't have many users connecting at once if you're going to use it.  
 Each connection can use a multiple of work_mem, making it quite possible 
 you could run out of memory with this configuration.  If that low user 
 count is true, you may want to make sure you're enforcing it by lowering 
 max_connections, as a safety measure to prevent problems.

I plan on lowering this quite a bit since I haven't seen much of a boost by 
increasing it.

 
  Since the daily tables are only 360MB, I would hope that the entire 
  table could be pulled into memory with one large sequential read.  Of 
  course this assumes that the file pieces are stored contiguously, but 
  auto defrag is enabled and shows low fragmentation so I'm trusting (as 
  much as one can) Windows to do the right thing here.  My drives have a 
  150MB/s sustained max throughput, and considering that data is spread 
  across 5 drives I would hope to at least be able to reach the single 
  disk theoretical limit and read an entire table plus the index into 
  memory in about 4 to 5 seconds.  Based on the analyze output, each daily 
  table averages 6 to 7 seconds, so I'm pretty close there and maybe just 
  limited by disk speed?
 
 One thing to note is that your drive speed varies based on what part of 
 the disk things are located at; the slower parts of the drive will be 
 much less than 150MB/s.
 
 On Linux servers it's impossible to reach something close to the disk's 
 raw speed without making the operating system read-ahead feature much 
 more aggressive than it is by default.  Because PostgreSQL fetches a 
 single block at a time, to keep the drive completely busy something has 
 to notice the pattern of access and be reading data ahead of when the 
 database even asks for it.  You may find a parameter you can tune in the 
 properties for the drives somewhere in the Windows Control Panel.  And 
 there's a read-ahead setting on your PERC card that you may not have 
 turned on; it's not as good as the Linux one, but it's better than 
 nothing.  There are two useful settings there ("on" and "adaptive" 
 if I recall correctly) that you can try, to see which works better.

Looks like they are set to adaptive read-ahead now.  If the database is 
executing many concurrent queries, is it reasonable to suspect that the IO 
requests will compete with each other in such a way that the controller would 
rarely see many sequential requests since it is serving many processes?  The 
controller does have an 'on' option also that forces read-ahead, maybe that 
would solve the issue if we can rely on the data to survive in the cache until 
the actual read request takes place.

 
  I've read documentation that says I should be able to set statistics 
  values for an entire table as opposed to per column, but haven't found 
  how to do that.  I guess I was either too lazy to update statistics on 
  each column or just didn't think it would help much.
 
 You can adjust the statistics target across the entire database using 
 the default_statistics_target setting, or you can tweak them per column 
 using ALTER TABLE.  There is no table-level control.  I find it 
 difficult to answer questions about whether there is enough stats or not 
 without actually looking at pg_stats to see how the database is 
 interpreting the data, and comparing it against the real distribution.  
 

Re: [PERFORM] Slow performance when querying millions of rows

2011-06-28 Thread Tomas Vondra
On 29.6.2011 01:26, Craig McIlwee wrote:
 On 28.6.2011 23:28, Craig McIlwee wrote:
 Are you sure those two queries are exactly the same? Because the daily
 case output says the width is 50B, while the half-month case says it's
 75B. This might be why the sort/aggregate steps are switched, and that
 increases the amount of data so that it has to be sorted on disk (which
 is why the half-month is so much slower).

 Haven't you added some columns to the half-month query?
 
 The daily tables were created using CREATE TABLE AS from the half month
 tables, structure is the same with the exception of fill factor. 
 Queries are identical except for the name of the master table that they
 select from.

Hm, I'm not sure where this width value comes from but I don't think
it's related to fillfactor.

 Not much experience with PostgreSQL on Windows, but this looks good to
 me. Not sure if RAID5 is a good choice, especially because of write
 performance - this is probably one of the reasons why the disk sort is
 so slow (in the half-month case).
 
 Yes, the data import is painfully slow but I hope to make up for that
 with the read performance later.

Generally you're right that RAID10 is going to be slower than RAID5 when
reading (and faster when writing) the data, but how big the gap is
really depends on the controller. It's not that big I guess - see for
example this:

http://www.kendalvandyke.com/2009/02/disk-performance-hands-on-part-5-raid.html

The first test shows that RAID10 is about 10% slower on reads but about
60% faster on writes.

BTW have you tuned the GUC settings for writes (increasing
checkpoint_segments may give much better write performance)?

 The 0 volume is the only thing that will always be present, but those
 records do account for 10 to 15% of the data.  I'll give this a shot,
 I'm really interested in seeing what impact this has.  For some reason I
 was under the impression that partial indexes were used for text
 searches, so I completely overlooked this.

Or you might actually do two partitions for each day - one for volume=0
and the other one for volume!=0. Not sure if that is worth the effort.

One more thing to try in this case - it's not that important how many
rows satisfy the condition, much more important is how many blocks need
to be read from the disk. If those 10% rows are distributed evenly
through the table (i.e. there's at least one in each 8kB block), the I/O
still needs to be done.

And it's very likely the case, as you've clustered the tables according
to the timestamp. Try to cluster the tables according to 'volume' and
check the difference.
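
For example, something like this for one partition (index name is
illustrative; keep in mind CLUSTER rewrites the table and takes an
exclusive lock, so do it during a quiet window):

-- Physically re-order one daily partition by volume, so rows with the
-- same volume end up grouped into the same blocks.
CREATE INDEX vds_detector_data_20110401_volume_idx
  ON vds_detector_data_20110401 (volume);

CLUSTER vds_detector_data_20110401 USING vds_detector_data_20110401_volume_idx;
ANALYZE vds_detector_data_20110401;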

regards
Tomas



Re: [PERFORM] Slow performance when querying millions of rows

2011-06-28 Thread Tomas Vondra
On 29.6.2011 01:50, Craig McIlwee wrote:
  work_mem: 512MB
  shared_buffers: 64MB, 512MB, and 1024MB, each yielded the same query
  plan and took the same amount of time to execute give or take a few
  seconds

 shared_buffers doesn't normally impact the query plan; it impacts how
 much churn there is between the database and the operating system cache,
 mainly important for making write-heavy work efficient.  On Windows,
 you'll probably be safe to set this to 512MB and forget about it.  It
 doesn't benefit from large values anyway.
 
 I was thinking that shared buffers controlled the amount of data,
 primarily table and index pages, that the database could store in memory
 at once.  Based on that assumption, I thought that a larger value would
 enable an entire table + index to be in memory together and speed up the
 query.  Am I wrong?

Well, you're right and wrong at the same time. The shared buffers really
controls the amount of data that may be read into the database cache,
that's true. But this value is not used when building the execution
plan. There's another value (effective_cache_size) that is used when
planning a query.

  I've read documentation that says I should be able to set statistics
  values for an entire table as opposed to per column, but haven't found
  how to do that.  I guess I was either too lazy to update statistics on
  each column or just didn't think it would help much.

Link to the docs? According to

http://www.postgresql.org/docs/current/static/sql-altertable.html

it's possible to set this only at the column level. And of course
there's a GUC default_statistics_target that defines the default value.
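
For example (the statistics target and the partition name are illustrative;
in a partitioned setup run it against the child tables that actually hold
the rows):

ALTER TABLE vds_detector_data_20110401
  ALTER COLUMN reading_timestamp SET STATISTICS 1000;
ALTER TABLE vds_detector_data_20110401
  ALTER COLUMN road_name SET STATISTICS 1000;
ANALYZE vds_detector_data_20110401;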

Tomas



Re: [PERFORM] Slow performance when querying millions of rows

2011-06-28 Thread Greg Smith

On 06/28/2011 07:26 PM, Craig McIlwee wrote:
Yes, the data import is painfully slow but I hope to make up for that 
with the read performance later.


You can probably improve that with something like this:

shared_buffers=512MB
checkpoint_segments=64

Maybe bump up maintenance_work_mem too, if the vacuum part of that is 
the painful one.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
Comprehensive and Customized PostgreSQL Training Classes:
http://www.2ndquadrant.us/postgresql-training/




Re: [PERFORM] Slow performance when querying millions of rows

2011-06-28 Thread Greg Smith

On 06/28/2011 07:50 PM, Craig McIlwee wrote:
I was thinking that shared buffers controlled the amount of data, 
primarily table and index pages, that the database could store in 
memory at once.  Based on that assumption, I thought that a larger 
value would enable an entire table + index to be in memory together 
and speed up the query.  Am I wrong?


It does to some extent.  But:

a) This amount doesn't impact query planning as much if you've set a 
large effective_cache_size


b) The operating system is going to cache things outside of PostgreSQL, too

c) Data read via a sequential scan sometimes skips going into 
shared_buffers, to keep that cache from being swamped with any single scan


d) until the data has actually made its way into memory, you may be 
pulling it in there by an inefficient random process at first.  By the 
time the cache is populated, the thing you wanted a populated cache to 
accelerate may already have finished.


It's possible to get insight into this all using pg_buffercache to 
actually see what's in the cache, and I've put up some talks and scripts 
to help with that at http://projects.2ndquadrant.com/talks which you might 
find useful.
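
For a quick look, something along the lines of the example in the 
pg_buffercache docs works here (assuming the contrib module is installed 
in your database):

-- Which relations occupy the most 8kB buffers in shared_buffers right now
SELECT c.relname, count(*) AS buffers
  FROM pg_buffercache b
  JOIN pg_class c
    ON b.relfilenode = c.relfilenode
   AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                             WHERE datname = current_database()))
 GROUP BY c.relname
 ORDER BY buffers DESC
 LIMIT 10;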


Looks like they are set to adaptive read-ahead now.  If the database 
is executing many concurrent queries, is it reasonable to suspect that 
the IO requests will compete with each other in such a way that the 
controller would rarely see many sequential requests since it is 
serving many processes?  The controller does have an 'on' option also 
that forces read-ahead, maybe that would solve the issue if we can 
rely on the data to survive in the cache until the actual read request 
takes place.


I've never been able to find good documentation on just what the 
difference between the "adaptive" and "on" modes of that controller really 
is, which is why I suggested you try both and see.  Linux has a 
uniquely good read-ahead model that was tuned with PostgreSQL 
specifically in mind.  And you still have to tweak it upwards from the 
defaults in order for the database to fetch things as fast as the drives 
are capable sometimes.  So your idea that you will meet/exceed the 
drive's capabilities for bulk sequential scans is less likely than you 
might think.  RAID5 in theory should give you 2X or more of the speed of 
any single disk when reading a clustered table, but the way PostgreSQL 
does it may make that hard to realize on Windows.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
Comprehensive and Customized PostgreSQL Training Classes:
http://www.2ndquadrant.us/postgresql-training/




Re: [PERFORM] change sample size for statistics

2011-06-28 Thread Robert Haas
On Mon, Jun 13, 2011 at 6:33 PM, Willy-Bas Loos willy...@gmail.com wrote:
 On Fri, Jun 10, 2011 at 9:58 PM, Josh Berkus j...@agliodbs.com wrote:

 It's not 10%.  We use a fixed sample size, which is configurable on the
 system, table, or column basis.

 It seems that you are referring to alter column set statistics and
 default_statistics_target, which are the number of percentiles in the
 histogram (and MCVs).
 I mean the number of records that are scanned by analyze to come to the
 statistics for the planner, especially n_distinct.

In 9.0+ you can do ALTER TABLE .. ALTER COLUMN .. SET (n_distinct = ...);
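
For example (table, column, and values purely illustrative):

-- 9.0+: override the planner's n_distinct estimate for one column
ALTER TABLE my_table ALTER COLUMN my_column SET (n_distinct = 500);
-- or as a fraction of the row count (here: distinct values = 5% of rows)
ALTER TABLE my_table ALTER COLUMN my_column SET (n_distinct = -0.05);
ANALYZE my_table;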

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
