[PERFORM] indexes in partitioned tables - again

2010-04-06 Thread Samuel Gendler
I know this problem crops up all the time and I have read what I could
find, but I'm still not finding an answer to my problem.  This is all
postgres 8.3. Yes, I've enabled constraint_exclusion.  Yes, there are
indexes on the partitions, not just on the parent.

I've got a table with 1 month partitions.  As it happens, I've only
got 2 partitions at the moment, one with 12 million rows and the other
with 5 million.  I only discovered all of the caveats surrounding
indexes and partitioned tables when I executed a very simple query,
saw that it took far too long to run, and started looking at what the
query planner did. In this case, I simply want the set of distinct
values for a particular column, across all partitions.  The set of
distinct values is very small (3) and there is an index on the column,
so I'd expect an index scan to return the 3 values almost
instantaneously.  It turns out that when I query the partitions
directly, the planner does an index scan.  When I query the parent
table, I get full table scans instead of merged output from n index
scans.  Even worse, instead of getting the distinct values from each
partition and merging those, it merges each partition in its entirety
and then sorts and uniques, which is pretty much the pathological
execution order.

 I'll give the queries, then the schema, then the various explain outputs.

(parent table) select distinct probe_type_num from
day_scale_radar_performance_fact; (30 seconds)
(partition) select distinct probe_type_num from
day_scale_radar_performance_fact_20100301_; (6 seconds)
(partition) select distinct probe_type_num from
day_scale_radar_performance_fact_20100401_; (1 second)

(manual union) select distinct probe_type_num from (select distinct
probe_type_num from day_scale_radar_performance_fact_20100301_
UNION select distinct probe_type_num from
day_scale_radar_performance_fact_20100401_) t2; (7 seconds)

In part, I'm surprised that the index scan takes as long as it does,
since I'd think an index would be able to return the set of keys
relatively quickly.  But that's a secondary issue.
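
For reference, the plans are easy to compare side by side with EXPLAIN
against the parent and against a single partition (table names as in the
schema below):

EXPLAIN SELECT DISTINCT probe_type_num
  FROM day_scale_radar_performance_fact;

EXPLAIN SELECT DISTINCT probe_type_num
  FROM day_scale_radar_performance_fact_20100401_;

The first produces the append-then-sort-then-unique plan described above;
the second uses the partition's probe_type_num index.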

Parent table:
cedexis_v2=# \d day_scale_radar_performance_fact;
   Table "perf_reporting.day_scale_radar_performance_fact"
           Column           |            Type             | Modifiers
----------------------------+-----------------------------+-----------
 count                      | bigint                      | not null
 total_ms                   | bigint                      | not null
 time                       | timestamp without time zone | not null
 market_num                 | integer                     | not null
 country_num                | integer                     | not null
 autosys_num                | integer                     | not null
 provider_owner_zone_id     | integer                     | not null
 provider_owner_customer_id | integer                     | not null
 provider_id                | integer                     | not null
 probe_type_num             | integer                     | not null
Indexes:
    "temp1_probe_type_num" btree (probe_type_num)


partition:
cedexis_v2=# \d day_scale_radar_performance_fact_20100301_;
Table "perf_reporting.day_scale_radar_performance_fact_20100301_"
           Column           |            Type             | Modifiers
----------------------------+-----------------------------+-----------
 count                      | bigint                      | not null
 total_ms                   | bigint                      | not null
 time                       | timestamp without time zone | not null
 market_num                 | integer                     | not null
 country_num                | integer                     | not null
 autosys_num                | integer                     | not null
 provider_owner_zone_id     | integer                     | not null
 provider_owner_customer_id | integer                     | not null
 provider_id                | integer                     | not null
 probe_type_num             | integer                     | not null
Indexes:
"day_scale_radar_performance_fact_20100301__asn" btree (autosys_num)
"day_scale_radar_performance_fact_20100301__cty" btree (country_num)
"day_scale_radar_performance_fact_20100301__mkt" btree (market_num)
"day_scale_radar_performance_fact_20100301__p" btree (provider_id)
"day_scale_radar_performance_fact_20100301__poc" btree
(provider_owner_customer_id)
"day_scale_radar_performance_fact_20100301__poz" btree
(provider_owner_zone_id)
"day_scale_radar_performance_fact_20100301__pt" btree (probe_type_num)
"day_scale_radar_performance_fact_20100301__time" btree ("time")
Check constraints:
"day_scale_radar_performance_fact_20100301__time_check" CHECK
("time" >= '2010-03-01 00:00:00'::timestamp without time zone AND
"time" < '2010-04-01 00:00:00'::timestamp without time zone)
Inherits: day_scale_radar_performance_fact

[PERFORM] ideal storage configuration

2010-06-29 Thread Samuel Gendler
I've been reading this list for a couple of weeks, so I've got some
sense of what you folks are likely to recommend, but I'm curious what
is considered an ideal storage solution if building a database system
from scratch.  I just got an exploratory call from my boss, asking
what my preference would be, and I didn't really have a great answer
ready. Budget is certainly NOT unlimited, but with the right
justification, I don't need to pinch pennies, either.

The workload:

It is a combination of OLTP and data warehouse, but the OLTP workload
is trivially light.  All of the load comes from the constant data
insertion and reporting queries over that data.  Inserts are all
performed via COPY.  The dataset size is kept small at the moment via
very aggressive aggregation and then dropping older, more granular
data but I'd like to be able to expand the quantity of data that I
keep at each level of aggregation. Inserts are currently occurring at
a rate of about 85,000 rows per minute, executed via 3 copy statements
of roughly 50,000, 30,000, and 5,000 rows each into 3 different tables. The
copy statements execute in a small fraction of the minute in which
they occur.  I don't have timings handy, but no more than a couple of
seconds.
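
For what it's worth, the loads are plain COPY statements along these lines
(the table name and file path here are placeholders, not the real ones):

COPY minute_scale_radar_performance_fact
  FROM '/data/incoming/minute_batch.csv' WITH CSV;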

All fact tables are partitioned over time. Data comes into the db
already aggregated by minute.  I keep 2 hours of minute scale data in
a table partitioned by hour.  Once per hour, the previous hour of data
is aggregated up into an hour scale table.  I keep 2 days of hour
scale data in a table partitioned by day.  Once per day, that gets
aggregated up into a day scale table that is partitioned by month.  We
keep 2 months of day scale data.  Once per month, that gets aggregated
up into a month scale table and that data is kept indefinitely, at the
moment, but may eventually be limited to 3 years or so.  All old data
gets removed by dropping older partitions.  There are no updates at
all.
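
As a rough sketch of what one of those hourly roll-ups looks like (table
names assumed, columns as in the fact schema I posted in another thread;
the real job is a scheduled INSERT ... SELECT along these lines):

INSERT INTO hour_scale_radar_performance_fact
    (count, total_ms, "time", market_num, country_num, autosys_num,
     provider_owner_zone_id, provider_owner_customer_id,
     provider_id, probe_type_num)
SELECT sum(count), sum(total_ms), date_trunc('hour', "time"),
       market_num, country_num, autosys_num,
       provider_owner_zone_id, provider_owner_customer_id,
       provider_id, probe_type_num
  FROM minute_scale_radar_performance_fact
 WHERE "time" >= date_trunc('hour', now() - interval '1 hour')
   AND "time" <  date_trunc('hour', now())
 GROUP BY date_trunc('hour', "time"), market_num, country_num, autosys_num,
       provider_owner_zone_id, provider_owner_customer_id,
       provider_id, probe_type_num;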

Most reporting is done from the 2 hours of minute scale data and 2
months of day scale data tables, which are 4 million and 47 million
rows, respectively. I'm not sure the partitioning gets us much, other
than making removal of old data much more efficient, since queries are
usually over the most recent 60 minutes and most recent 30 days, so
tend to involve both partitions to some degree in every query except
in the last minute and last day of each time period. We haven't put a
lot of effort into tuning the queries since the dataset was MUCH
smaller, so there is likely some work to be done just in tuning the
system as it stands, but queries are definitely taking longer than
we'd like them to, and we expect the volume of data coming into the db
to grow in coming months.  Ideally, I'd also like to be keeping a much
longer series of minute scale data, since that is the data most useful
for diagnosing problems in the run time system that is generating the
data, though we may still limit queries on that data set to the last 2
hours.

I inherited the hardware and have changed absolutely nothing to date.

Current hardware -
Looks like I've got 4 of these on the host:
# cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 15
model name  : Intel(R) Xeon(R) CPU 5110 @ 1.60GHz
stepping: 6
cpu MHz : 1600.002
cache size  : 4096 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 10
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl
vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow
bogomips: 3192.31
clflush size: 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual

so that's 8 cores total

8 GB of RAM, but it is capable of handling 128GB and I get no
resistance when I suggest going to 64GB of RAM.

6 internal drives on battery backed raid (I don't know what RAID level
- is there a way to discover this?), all in a single filesystem, so
WAL and data are on the same filesystem.  I don't believe that we are
taking advantage of the battery backed controller, since I only see
this in /etc/fstab:

UUID=12dcd71d-8aec-4253-815c-b4883195eeb8 /   ext3    defaults        1 1

But inserts are happening so rapidly that I don't imagine that getting
rid of fsync is going to change performance of the reporting queries
too dramatically.

Total available storage is 438GB. Database currently occupies 91GB on disk.

So my question is, what would be the recommended storage solution,
given what I've said above?  And at what kind of price points?  I have
no idea at what price point I'll start to get resistance at the
moment.  It could be $10K, it could be 5 times that.  I really h

Re: [PERFORM] Need help in performance tuning.

2010-07-08 Thread Samuel Gendler
On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer
 wrote:
> If you're not using a connection pool, start using one.
>
> Do you really need 100 *active* working query threads at one time? Because
> if you do, you're going to need a scary-big disk subsystem and a lot of
> processors.

I see this issue and subsequent advice cross this list awfully
frequently.  Is there an architectural reason why postgres itself
cannot pool incoming connections in order to eliminate the requirement
for an external pool?



Re: [PERFORM] performance on new linux box

2010-07-09 Thread Samuel Gendler
On Fri, Jul 9, 2010 at 2:08 AM, Russell Smith  wrote:
> On 09/07/10 02:31, Ryan Wexler wrote:
>
>
> The only other difference between the boxes is the postgresql version.  The
> new one has 8.4-2 from the yum install instructions on the site:
> http://yum.pgrpms.org/reporpms/repoview/pgdg-centos.html
>
> Any more thoughts?
>
> Really dumb idea, but you don't happen to have the build of the RPMs that had
> debug enabled, do you?  That resulted in a significant performance problem?
>

The OP mentions that the new system underperforms on a straight dd
test, so it isn't the database config or postgres build.



Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-09 Thread Samuel Gendler



Sent from my iPhone

On Jul 9, 2010, at 18:25, Josh Berkus  wrote:



So while adding (b) to core alone would be very useful for some users,
ironically it's generally for the more advanced users which are not the
ones we're trying to help on this thread.


It would seem from evidence presented on this thread that the more  
appropriate conversation would maybe be with package maintainers, to  
perhaps get them to include a connection pool or provide a package  
that comes with a pool preconfigured and installed, along with  
improving existing documentation so that it encourages the use of a  
pool as a first class installation choice since it seems to be  
something of a first class problem for a lot of novice users.


Just to give some background on my perspective - my prior familiarity  
with a connection pool was entirely on the client side, where I've  
been using them for years to keep resource consumption down on the
client. But it never occurred to me to consider one on the other end
of those connections, despite the fact that I usually have a cluster  
of app hosts all talking to the same db. I assumed low connection  
count was desirable, but not mandatory, since surely the db server  
limited its own resource consumption, much the way a well written  
client app will. I basically assumed that the postgres devs used the  
same logic I did when I pooled my connections at the client side in  
order to minimize resource consumption there. I've got no truck with  
the reasons presented against doing so, since they make perfectly good  
sense to me.


However, I suspect there are lots of engineers like myself - folks  
working without the benefit of a dedicated dba or a dba who is new to  
the postgres platform - who make naive assumptions that aren't  
immediately or obviously corrected by the docs (I may be sticking my  
foot in my mouth here. I haven't read the standard docs in a very long  
time). With this issue in particular, the fix is fairly trivial and  
brings other benefits as well. But it sucks to discover it only after  
you've started to get errors on a running app, no matter how easy the  
fix.


So probably this is really only a bug in communication and can be  
fixed there. That's great. Easier-to-fix bugs are hard to find. I have
yet to contribute to postgres development, so I guess, if no one  
objects, I'll see what I can do about improving the documentation of  
this issue, both in the official docs and just making sure it gets  
better billing in other sources of postgres documentation. But you'll  
have to bear with me, as I do have a more-than-full-time other job,  
and no experience with the pg developer community other than a couple  
of weeks on the mailing lists. But I do like to contribute to projects  
I use. It always winds up making me a more proficient user.


(for the record, if I wasn't limited to my phone at the moment I would  
actually check the state of existing documentation before sending  
this, so if I'm talking out of my ass on the lack of documentation,  
please go easy on me. I mean no offense)


--sam








[PERFORM] write barrier question

2010-08-18 Thread Samuel Gendler
I'm just starting the process of trying to tune new hardware, which is
2x quad core xeon, 48GB RAM, 8x300GB SAS 15K drives in RAID 1+0,
2x72GB 15K SAS drives in RAID 1 for WAL and system.  It is a PERC 6/i
card with BBU. Write-back cache is enabled.  The system volume is
ext3. The large data partition is ext4.

current config changes are as follows (but I've been experimenting
with a variety of settings):

default_statistics_target = 50 # pgtune wizard 2010-08-17
maintenance_work_mem = 1GB # pgtune wizard 2010-08-17
constraint_exclusion = on # pgtune wizard 2010-08-17
checkpoint_completion_target = 0.9 # pgtune wizard 2010-08-17
effective_cache_size = 36GB # sam
work_mem = 288MB # pgtune wizard 2010-08-17
wal_buffers = 8MB # pgtune wizard 2010-08-17
#checkpoint_segments = 16 # pgtune wizard 2010-08-17
checkpoint_segments = 30 # sam
shared_buffers = 11GB # pgtune wizard 2010-08-17
max_connections = 80 # pgtune wizard 2010-08-17
cpu_tuple_cost = 0.0030 # sam
cpu_index_tuple_cost = 0.0010   # sam
cpu_operator_cost = 0.0005  # sam
#random_page_cost = 2.0 # sam

It will eventually be a mixed-use db, but the OLTP load is very light.
 ETL for the warehouse side of things does no updates or deletes.
Just inserts and partition drops. I know that
default_statistics_target isn't right for a warehouse workload, but I
haven't gotten to the point of tuning with a production workload, yet,
so I'm leaving the pgtune default.

When running pgbench on a db which fits easily into RAM (10% of RAM =
-s 380), I see transaction counts a little less than 5K.  When I go to
90% of RAM (-s 3420), transaction rate dropped to around 1000 (at a
fairly wide range of concurrencies).  At that point, I decided to
investigate the performance impact of write barriers. I tried building
and running the test_fsync utility from the source distribution but
really didn't understand how to interpret the results.  So I just
tried the same test with write barriers on and write barriers off (on
both volumes).

With barriers off, I saw a transaction rate of about 1200.  With
barriers on, it was closer to 1050.  The test had a concurrency of 40
in both cases.  From what I understand of the write barrier problem, a
misbehaving controller will flush the cache to disk with every
barrier, so I assume performance would drop a heck of a lot more than
13%. I assume the relatively small performance reduction is just
contention on the write barriers between the 40 backends.  I was
hoping someone could confirm this (I could test on and off with lower
concurrency, of course, but that will take hours to complete).  It
occurs to me that the relatively small drop in performance may also be
the result of the relatively small db size. Our actual production db
is likely to be closer to 200% of RAM, but the most active data should
be a lot closer to 90% of RAM.  Anyway, I could test all of this, but
the testing takes so long (I'm running 30 minutes per test in order to
get any kind of consistency of results) that it is likely faster to
just ask the experts.

I'd also welcome any other tuning suggestions.

Thanks

--sam



Re: [PERFORM] write barrier question

2010-08-18 Thread Samuel Gendler
I am.  I was giving mean numbers.

On Wed, Aug 18, 2010 at 12:56 PM, Craig James
 wrote:
> On 8/18/10 12:24 PM, Samuel Gendler wrote:
>>
>> With barriers off, I saw a transaction rate of about 1200.  With
>> barriers on, it was closer to 1050.  The test had a concurrency of 40
>> in both cases.
>
> I discovered there is roughly 10-20% "noise" in pgbench results after
> running the exact same test over a 24-hour period on a machine with no other
> activity.  Be sure you run your tests enough times to get good statistics
> unless you're looking at much larger differences.
>
> Craig
>



Re: [PERFORM] write barrier question

2010-08-18 Thread Samuel Gendler
On Wed, Aug 18, 2010 at 1:25 PM, Yeb Havinga  wrote:
> Samuel Gendler wrote:
>>
>> When running pgbench on a db which fits easily into RAM (10% of RAM =
>> -s 380), I see transaction counts a little less than 5K.  When I go to
>> 90% of RAM (-s 3420), transaction rate dropped to around 1000 ( at a
>> fairly wide range of concurrencies).  At that point, I decided to
>> investigate the performance impact of write barriers.
>
> At 90% of RAM you're probably reading data as well, not only writing.
> Watching iostat -xk 1 or vmstat 1 during a test should confirm this. To find
> the maximum database size that fits comfortably in RAM you could try out
> http://github.com/gregs1104/pgbench-tools - my experience with it is that it
> takes less than 10 minutes to setup and run and after some time you get
> rewarded with nice pictures! :-)

Yes.  I've intentionally sized it at 90% precisely so that I am
reading as well as writing, which is what an actual production
environment will resemble.



[PERFORM] in-memory sorting

2010-08-18 Thread Samuel Gendler
I've got this explain: http://explain.depesz.com/s/Xh9

And these settings:
default_statistics_target = 50 # pgtune wizard 2010-08-17
maintenance_work_mem = 1GB # pgtune wizard 2010-08-17
constraint_exclusion = on # pgtune wizard 2010-08-17
checkpoint_completion_target = 0.9 # pgtune wizard 2010-08-17
effective_cache_size = 36GB # sam
work_mem = 288MB # pgtune wizard 2010-08-17
wal_buffers = 8MB # pgtune wizard 2010-08-17
#checkpoint_segments = 16 # pgtune wizard 2010-08-17
checkpoint_segments = 30 # sam
shared_buffers = 11GB # pgtune wizard 2010-08-17
max_connections = 80 # pgtune wizard 2010-08-17
cpu_tuple_cost = 0.0030 # sam
cpu_index_tuple_cost = 0.0010   # sam
cpu_operator_cost = 0.0005  # sam
#random_page_cost = 2.0 # sam

I'm not understanding why it is sorting on disk if it would fit within
a work_mem segment - by a fairly wide margin.  Is there something else
I can do to get that sort to happen in memory?



Re: [PERFORM] in-memory sorting

2010-08-18 Thread Samuel Gendler
Answered my own question.  Cranking work_mem up to 350MB revealed that
the in-memory sort requires more memory than the disk sort.
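
For anyone else hitting this, the quickest way to see whether a sort
spilled is the "Sort Method" line in EXPLAIN ANALYZE output.  A minimal
sketch (the query here is just a stand-in that forces a large sort on the
fact table shown in my other threads):

SET work_mem = '288MB';
EXPLAIN ANALYZE
SELECT * FROM day_scale_radar_performance_fact ORDER BY total_ms;
--  Sort Method:  external merge  Disk: ...kB   => spilled to disk
--  Sort Method:  quicksort  Memory: ...kB      => fit within work_mem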

On Wed, Aug 18, 2010 at 10:23 PM, Samuel Gendler
 wrote:
> I've got this explain: http://explain.depesz.com/s/Xh9
>
> And these settings:
> default_statistics_target = 50 # pgtune wizard 2010-08-17
> maintenance_work_mem = 1GB # pgtune wizard 2010-08-17
> constraint_exclusion = on # pgtune wizard 2010-08-17
> checkpoint_completion_target = 0.9 # pgtune wizard 2010-08-17
> effective_cache_size = 36GB # sam
> work_mem = 288MB # pgtune wizard 2010-08-17
> wal_buffers = 8MB # pgtune wizard 2010-08-17
> #checkpoint_segments = 16 # pgtune wizard 2010-08-17
> checkpoint_segments = 30 # sam
> shared_buffers = 11GB # pgtune wizard 2010-08-17
> max_connections = 80 # pgtune wizard 2010-08-17
> cpu_tuple_cost = 0.0030                 # sam
> cpu_index_tuple_cost = 0.0010           # sam
> cpu_operator_cost = 0.0005              # sam
> #random_page_cost = 2.0                 # sam
>
> I'm not understanding why it is sorting on disk if it would fit within
> a work_mem segment - by a fairly wide margin.  Is there something else
> I can do to get that sort to happen in memory?
>



[PERFORM] yet another q

2010-08-18 Thread Samuel Gendler
Please forgive the barrage of questions.  I'm just learning how to tune
things in postgres and I've still got a bit of learning curve to get over,
apparently.  I have done a lot of reading, though, I swear.

I've got two identical queries except for a change of one condition which
cuts the number of rows in half - which also has the effect of eliminating
one partition from consideration (partitioned over time and I cut the time
period in half).  The query plans are considerably different as a result.
The net result is that the fast query is 5x faster than the slow query.  I'm
curious if the alternate query plan is inherently faster or is it just a
case of the algorithm scaling worse than linearly with the row count, which
certainly wouldn't be surprising.  The big win, for me, is that the sort
uses vastly less memory.  The slow plan requires work_mem to be 1500MB to
even make it 5x worse.  With a more reasonable work_mem (400MB), it drops to
something like 15x worse because it has to sort on disk.

fast plan: http://explain.depesz.com/s/iZ
slow plan: http://explain.depesz.com/s/Dv2

query:


EXPLAIN ANALYZE SELECT
   t_lookup.display_name as group,
   to_char(t_fact.time, 'DD/MM HH24:MI') as category,
   substring(t_lookup.display_name from 1 for 20) as label,
   round(sum(t_fact.total_ms)/sum(t_fact.count)) as value
   FROM
   portal.providers t_lookup,
   day_scale_radar_performance_fact t_fact
   WHERE
   t_fact.probe_type_num < 3
   and t_lookup.provider_id = t_fact.provider_id
   and t_lookup.provider_owner_customer_id =
t_fact.provider_owner_customer_id
   and t_fact.provider_owner_customer_id = 0
   and t_fact.time between timezone('UTC', '2010-08-18 15:00:00') -
interval '30 day' and timezone('UTC', '2010-08-18 15:00:00')
   GROUP BY
   t_fact.provider_owner_customer_id, t_fact.provider_id,
   t_lookup.display_name,
   t_fact.time
   ORDER BY
   t_fact.time

table structure:

   Table "perf_reporting.abstract_radar_performance_fact"
           Column           |            Type             | Modifiers
----------------------------+-----------------------------+-----------
 count                      | bigint                      | not null
 total_ms                   | bigint                      | not null
 time                       | timestamp without time zone | not null
 market_num                 | integer                     | not null
 country_num                | integer                     | not null
 autosys_num                | integer                     | not null
 provider_owner_zone_id     | integer                     | not null
 provider_owner_customer_id | integer                     | not null
 provider_id                | integer                     | not null
 probe_type_num             | integer                     | not null

with individual indexes on everything from time to the bottom on the
child tables

and


Table "portal.providers"
           Column           |            Type             |        Modifiers
----------------------------+-----------------------------+-------------------------
 btime                      | timestamp without time zone | not null default now()
 mtime                      | timestamp without time zone | not null default now()
 version                    | integer                     | not null default 1
 provider_id                | integer                     | not null
 provider_owner_zone_id     | integer                     | not null
 provider_owner_customer_id | integer                     | not null
 provider_category_id       | integer                     | not null
 name                       | character varying(255)      | not null
 display_name               | character varying(255)      | not null

with indexes on every column with name ending in '_id'
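
Purely for illustration, a composite index matching the WHERE and GROUP BY
columns might look like the following on each child partition (the
partition name here is hypothetical, and I haven't verified that the
planner would actually prefer it):

CREATE INDEX day_scale_radar_performance_fact_20100801__cust_time
    ON day_scale_radar_performance_fact_20100801_
    (provider_owner_customer_id, "time");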


Re: [PERFORM] in-memory sorting

2010-08-18 Thread Samuel Gendler
Yeah, although with 48GB of available memory and not that much concurrency,
I'm not sure it matters that much. But point taken, I'll see about modifying
the app such that work_mem gets set on a per-query basis.


On Wed, Aug 18, 2010 at 11:24 PM, Scott Marlowe wrote:

> On Wed, Aug 18, 2010 at 11:45 PM, Samuel Gendler
>  wrote:
> > Answered my own question.  Cranking work_mem up to 350MB revealed that
> > the in-memory sort requires more memory than the disk sort.
>
> Note that unless you run VERY few client connections, it's usually
> better to leave work_mem somewhere in the 1 to 32Meg range and have
> the connection or user or database that needs 350Meg be set there.
>
> I.e.
>
> 
> set work_mem='512MB';
> 
> OR
>
> alter user memoryhog set work_mem='512MB';
>
> OR
>
> alter database memhogdb set work_mem='512MB';
>


Re: [PERFORM] yet another q

2010-08-18 Thread Samuel Gendler
On Wed, Aug 18, 2010 at 11:14 PM, Samuel Gendler
wrote:

> Please forgive the barrage of questions.  I'm just learning how to tune
> things in postgres and I've still got a bit of learning curve to get over,
> apparently.  I have done a lot of reading, though, I swear.
>
> I've got two identical queries except for a change of one condition which
> cuts the number of rows in half - which also has the effect of eliminating
> one partition from consideration (partitioned over time and I cut the time
> period in half).  The query plans are considerably different as a result.
> The net result is that the fast query is 5x faster than the slow query.  I'm
> curious if the alternate query plan is inherently faster or is it just a
> case of the algorithm scaling worse than linearly with the row count, which
> certainly wouldn't be surprising.  The big win, for me, is that the sort
> uses vastly less memory.  The slow plan requires work_mem to be 1500MB to
> even make it 5x worse.  With a more reasonable work_mem (400MB), it drops to
> something like 15x worse because it has to sort on disk.
>
> fast plan: http://explain.depesz.com/s/iZ
> slow plan: http://explain.depesz.com/s/Dv2
>
> query:
>
>
> EXPLAIN ANALYZE SELECT
>t_lookup.display_name as group,
>to_char(t_fact.time, 'DD/MM HH24:MI') as category,
>substring(t_lookup.display_name from 1 for 20) as label,
>round(sum(t_fact.total_ms)/sum(t_fact.count)) as value
>FROM
>portal.providers t_lookup,
>day_scale_radar_performance_fact t_fact
>WHERE
>t_fact.probe_type_num < 3
>and t_lookup.provider_id = t_fact.provider_id
>and t_lookup.provider_owner_customer_id =
> t_fact.provider_owner_customer_id
>and t_fact.provider_owner_customer_id = 0
>and t_fact.time between timezone('UTC', '2010-08-18 15:00:00') -
> interval '30 day' and timezone('UTC', '2010-08-18 15:00:00')
>GROUP BY
>t_fact.provider_owner_customer_id, t_fact.provider_id,
>t_lookup.display_name,
>t_fact.time
>ORDER BY
>t_fact.time
>
> table structure:
>
>Table "perf_reporting.abstract_radar_performance_fact"
>            Column           |            Type             | Modifiers
> ----------------------------+-----------------------------+-----------
>  count                      | bigint                      | not null
>  total_ms                   | bigint                      | not null
>  time                       | timestamp without time zone | not null
>  market_num                 | integer                     | not null
>  country_num                | integer                     | not null
>  autosys_num                | integer                     | not null
>  provider_owner_zone_id     | integer                     | not null
>  provider_owner_customer_id | integer                     | not null
>  provider_id                | integer                     | not null
>  probe_type_num             | integer                     | not null
>
> with individual indexes on everything from time to the bottom on the
> child tables
>
> and
>
>
> Table "portal.providers"
>            Column           |            Type             |        Modifiers
> ----------------------------+-----------------------------+-------------------------
>  btime                      | timestamp without time zone | not null default now()
>  mtime                      | timestamp without time zone | not null default now()
>  version                    | integer                     | not null default 1
>  provider_id                | integer                     | not null
>  provider_owner_zone_id     | integer                     | not null
>  provider_owner_customer_id | integer                     | not null
>  provider_category_id       | integer                     | not null
>  name                       | character varying(255)      | not null
>  display_name               | character varying(255)      | not null
>
> with indexes on every column with name ending in '_id'
>
>
It gets more complicated:

When I dropped to a query over 15 days instead of 30 days, I saw a huge bump
in performance (about 16 secs), the query plan for which is here:

http://explain.depesz.com/s/iaf

note: the query is identical to the one below, but with the interval changed
to 15 days from 30 days, which also keeps the query within a single
partition.  Note that the sort requires almost no memory and occurs after
the aggregation

Re: [PERFORM] yet another q

2010-08-18 Thread Samuel Gendler
The full set of conf changes that were in use during these tests are as
follows:

default_statistics_target = 100 # pgtune wizard 2010-08-17
maintenance_work_mem = 1GB # pgtune wizard 2010-08-17
constraint_exclusion = on # pgtune wizard 2010-08-17
checkpoint_completion_target = 0.9 # pgtune wizard 2010-08-17
effective_cache_size = 36GB # sam
work_mem = 1500MB # pgtune wizard 2010-08-17
wal_buffers = 8MB # pgtune wizard 2010-08-17
#checkpoint_segments = 16 # pgtune wizard 2010-08-17
checkpoint_segments = 30 # sam
shared_buffers = 8GB # pgtune wizard 2010-08-17
max_connections = 80 # pgtune wizard 2010-08-17
cpu_tuple_cost = 0.0030 # sam
cpu_index_tuple_cost = 0.0010   # sam
cpu_operator_cost = 0.0005  # sam
random_page_cost = 2.0  # sam


On Wed, Aug 18, 2010 at 11:50 PM, Samuel Gendler
wrote:

> On Wed, Aug 18, 2010 at 11:14 PM, Samuel Gendler <
> sgend...@ideasculptor.com> wrote:
>
>> Please forgive the barrage of questions.  I'm just learning how to tune
>> things in postgres and I've still got a bit of learning curve to get over,
>> apparently.  I have done a lot of reading, though, I swear.
>>
>> I've got two identical queries except for a change of one condition which
>> cuts the number of rows in half - which also has the effect of eliminating
>> one partition from consideration (partitioned over time and I cut the time
>> period in half).  The query plans are considerably different as a result.
>> The net result is that the fast query is 5x faster than the slow query.  I'm
>> curious if the alternate query plan is inherently faster or is it just a
>> case of the algorithm scaling worse than linearly with the row count, which
>> certainly wouldn't be surprising.  The big win, for me, is that the sort
>> uses vastly less memory.  The slow plan requires work_mem to be 1500MB to
>> even make it 5x worse.  With a more reasonable work_mem (400MB), it drops to
>> something like 15x worse because it has to sort on disk.
>>
>> fast plan: http://explain.depesz.com/s/iZ
>> slow plan: http://explain.depesz.com/s/Dv2
>>
>> query:
>>
>>
>> EXPLAIN ANALYZE SELECT
>>t_lookup.display_name as group,
>>to_char(t_fact.time, 'DD/MM HH24:MI') as category,
>>substring(t_lookup.display_name from 1 for 20) as label,
>>round(sum(t_fact.total_ms)/sum(t_fact.count)) as value
>>FROM
>>portal.providers t_lookup,
>>day_scale_radar_performance_fact t_fact
>>WHERE
>>t_fact.probe_type_num < 3
>>and t_lookup.provider_id = t_fact.provider_id
>>and t_lookup.provider_owner_customer_id =
>> t_fact.provider_owner_customer_id
>>and t_fact.provider_owner_customer_id = 0
>>and t_fact.time between timezone('UTC', '2010-08-18 15:00:00')
>> - interval '30 day' and timezone('UTC', '2010-08-18 15:00:00')
>>GROUP BY
>>t_fact.provider_owner_customer_id, t_fact.provider_id,
>>t_lookup.display_name,
>>t_fact.time
>>ORDER BY
>>t_fact.time
>>
>> table structure:
>>
>>Table "perf_reporting.abstract_radar_performance_fact"
>>            Column           |            Type             | Modifiers
>> ----------------------------+-----------------------------+-----------
>>  count                      | bigint                      | not null
>>  total_ms                   | bigint                      | not null
>>  time                       | timestamp without time zone | not null
>>  market_num                 | integer                     | not null
>>  country_num                | integer                     | not null
>>  autosys_num                | integer                     | not null
>>  provider_owner_zone_id     | integer                     | not null
>>  provider_owner_customer_id | integer                     | not null
>>  provider_id                | integer                     | not null
>>  probe_type_num             | integer                     | not null
>>
>> with individual indexes on everything from time to the bottom on the
>> child tables
>>
>> and
>>
>>
>> Table "portal.providers"
>>   Column   |Type |   Modifiers
>>
>>
>> +-+
>> btime  | time

Re: [PERFORM] in-memory sorting

2010-08-19 Thread Samuel Gendler
Incidentally, if I set values on the connection before querying, is there an
easy way to get things back to default values or will my code need to know
the prior value and explicitly set it back?  Something like


set work_mem = '512MB'
query
set value = 'default'


or maybe


BEGIN;
set work_mem='512MB'
select query
ROLLBACK;


On Wed, Aug 18, 2010 at 11:52 PM, Scott Marlowe wrote:

> Exactly, it's about the concurrency.  I have a server with 128G ram
> but it runs dozens of queries at a time for hundreds of clients a
> second.  The chance that something big for work_mem might jump up and
> bite me are pretty good there.  Even so, at 16Meg it's not really big
> for that machine, and I might test cranking it up. Note that large
> work_mem can cause the kernel to flush its cache, which means going to
> disk for everybody's data, and all the queries are slow instead of
> one.  Keep an eye on how high work_mem affects your kernel cache.
>
> On Thu, Aug 19, 2010 at 12:38 AM, Samuel Gendler
>  wrote:
> > Yeah, although with 48GB of available memory and not that much
> concurrency,
> > I'm not sure it matters that much. But point taken, I'll see about
> modifying
> > the app such that work_mem gets set on a per-query basis.
> >
> > On Wed, Aug 18, 2010 at 11:24 PM, Scott Marlowe  >
> > wrote:
> >>
> >> On Wed, Aug 18, 2010 at 11:45 PM, Samuel Gendler
> >>  wrote:
> >> > Answered my own question.  Cranking work_mem up to 350MB revealed that
> >> > the in-memory sort requires more memory than the disk sort.
> >>
> >> Note that unless you run VERY few client connections, it's usually
> >> better to leave work_mem somewhere in the 1 to 32Meg range and have
> >> the connection or user or database that needs 350Meg be set there.
> >>
> >> I.e.
> >>
> >> 
> >> set work_mem='512MB';
> >>  >>
> >> OR
> >>
> >> alter user memoryhog set work_mem='512MB';
> >>
> >> OR
> >>
> >> alter database memhogdb set work_mem='512MB';
> >
> >
>
>
>
> --
> To understand recursion, one must first understand recursion.
>


Re: [PERFORM] in-memory sorting

2010-08-19 Thread Samuel Gendler
On Thu, Aug 19, 2010 at 12:06 AM, Samuel Gendler
wrote:

> Incidentally, if I set values on the connection before querying, is there
> an easy way to get things back to default values or will my code need to
> know the prior value and explicitly set it back?  Something like
>
> 
> set work_mem = '512MB'
> query
> set value = 'default'
> 
>
> or maybe
>
> 
> BEGIN;
> set work_mem='512MB'
> select query
> ROLLBACK;
> 
>
>
I guess I'm getting the hang of this whole postgres thing because those were
both wild guesses and both of them appear to work.

set work_mem=default sets it to the value in the config file, and setting
within a transaction and rolling back also restores the original value.



>
> On Wed, Aug 18, 2010 at 11:52 PM, Scott Marlowe 
> wrote:
>
>> Exactly, it's about the concurrency.  I have a server with 128G ram
>> but it runs dozens of queries at a time for hundreds of clients a
>> second.  The chance that something big for work_mem might jump up and
>> bite me are pretty good there.  Even so, at 16Meg it's not really big
>> for that machine, and I might test cranking it up. Note that large
>> work_mem can cause the kernel to flush its cache, which means going to
>> disk for everybody's data, and all the queries are slow instead of
>> one.  Keep an eye on how high work_mem affects your kernel cache.
>>
>> On Thu, Aug 19, 2010 at 12:38 AM, Samuel Gendler
>>  wrote:
>> > Yeah, although with 48GB of available memory and not that much
>> concurrency,
>> > I'm not sure it matters that much. But point taken, I'll see about
>> modifying
>> > the app such that work_mem gets set on a per-query basis.
>> >
>> > On Wed, Aug 18, 2010 at 11:24 PM, Scott Marlowe <
>> scott.marl...@gmail.com>
>> > wrote:
>> >>
>> >> On Wed, Aug 18, 2010 at 11:45 PM, Samuel Gendler
>> >>  wrote:
>> >> > Answered my own question.  Cranking work_mem up to 350MB revealed
>> that
>> >> > the in-memory sort requires more memory than the disk sort.
>> >>
>> >> Note that unless you run VERY few client connections, it's usually
>> >> better to leave work_mem somewhere in the 1 to 32Meg range and have
>> >> the connection or user or database that needs 350Meg be set there.
>> >>
>> >> I.e.
>> >>
>> >> 
>> >> set work_mem='512MB';
>> >> > >>
>> >> OR
>> >>
>> >> alter user memoryhog set work_mem='512MB';
>> >>
>> >> OR
>> >>
>> >> alter database memhogdb set work_mem='512MB';
>> >
>> >
>>
>>
>>
>> --
>> To understand recursion, one must first understand recursion.
>>
>
>


Re: [PERFORM] Is disableing nested_loops a bad idea ?

2010-09-16 Thread Samuel Gendler
Without knowing more about your queries and table structure, it is hard to
say if there is a better solution.  But one thing you should probably
consider doing is just finding the queries where disabling nested loops is
verifiably effective and then just disabling nested loops on that connection
before running the query and then reset after the query completes.  That
way, you won't impact queries that legitimately use nested loops.  Someone
with more experience than I have in tuning the general postgres config may
be able to offer a better solution for getting the query planner to make
better decisions with the global config, but they'll surely need to know a
lot more about your queries in order to do so.
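
Concretely, the per-connection toggle is just this (enable_nestloop is the
actual parameter name):

SET enable_nestloop = off;   -- affects only the current session
-- run the query (or queries) whose row estimates the planner gets wrong
RESET enable_nestloop;       -- restore the default for everything else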

On Thu, Sep 16, 2010 at 1:23 AM, Franck Routier wrote:

> Hi,
>
> I am confronted with a use case where my database mainly does big
> aggregate select (ROLAP), a bunch of batch jobs, and quite few OLTP.
>
> I come into cases where the planner under-estimates the number of rows
> in some relations, chooses to go for nested loops, and takes forever to
> complete the request. (Notice as the side note that Oracle (10g or 11g)
> is not any better on this workload and will sometime go crazy and choose
> a plan that takes hours...)
>
> I've played with statistics, vacuum and so on, but at the end the
> planner is not accurate enough when evaluating the number of rows in
> some complex queries.
>
> Disableing nested loops most of the time solves the performance issues
> in my tests... generally going from 30 sec. down to 1 sec.
>
> So my question is : would it be a very bad idea to disable nested loops
> in production ?
> The way I see it is that it could be a little bit less optimal to use
> merge join or hash join when joining on a few rows, but this is peanuts
> compared to how bad it is to use nested loops when the number of rows
> happens to be much higher than what the planner thinks.
>
> Is this stupid, ie are there cases when merge join or hash join are much
> slower than nested loops on a few rows ?
>
> Thanks in advance,
>
> Franck
>
>
>
>
>


Re: [PERFORM] How does PG know if data is in memory?

2010-09-29 Thread Samuel Gendler
2010/9/29 Fabrício dos Anjos Silva 

>
>
>When setting seq_page_cost and random_page_cost, do I have to consider
> the probability that data will be in memory? Or does seq_page_cost mean
> "sequential access on disk" and random_page_cost mean "random access on
> disk"?
>
>
The reason seq_page_cost and random_page_cost exist as parameters is so that
you can inform the optimizer what the relative costs of those actions are,
which is directly related to the expected size of the filesystem cache,
ratio of total db size to available cache memory, and the performance of
your disk i/o subsystems (and any other disk-related work the host may be
doing).  effective_cache_size allows you to tell postgres how big you
believe all available cache memory is - shared_buffers and OS cache.

As to your question about increasing shared_buffers to be some significant
proportion of available RAM - apparently, that is not a good idea.  I've
seen advice that said you shouldn't go above 8GB for shared_buffers and I've
also seen 12GB suggested as an upper limit, too.  On my host with 48GB of
RAM, I didn't see much difference between 8GB and 12GB on a fairly wide
variety of tests, so mine is set at 8GB with an effective_cache_size of 36GB.
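
Those settings, along with the page-cost parameters, can be inspected and
experimented with per session before committing anything to
postgresql.conf.  A minimal sketch (the cost values below are purely
illustrative, not a recommendation):

SHOW shared_buffers;
SHOW effective_cache_size;
SET random_page_cost = 2.0;   -- assumes a mostly-cached working set
SET seq_page_cost = 1.0;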


>I appreciate if someone could clear this out.
>
>Thanks!
>
> Fabrício dos Anjos Silva
> LinkCom Soluções em T.I.
>
>


Re: [PERFORM] How does PG know if data is in memory?

2010-10-01 Thread Samuel Gendler
2010/10/1 Fabrício dos Anjos Silva 

>Craig,
>
>I agree with you. Not completely, but I do.
>
>I'm just stuck in a situation where I can't figure out what values to
> use for the parameters. I can't even think of a way on how to test and
> discover that.
>I followed Josh Berkus' GUC spreadsheet and some tutorials on PG wiki,
> but how do I test if my configuration is good or bad? I see in PG log that
> some queries have bad plans, but what should I do in order to tell PG to make
> better decisions? I tried different values with no success.
>
> You can set different values for most configuration params on individual db
> connections.  You can test different values for individual slow-running
> queries.  Rather than disabling whole features in the entire database -
> which may well make lots of other queries run less quickly - you can, at the
> very least, just disable those features before running the queries that are
> known to be slow and for which you could not find global values which worked
> well.  Disable sequence plans just before running query x, or boost work_mem
> to a very high value just for query y.  It is also possible that you've
> simply outstripped your hardware's capability.  We had a database with a
> number of tables containing tens of millions of rows and queries which
> frequently required aggregating over whole tables.  Moving from 8Gb of RAM
> to 48GB of RAM (so that a large chunk of the db fits in memory) and from 6
> spindles to 12, and then just modifying the global config to suit the new
> hardware gave us a huge performance boost that we could never have gotten on
> the old hardware, no matter how much tuning of individual queries we did.  I
> was actually able to drop all of the custom config tweaks that we had on
> individual queries, though I'm sure I'll eventually wind up adding some back
> - queries that aggregate over large tables really benefit from a lot of
> work_mem - more than I want to configure globally.
>


Re: [PERFORM] Slow count(*) again...

2010-10-09 Thread Samuel Gendler
On Sat, Oct 9, 2010 at 7:44 PM, Mladen Gogala wrote:

>  The architects of Postgres database would be well advised to operate under
> the assumption that every production database has a competent DBA keeping an
> eye on the database.
>

I'd actually go so far as to say that they have already made this
assumption.  The out of the box config needs modification for all but the
most low-volume applications and postgres really benefits from having some
attention paid to performance.  Not only does tuning the db provide enormous
gains, but it is often possible to dramatically improve query responsiveness
by simply restructuring a query (assuming an aggregating query over a fairly
large table with a few joins thrown in).  My team does not have a competent
DBA (though I've got 15+ years of experience developing on top of various
dbs and certainly don't make overly naive assumptions about how things work)
and the gains that we made, when I finally just sat down and read everything
I could get my hands on about postgres and started reading this list, were
really quite impressive.  I intend to take some of the courses offered by
some of the companies that are active on this list when my schedule allows
in order to expand my knowledge even farther, as a DBA is a luxury we cannot
really afford at the moment.


Re: [PERFORM] Slow count(*) again...

2010-10-11 Thread Samuel Gendler
On Mon, Oct 11, 2010 at 12:54 PM, Neil Whelchel wrote:

>
> 2. You need a slice of the data which requires another scan to the table to
> get, and using the same WHERE clause as above. This seems like a total
> waste,
> because we just did that with the exception of actually fetching the data.
>
> Why do it twice when if there was a way to get a slice using OFFSET and
> LIMIT
> and  get the amount of rows that matched before the OFFSET and LIMIT was
> applied you could do the scan once? I think that this would improve things
> and
> give Postgres an edge over other systems.
>
>
I'd go even farther with number 2 and suggest that a form of offset/limit
which can return the total count OR have a total count be passed in to be
returned the same way as if total count were being computed would make the
use of that api even easier, since you could keep re-using the number
returned the first time without changing the api that gets used depending
upon context.  Of course, you could contrive to set that up via a stored
proc relatively easily by simply doing the count(*) once, then appending it
to each row of the offset/limit query by including it in the select
statement.  Let it optionally receive the total to be used as an input
parameter, which if not null will result in the count(*) block being skipped
in the proc.  You'd incur the full cost of the table scan plus offset/limit
query once, but then not for each and every page.  Since the modified api
you suggest for offset/limit would surely have to perform the table scan
once, that solution really isn't giving much more value than  implementing
as a stored proc other than the flexibility of executing an arbitrary query.
  Modified offset/limit combined with the count_estimate functionality would
be very useful in this circumstance, though - especially if the estimate
would just do a full count if the estimate is under a certain threshold.  A
25% discrepancy when counting millions of rows is a lot less of an issue
than a 25% discrepancy when counting 10 rows.
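
The stored-proc-free version of that idea is already possible in 8.4 with
a window function; this is only a sketch (table and filter borrowed from
earlier in this digest as stand-ins), and it still pays for scanning every
qualifying row on each page fetch:

SELECT *, count(*) OVER () AS total_rows   -- total matches, repeated on every row
  FROM day_scale_radar_performance_fact
 WHERE probe_type_num < 3
 ORDER BY "time"
 LIMIT 50 OFFSET 200;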

One issue with an estimation is that you must be certain that the estimate is
greater than or equal to the actual count, or else the app must always attempt
to load the page BEYOND the last page of the estimate in order to determine if
the estimate must be
revised upward. Otherwise, you risk leaving rows out entirely.  Probably ok
when returning search results.  Not so much when displaying a list of
assets.


Re: [PERFORM] Slow count(*) again...

2010-10-11 Thread Samuel Gendler
On Mon, Oct 11, 2010 at 7:19 PM, Greg Smith  wrote:
>
>
> This is a problem for the operating system to solve, and such solutions out
> there are already good enough that PostgreSQL has little reason to try and
> innovate in this area.  I routinely see seq scan throughput double on Linux
> just by tweaking read-ahead from the tiny defaults to a sane value.
>

I spent some time going through the various tuning docs on the wiki while
bringing some new hardware up and I can't remember seeing any discussion of
tweaking read-ahead at all in the normal performance-tuning references.  Do
you have any documentation of the kinds of tweaking you have done and its
effects on different types of workloads?


Re: [PERFORM] Slow count(*) again...

2010-10-11 Thread Samuel Gendler
On Mon, Oct 11, 2010 at 9:06 PM, Scott Carey wrote:

> I can't speak to documentation, but it is something that helps as your I/O
> subsystem gets more powerful, and how much it helps depends more on your
> hardware, which may have adaptive read ahead on its own, and your file
> system which may be more or less efficient at sequential I/O.  For example
> ext3 out of the box gets a much bigger gain from tuning read-ahead than XFS
> on a DELL PERC6 RAID card (but still ends up slower).
>
>
Geez.  I wish someone would have written something quite so bold as 'xfs is
always faster than ext3' in the standard tuning docs.  I couldn't find
anything that made a strong filesystem recommendation.  How does xfs compare
to ext4?  I wound up on ext4 on a dell perc6 raid card when an unexpected
hardware failure on a production system caused my test system to get thrown
into production before I could do any serious testing of xfs.  If there is a
strong consensus that xfs is simply better, I could afford the downtime to
switch.

As it happens, this is a system where all of the heavy workload is in the
form of sequential scan type load. The OLTP workload is very minimal (tens
of queries per minute on a small number of small tables), but there are a
lot of reporting queries that wind up doing sequential scans of large
partitions (millions to tens of millions of rows).  We've sized the new
hardware so that the most commonly used partitions fit into memory, but if
we could speed the queries that touch less frequently used partitions, that
would be good.  I'm the closest thing our team has to a DBA, which really
only means that I'm the one person on the dev team or the ops team to have
read all of the postgres docs and wiki and the mailing lists.  I claim no
actual DBA experience or expertise and have limited cycles to devote to
tuning and testing, so if there is an established wisdom for filesystem
choice and read ahead tuning, I'd be very interested in hearing it.


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Samuel Gendler
On Tue, Oct 12, 2010 at 9:02 AM, Scott Carey wrote:
>
>
> However, for large reporting queries and sequential scans, XFS will win in
> the long run if you use the online defragmenter.  Otherwise, your sequential
> scans won't be all that sequential on any file system over time if your
> tables aren't written once, forever, serially.   Parallel restore will
> result in a system that is fragmented -- ext4 will do best at limiting this
> on the restore, but only xfs has online defragmentation.  We schedule ours
> daily and it noticeably improves sequential scan I/O.
>
>
Our reporting tables are written sequentially and left unmodified until
entire partitions are dropped.  However, equivalent partitions tend to get a
little bit larger over time, so newer partitions won't necessarily fit into
the gaps left by prior partition drops, so it is possible that partitions
will be split into two sections, but should still be very sequential, if not
perfectly so.  It would seem that we stumbled into an ideal architecture for
doing this kind of work - mostly by virtue of starting with 8.2.x and having
huge problems with autovacuum and vacuum taking forever and dragging the db
to halt, which caused us to move to an architecture which aggregates and
then drops older data in entire partitions instead of updating aggregates
individually and then deleting rows. Partitions are sized such that most
reporting queries run over entire partitions, too (which was completely
accidental since I had not yet delved into individual query optimization at
the time), so even though we are doing sequential scans, we at least run as
few of them as possible and are able to keep hot data in memory.

--sam


[PERFORM] bulk load performance question

2010-10-12 Thread Samuel Gendler
I've got a few tables that periodically get entirely refreshed via a COPY. I
don't really have a good mechanism for detecting only rows which have
changed so even though the differences are small, a full copy is easiest.
 However, the data includes a primary key column, so I can't simply load
into the existing table and then drop older rows.  So we load into a table
with a different name and then, within a transaction, drop the old and
rename the new.  However, while the transaction will cause a query against
that table to block until the transaction commits, when the transaction
commits, the blocked query will fail with an error message like: ERROR:
 could not open relation with OID 17556

Is there some way to do the drop+rename in a manner which will preserve the
OID or otherwise allow blocked queries to execute correctly once they
unblock?

A secondary issue is that if permissions were granted to a role on the old
table, the new table does not acquire those permissions and they must be
granted again.
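
Re-granting after the rename is simple enough, e.g. (role name here is
hypothetical):

GRANT SELECT ON mytable TO reporting_role;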

The biggest table that gets updated like this is a couple hundred thousand
rows, with maybe a few thousand rows actually changing or being added with
each load.  Suggestions for alternative mechanisms for doing the loading are
welcome.  I'd really rather avoid updating every row in a several hundred
thousand row table, especially without easy upsert functionality. The data
is small enough that selecting everything and then comparing in memory
before updating modified rows is doable, but sure seems like a lot of work
if it can be avoided.

Writing this caused me to think of a possible solution, which appears to
work correctly, but I'd like to confirm it with folks in the know:

Instead of this:

CREATE TABLE mytable_temp...;
COPY mytable_temp FROM ...;
BEGIN;
DROP TABLE mytable;
ALTER TABLE mytable_temp RENAME TO mytable;
COMMIT;

Which will cause any overlapping queries to pick up the wrong OID for
mytable and then fail when the transaction commits, I tested this:

COPY mytable_temp FROM ...;
BEGIN;
ALTER TABLE mytable RENAME TO mytable_old;
ALTER TABLE mytable_temp RENAME TO mytable;
COMMIT;
DROP TABLE mytable_old;

It would appear that any query that uses mytable which overlaps with the
transaction will pick up the OID of the original mytable and then block
until the transaction commits.  When the transaction commits, those queries
will successfully run against the original OID (no queries write to this
table except for the bulk load) and will complete, at which time, the table
drop will finally complete.  Meanwhile, any queries which don't overlap (or
perhaps any queries which start after the rename from mytable_temp to
mytable has occurred) will successfully complete against the new table.

The net result appears to be that I will no longer suffer the missing OID
error, which seemed to periodically completely hose a db connection,
requiring that the connection be closed since no subsequent queries would
ever succeed, whether they touched the table in question or not. I've only
seen that erroneous behaviour on 8.3 (so far - we only recently upgraded to
8.4.4), but it was fairly mysterious because I've never been able to
replicate it in testing.  I could get a single missing OID error, but never
one that would break all subsequent queries.

Are my assumptions about this correct?


Re: [PERFORM] Query much faster with enable_seqscan=0

2010-10-12 Thread Samuel Gendler
On Tue, Sep 21, 2010 at 4:30 PM, Tom Lane  wrote:

> Ogden  writes:
> > SELECT tr.id, tr.sid
> > FROM
> > test_registration tr
> > INNER JOIN test_registration_result r on (tr.id = r.test_registration_id)
> > WHERE
> > tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
> > GROUP BY tr.id, tr.sid
>
> Seeing that tr.id is a primary key, I think you might be a lot better
> off if you avoided the inner join and group by.  I think what you really
> want here is something like
>
> SELECT tr.id, tr.sid
>FROM
>test_registration tr
> WHERE
>
>  tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
> AND EXISTS(SELECT 1 FROM test_registration_result r
>   WHERE tr.id = r.test_registration_id)
>
>regards, tom lane
>
>
Could you explain the logic behind why this structure is better than the
other?  Is it always the case that one should use the
'exists(select 1 from x...)' structure when trying to strip rows that don't
join, or only when you know that the rows which do join are a
fairly limited subset?  Does the same advantage exist when filtering rows in
the joined table on some criteria, or is it better at that point to use an
inner join and add a where clause to filter the joined rows?

select table1.columns
from  table1, table2
where table1.column = 'some_value'
   and table1.fk = table2.pk
  AND table2.column = 'some_other_value'

versus

select table1.columns
  from table1
where table1.column = 'some_value'
   and exists(select 1 from table2 where table1.fk = table2.pk
  and table2.column ='some_other_value')


Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-15 Thread Samuel Gendler
On Thu, Oct 14, 2010 at 8:59 PM, Mladen Gogala wrote:

>  If working with partitioning, be very aware that PostgreSQL optimizer has
> certain problems with partitions, especially with group functions. If you
> want speed, everything must be prefixed with partitioning column: indexes,
> expressions, joins. There is no explicit star schema and creating hash
> indexes will not buy you much, as a matter of fact, Postgres community is
> extremely suspicious of the hash indexes and I don't see them widely used.
> Having said that, I was able to solve the problems with my speed and
> partitioning.
>
>
Could you elaborate on this, please? What do you mean by 'everything must be
prefixed with partitioning column'?

--sam


Re: [PERFORM] No hash join across partitioned tables?

2010-10-15 Thread Samuel Gendler
On Fri, Oct 15, 2010 at 10:22 PM, Tom Lane  wrote:

> Alvaro Herrera  writes:
> > If we analyze the parent, do we also update the children stats, or is it
> > just that we keep two stats for the parent, one with children and one
> > without, both being updated when the parent is analyzed?
>
> The latter.
>
> The trick here is that we need to fire an analyze on the parent even
> though only its children may have had any updates.
>
> > If the latter's the case, maybe we should modify ANALYZE a bit more, so
> > that we can analyze the whole hierarchy in one go, and store the lot of
> > stats with a single pass (each child alone, the parent alone, the parent
> > plus children).  However it's not real clear how would this work with
> > multiple inheritance levels.
>

An issue with automatically analyzing the entire hierarchy is 'abstract'
table definitions.  I've got a set of tables for storing the same data at
different granularities of aggregation.  Within each granularity, I've got
partitions, but because the set of columns is identical for each
granularity, I've got an abstract table definition that is inherited by
everything.  I don't need or want statistics kept on that table because I
never query across the abstract table, only the parent table of each
aggregation granularity:

create table abstract_fact_table (
time timestamp,
measure1 bigint,
measure2 bigint,
measure3 bigint,
fk1 bigint,
fk2 bigint
);

create table minute_scale_fact_table (
) inherits (abstract_fact_table);

-- then there are several partitions for minute scale data

create table hour_scale_fact_table (
) inherits (abstract_fact_table);

-- then several partitions for hour scale data

etc.  I do run queries on the minute_scale_fact_table and
hour_scale_fact_table but never do so on abstract_fact_table.  I could
certainly modify my schema such that the abstract table goes away entirely
easily enough, but I find this easier for new developers to come in and
comprehend, since the similarity between the table definitions is explicit.

I'm glad this topic came up, as I was unaware that I need to run analyze on
the parent partitions separately - and no data is ever inserted directly
into the top level of each granularity hierarchy, so it will never fire by
itself.
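In the meantime, I'll just fire the analyze explicitly.  A minimal sketch
using the parent tables from the example above (run from cron or at the end
of each bulk load):

ANALYZE minute_scale_fact_table;
ANALYZE hour_scale_fact_table;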

If I am using ORM and I've got functionality in a common baseclass in the
source code, I'll often implement its mapping in the database via a parent
table that the table for any subclass mapping can inherit from.  Again, I
have no interest in maintaining statistics on the parent table, since I
never query against it directly.


Re: [PERFORM] No hash join across partitioned tables?

2010-10-17 Thread Samuel Gendler
On Sat, Oct 16, 2010 at 8:29 AM, Alvaro Herrera
wrote:

> Excerpts from Samuel Gendler's message of sáb oct 16 02:35:46 -0300 2010:
>
> > An issue with automatically analyzing the entire hierarchy is 'abstract'
> > table definitions.  I've got a set of tables for storing the same data at
> > different granularities of aggregation.  Within each granularity, I've
> got
> > partitions, but because the set of columns is identical for each
> > granularity, I've got an abstract table definition that is inherited by
> > everything.  I don't need or want statistics kept on that table because I
> > never query across the abstract table, only the parent table of each
> > aggregation granularity
>
> Hmm, I think you'd be better served by using LIKE instead of regular
> inheritance.
>
>
Yep.  I inherited the architecture, though, and changing it hasn't been a
high priority.

--sam


Re: [PERFORM] Index scan is not working, why??

2010-10-21 Thread Samuel Gendler
Please provide the non-default config options on this host plus the same from a
host which is using an index scan.  Also, postgresql version, OS,
and all of the other stuff that is asked for in this document:
http://wiki.postgresql.org/wiki/SlowQueryQuestions.  It is impossible to say
why the query planner might be choosing a particular plan without any
insight whatsoever as to how the server is configured.



On Wed, Oct 20, 2010 at 10:25 PM, AI Rumman  wrote:

> I don't know why seq scan is running on the following query where the same
> query is giving index scan on other servers:
> explain analyze
> select *
> from act
> where act.acttype in ( 'Meeting','Call','Task');
>   QUERY PLAN
>
> 
>  Seq Scan on act (cost=0.00..13386.78 rows=259671 width=142) (actual
> time=0.013..484.572 rows=263639 loops=1)
>   Filter: (((acttype)::text = 'Meeting'::text) OR ((acttype)::text =
> 'Call'::text) OR ((acttype)::text = 'Task'::text))
>  Total runtime: 732.956 ms
> (3 rows)
>
> The above query is giving index scan on other servers and even if I rewrite
> the query as follows I got index scan:
> explain analyze
> select *
> from act
> where act.acttype = 'Meeting'
> or act.acttype = 'Call';
>   QUERY PLAN
>
> --
>  Bitmap Heap Scan on act (cost=17.98..1083.80 rows=2277 width=142) (actual
> time=1.901..9.722 rows=4808 loops=1)
>   Recheck Cond: (((acttype)::text = 'Meeting'::text) OR ((acttype)::text =
> 'Call'::text))
>   -> BitmapOr (cost=17.98..17.98 rows=2281 width=0) (actual
> time=1.262..1.262 rows=0 loops=1)
>   -> Bitmap Index Scan on act_acttype_idx (cost=0.00..8.99 rows=1141
> width=0) (actual time=0.790..0.790 rows=3181 loops=1)
>   Index Cond: ((acttype)::text = 'Meeting'::text)
>   -> Bitmap Index Scan on act_acttype_idx (cost=0.00..8.99 rows=1141
> width=0) (actual time=0.469..0.469 rows=1630 loops=1)
>   Index Cond: ((acttype)::text = 'Call'::text)
>  Total runtime: 14.227 ms
> (8 rows)
>
>
> \d act
>   Table "public.act"
>   Column | Type | Modifiers
>
> --++---
>  actid | integer | not null default 0
>  subject | character varying(250) | not null
>  semodule | character varying(20) |
>  acttype | character varying(200) | not null
>  date_start | date | not null
>  due_date | date |
>  time_start | character varying(50) |
>  time_end | character varying(50) |
>  sendnotification | character varying(3) | not null default '0'::character
> varying
>  duration_hours | character varying(2) |
>  duration_minutes | character varying(200) |
>  status | character varying(200) |
>  eventstatus | character varying(200) |
>  priority | character varying(200) |
>  location | character varying(150) |
>  notime | character varying(3) | not null default '0'::character varying
>  visibility | character varying(50) | not null default 'all'::character
> varying
>  recurringtype | character varying(200) |
>  end_date | date |
>  end_time | character varying(50) |
> Indexes:
>   "act_pkey" PRIMARY KEY, btree (actid)
>   "act_acttype_idx" btree (acttype)
>   "act_date_start_idx" btree (date_start)
>   "act_due_date_idx" btree (due_date)
>   "act_eventstatus_idx" btree (eventstatus)
>   "act_status_idx" btree (status)
>   "act_subject_idx" btree (subject)
>   "act_time_start_idx" btree (time_start)
>
> Any idea please.
>


Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-05 Thread Samuel Gendler
On Fri, Nov 5, 2010 at 12:23 PM, Samuel Gendler
wrote:

> On Thu, Nov 4, 2010 at 8:07 AM, Vitalii Tymchyshyn wrote:
>
>> 04.11.10 16:31, Nick Matheson написав(ла):
>>
>>  Heikki-
>>>
>>>>
>>>> Try COPY, ie. "COPY bulk_performance.counts TO STDOUT BINARY".
>>>>
>>>>  Thanks for the suggestion. A preliminary test shows an improvement
>>> closer to our expected 35 MB/s.
>>>
>>> Are you familiar with any Java libraries for decoding the COPY format?
>>> The spec is clear and we could clearly write our own, but figured I would
>>> ask. ;)
>>>
>> JDBC driver has some COPY support, but I don't remember details. You'd
>> better ask in JDBC list.
>>
>>
>>
> The JDBC driver support works fine.  You can pass a Reader or InputStream
> (if I recall correctly, the InputStream path is more efficient, or maybe
> the Reader path was buggy).  Regardless, I wound up using an InputStream in
> the driver, which I then wrap in a Reader in order to get it line-by-line.
>
> You can write a COPY statement to send standard CSV format - take a look at
> the postgres docs for the COPY statement to see the full syntax.  I then
> have a subclass of BufferedReader which parses each line of CSV and does
> something interesting with it.  I've had it working very reliably for many
> months now, processing about 500 million rows per day (I'm actually COPYing
> out, rather than in, but the concept is the same, regardless - my
> output stream is wrapped in a writer, which reformats data on the fly).
>
>
>
I should mention that I found basically no documentation of the copy api in
the jdbc driver in 8.4.  I have no idea if that has changed with 9.x. I had
to figure it out by reading the source code.  Fortunately, it is very
simple:

return ((PGConnection) con).getCopyAPI().copyIn(sql, this.fis);


Where this.fis is an InputStream. There's an alternative copyIn
implementation that takes a Reader instead.  I'm sure the copyOut methods
are the same.


Note: my earlier email was confusing.  copyIn copies into the db and
receives an InputStream that will deliver data when it is read.  copyOut
copies data from the db and receives an OutputStream which will receive the
data.  I inverted those in my earlier email.


You can look at the source code to the CopyAPI to learn more about the
mechanism.
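The SQL string you hand to copyIn/copyOut is just an ordinary COPY statement.
A minimal sketch, with hypothetical table and column names:

-- fed by the InputStream passed to copyIn
COPY my_fact_table (time, sensor_id, value) FROM STDIN WITH CSV;

-- streamed to the OutputStream passed to copyOut
COPY (SELECT time, sensor_id, value
        FROM my_fact_table
       WHERE time >= '2010-11-01')
  TO STDOUT WITH CSV HEADER;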


Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-05 Thread Samuel Gendler
On Thu, Nov 4, 2010 at 8:07 AM, Vitalii Tymchyshyn  wrote:

> 04.11.10 16:31, Nick Matheson написав(ла):
>
>  Heikki-
>>
>>>
>>> Try COPY, ie. "COPY bulk_performance.counts TO STDOUT BINARY".
>>>
>>>  Thanks for the suggestion. A preliminary test shows an improvement
>> closer to our expected 35 MB/s.
>>
>> Are you familiar with any Java libraries for decoding the COPY format? The
>> spec is clear and we could clearly write our own, but figured I would ask.
>> ;)
>>
> JDBC driver has some COPY support, but I don't remember details. You'd
> better ask in JDBC list.
>
>
>
The JDBC driver support works fine.  You can pass a Reader or InputStream
(if I recall correctly, the InputStream path is more efficient, or maybe
the Reader path was buggy).  Regardless, I wound up using an InputStream in
the driver, which I then wrap in a Reader in order to get it line-by-line.

You can write a COPY statement to send standard CSV format - take a look at
the postgres docs for the COPY statement to see the full syntax.  I then
have a subclass of BufferedReader which parses each line of CSV and does
something interesting with it.  I've had it working very reliably for many
months now, processing about 500 million rows per day (I'm actually COPYing
out, rather than in, but the concept is the same, regardless - my
output stream is wrapped in a writer, which reformats data on the fly).


Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-22 Thread Samuel Gendler
On Sun, Nov 21, 2010 at 10:21 PM, Humair Mohammed wrote:

>
> Correct, the optimizer did not take the settings with the pg_ctl reload
> command. I did a pg_ctl restart and work_mem now displays the updated value.
> I had to bump up all the way to 2047 MB to get the response below (with
> work_mem at 1024 MB I see 7 seconds response time) and with 2047 MB (which
> is the max value that can be set for work_mem - anything more than that
> results in a FATAL error because of the limit) the results are below. The
> batches and memory usage are reflecting the right behavior with these
> settings. Thanks for everyones input, the result is now matching what SQL
> Server was producing.
>
>
I believe you can set work_mem to a different value just for the duration of
a single query, so you needn't have work_mem set so high for every query
on the system.  A single query may well use a multiple of work_mem, so you
really probably don't want it that high all the time unless all of your
queries are structured similarly.  Just SET work_mem = '2047MB'; run the
query; then RESET ALL;
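Something like the following sketch (the join itself is hypothetical); SET
LOCAL scopes the change to the enclosing transaction, so nothing needs to be
reset by hand afterwards:

BEGIN;
SET LOCAL work_mem = '2047MB';
SELECT count(*)
  FROM big_table_a a
  JOIN big_table_b b ON a.id = b.id;  -- the query that needs the large hash area
COMMIT;  -- work_mem reverts automatically here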

But you should wait until someone more knowledgeable than I am confirms what I
just wrote.


Re: [PERFORM] Wrong docs on wal_buffers?

2011-01-07 Thread Samuel Gendler
On Thu, Jan 6, 2011 at 8:37 PM, Greg Smith  wrote:

> Josh Berkus wrote:
>
>> We talked about bumping it to 512kB or 1MB for 9.1.  Did that get in?
>> Do I need to write that patch?
>>
>>
>
> If it defaulted to 3% of shared_buffers, min 64K & max 16MB for the auto
> setting, it would for the most part become an autotuned parameter.  That
> would make it 0.75 to 1MB at the standard anemic Linux default kernel
> parameters.  Maybe more than some would like, but dropping shared_buffers
> from 24MB to 23MB to keep this from being ridiculously undersized is
> probably a win.  That percentage would reach 16MB by the time shared_buffers
> was increased to 533MB, which also seems about right to me.  On a really bad
> setup (brief pause to flip off Apple) with only 4MB to work with total,
> you'd end up with wal_buffers between 64 and 128K, so very close to the
> status quo.
>
> Code that up, and we could probably even remove the parameter as a tunable
> altogether.  Very few would see a downside relative to any sensible
> configuration under the current situation, and many people would notice
> better automagic performance with one less parameter to tweak.  Given the
> recent investigations about the serious downsides of tiny wal_buffers values
> on new Linux kernels when using open_datasync, a touch more aggression about
> this setting seems particularly appropriate to consider now.  That's been
> swapped out as the default, but it's still possible people will switch to
> it.
>
>
Does it not seem that this insistence on shipping a default config that
works out of the box on every system incurs a dramatic penalty when it comes
to getting a useful postgres config for a production system?  It seems like
postgres is forcing users to learn all of the fairly specialized and
intricate details of how shared memory is utilized by the write ahead log,
rather than asking them to modify the shared memory settings as part of the
installation procedure on a handful of systems - changes which are
relatively common and easily documented on affected systems.  Most sysadmins
will be familiar with modifying shared memory settings, while few
without postgres expertise will have a clue about configuring postgres WAL
logs, shared buffers, and checkpoint segments.  If we're trying to provide
an easy first-use experience for inexperienced users, doesn't it actually
make more sense to require a reasonable amount of shared memory rather than
constraining the install to function with only a tiny amount of shared
memory in a time when it is common even for laptops to have 4 or more
gigabytes of RAM?

I'm sure this argument has probably been done to death on this list (I'm a
relatively recent subscriber), but issues with configuration options with
nearly useless values as a result of shared memory constraints in the
default config sure seem to crop up a lot. Wouldn't so many issues be
resolved if postgres shipped with useful defaults for a modern hardware
config along with instructions for how to adjust shared memory constraints
so that the config will function on each system?

--sam


Re: [PERFORM] Wrong docs on wal_buffers?

2011-01-07 Thread Samuel Gendler
On Fri, Jan 7, 2011 at 7:07 AM, Tom Lane  wrote:

> Samuel Gendler  writes:
> > Does it not seem that this insistence on shipping a default config that
> > works out of the box on every system incurs a dramatic penalty when it
> comes
> > to getting a useful postgres config for a production system?
>
> > I'm sure this argument has probably been done to death on this list (I'm
> a
> > relatively recent subscriber),
>
> No kidding.  Please review the archives.
>
> The short answer is that even though modern machines tend to have plenty
> of RAM, they don't tend to have correspondingly large default settings
> of SHMMAX etc.  If we crank up the default shared-memory-usage settings
> to the point where PG won't start in a couple of MB, we won't accomplish
> a thing in terms of "making it work out of the box"; we'll just put
> another roadblock in front of newbies getting to try it at all.
>
>
Yes, I understand that.  I was trying to make the point that, in an attempt
to make things very easy for novice users, we are actually making them quite
a bit more complex for novice users who want to do anything besides start
the server.  But no need to have the debate again.

--sam


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-01 Thread Samuel Gendler
On Tue, Feb 1, 2011 at 3:44 PM, Mladen Gogala wrote:

> On 2/1/2011 6:03 PM, Andrew Dunstan wrote:
>
>> Whether or not it's bad application design, it's ubiquitous, and we
>> should make it work as best we can, IMNSHO. This often generates
>> complaints about Postgres, and if we really plan for world domination
>> this needs to be part of it.
>>
>
> There are many other things to fix first. One of them would be optimizer
> decisions when a temp table is involved. I would also vote for wait event
> interface, tracing and hints, much rather than speeding up count(*). World
> domination will not be achieved by speeding up count(*), it will be achieved
> by providing overall performance akin to what the player who has already
> achieved the world domination. I believe that the company is called "Oracle
> Corp." or something like that?
>
>
> Mladen Gogala
> Sr. Oracle DBA
>

Don't listen to him.  He's got an oracle bias.  Slashdot already announced
that NoSQL is actually going to dominate the world, so postgres has already
lost that battle.  Everything postgres devs do now is just an exercise in
relational masturbation.  Trust me.


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-01 Thread Samuel Gendler
On Tue, Feb 1, 2011 at 7:40 PM, Mladen Gogala wrote:

> Samuel Gendler wrote:
>
>>
>>  Don't listen to him.  He's got an oracle bias.
>>
> And bad sinuses, too.
>
>   Slashdot already announced that NoSQL is actually going to dominate the
>> world, so postgres has already lost that battle.  Everything postgres devs
>> do now is just an exercise in relational masturbation.  Trust me.
>>
>>  I knew that there is some entertainment value on this list. Samuel, your
> point of view is very..., er, refreshing. Trust me.
>
>
You get that that was sarcasm, right?


>
>


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Samuel Gendler
Neat.  That was my 'you learn something every day' moment.  Thanks.

On Thu, Feb 3, 2011 at 9:06 PM, David Wilson wrote:

>
>
> On Thu, Feb 3, 2011 at 6:05 PM, Robert Haas  wrote:
>
>>
>> If you want to randomly pick 10,000 rows out of all the rows that are
>> going to be inserted in the table without knowing in advance how many
>> there will be, how do you do that?
>>
>
> Reservoir sampling, as the most well-known option:
> http://en.wikipedia.org/wiki/Reservoir_sampling
>
> --
> - David T. Wilson
> david.t.wil...@gmail.com
>


Re: [PERFORM] Anyone tried Flashcache with PostgreSQL?

2011-03-02 Thread Samuel Gendler
On Wed, Mar 2, 2011 at 7:29 AM, Robert Haas  wrote:

> On Mon, Feb 28, 2011 at 2:09 PM, Josh Berkus  wrote:
> > Does anyone have the hardware to test FlashCache with PostgreSQL?
> >
> > http://perspectives.mvdirona.com/2010/04/29/FacebookFlashcache.aspx
> >
> > I'd be interested to hear how it performs ...
>
> It'd be a lot more interesting if it were a write-through cache rather
> than a write-back cache, wouldn't it?
>

Well, it is open source...


Re: [PERFORM] Reason of Slowness of query

2011-03-23 Thread Samuel Gendler
On Tue, Mar 22, 2011 at 11:28 PM, Adarsh Sharma wrote:

> *
> *I perform a join query on it as :
>
> * explain analyze select distinct(p.crawled_page_id) from page_content p ,
> clause2  c where p.crawled_page_id != c.source_id ;
>
> *What it takes more than 1 hour to complete. As I issue the explain
> analyze command and cannot able to wait for output but I send my explain
> output as :
>


Please describe what your query is trying to select, as it is possible that
the query isn't doing what you think it is.  Joining 2 tables on id1 != id2
will create something close to a cross product of the two tables: every row
from the first table is matched with every row from the second table that
doesn't have a matching id.  Then you are looking for distinct values on
that potentially enormous set of rows.

db_v2=# select * from table1;
 id | value
+---
  1 | 1
  2 | 2
  3 | 3
(3 rows)

db_v2=# select * from table2;
 id | value
+---
  1 | 4
  2 | 5
  3 | 6
(3 rows)

db_v2=# select t1.id, t1.value, t2.id, t2.value from table1 t1, table2 t2
where t1.id != t2.id;
 id | value | id | value
+---++---
  1 | 1 |  2 | 5
  1 | 1 |  3 | 6
  2 | 2 |  1 | 4
  2 | 2 |  3 | 6
  3 | 3 |  1 | 4
  3 | 3 |  2 | 5

So if you have a couple of million rows in each table, you are selecting
distinct over a potentially huge set of data.   If you are actually trying
to find all ids from one table which have no match at all in the other
table, then you need an entirely different query:

db_v2=# insert into table2 (value) values (7);
INSERT 0 1

db_v2=# select * from table2;
 id | value
+---
  1 | 4
  2 | 5
  3 | 6
  4 | 7

db_v2=# select t2.id, t2.value from table2 t2 where not exists (select 1
from table1 t1 where t1.id = t2.id);
 id | value
+---
  4 | 7


[PERFORM] multiple table scan performance

2011-03-29 Thread Samuel Gendler
I've got some functionality that necessarily must scan a relatively large
table.  Even worse, the total workload is actually 3 similar, but different
queries, each of which requires a table scan.  They all have a resultset
that has the same structure, and all get inserted into a temp table.  Is
there any performance benefit to revamping the workload such that it issues
a single:

insert into (...) select ... UNION select ... UNION select

as opposed to 3 separate "insert into () select ..." statements.

I could figure it out empirically, but the queries are really slow on my dev
laptop and I don't have access to the staging system at the moment.  Also,
it requires revamping a fair bit of code, so I figured it never hurts to
ask.  I don't have a sense of whether postgres is able to parallelize
multiple subqueries via a single scan.
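If I do revamp it, the single statement would look roughly like this sketch
(table and column names hypothetical), presumably with UNION ALL rather than
UNION so the planner doesn't add a duplicate-eliminating sort on top:

INSERT INTO scratch_results (metric, ts, val)
SELECT 'low',  ts, reading FROM big_source WHERE reading < 10
UNION ALL
SELECT 'mid',  ts, reading FROM big_source WHERE reading BETWEEN 10 AND 100
UNION ALL
SELECT 'high', ts, reading FROM big_source WHERE reading > 100;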


Re: [PERFORM] multiple table scan performance

2011-03-29 Thread Samuel Gendler
On Tue, Mar 29, 2011 at 5:05 PM, Marti Raudsepp  wrote:

> On Wed, Mar 30, 2011 at 01:16, Samuel Gendler 
> wrote:
>
> You can trick Postgres (8.3.x and newer) into doing it in parallel
> anyway: open 3 separate database connections and issue each of these
> 'INSERT INTO ... SELECT' parts separately.  This way all the queries
> should execute in about 1/3 the time, compared to running them in one
> session or with UNION ALL.
>

That's a good idea, but forces a lot of infrastructural change on me.  I'm
inserting into a temp table, then deleting everything from another table
before copying over.  I could insert into an ordinary table, but then I've
got to deal with ensuring that everything is properly cleaned up, etc.
 Since nothing is actually blocked, waiting for the queries to return, I
think I'll just let them churn for now. It won't make much difference in
production, where the whole table will fit easily into cache.  I just wanted
things to be faster in my dev environment.



>
> Regards,
> Marti
>


Re: [PERFORM] COPY with high # of clients, partitioned table locking issues?

2011-03-30 Thread Samuel Gendler
On Wed, Mar 30, 2011 at 5:48 PM, Bob Lunney  wrote:

> John,
>
> Sorry to hear you're struggling with such underpowered hardware.  ;-)  A
> little more information would be helpful, though:
>
> 1.  What version of PG are you running?
> 2.  What are the constraints on the child tables?
> 3.  How many rows does each copy insert?
> 4.  Are these wrapped in transactions?
> 5.  are the child tables created at the same time the copies are taking
> place?  In the same transaction?
> 6.  Are the indexes in place on the child table(s) when the copies are
> running?  Do they have to be to validate the data?
> 7.  What are the configuration settings for the database?  (Just the ones
> changed from the default, please.)
> 8.  Which file system are you running for the database files?  Mount
> options?
> 9.  Are the WAL files on the same file system?
>
>
10. are you copying directly into the child tables or into the parent and
then redirecting to child tables via a trigger?


Re: [PERFORM] Is there a way to selective dump of records in Postgres 9.0.3?

2011-04-18 Thread Samuel Gendler
On Mon, Apr 18, 2011 at 8:11 AM, Nikolas Everett  wrote:

> This probably isn't the right place to ask that question but you may as
> well try `pg_dump -t PATTERN`.  Man pg_dump for more information on how to
> form that pattern.
>
>
> On Mon, Apr 18, 2011 at 11:05 AM, Sethu Prasad 
> wrote:
>
>> Hi List,
>> I am using PostgreSQL 9.0.3 and I have a need to dump only the selective
>> data from partial list of tables of a database. Is there a straight way to
>> do it with pg_dump or any alternative work around to suggest here?!
>>
>
Or if you need partial data from one table - a WHERE clause - then you can
do:

COPY (select * from whatever where column=value) TO '/tmp/dump.csv' WITH CSV
HEADER

in combination with

pg_dump -f whatever.sql -s -t whatever db

to dump the DDL for the 'whatever' table into whatever.sql.

http://www.postgresql.org/docs/current/static/sql-copy.html

If it is a lot of data, you'll want to edit the whatever.sql file to remove
the CREATE INDEX statements until after you've loaded the table, and then,
depending upon how many indexes there are and how many rows you have, you
may want to parallelize the CREATE INDEX statements by running them in
parallel in multiple psql sessions (and possibly with an artificially large
maintenance_work_mem if that speeds things up).
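As a sketch of the reload side (file, table, column, and index names
hypothetical): load the data first, then build each index, one per psql
session if you want them running in parallel:

-- in psql
\copy whatever FROM '/tmp/dump.csv' WITH CSV HEADER

-- then, in each session
SET maintenance_work_mem = '1GB';
CREATE INDEX whatever_col_idx ON whatever (whatever_col);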


Re: [PERFORM] index usage on queries on inherited tables

2011-04-27 Thread Samuel Gendler
On Wed, Apr 27, 2011 at 2:11 PM, Joseph Shraibman wrote:

> On 04/27/2011 04:32 PM, Robert Haas wrote:
> > In the first case, PostgreSQL evidently thinks that using the indexes
> > will be slower than just ignoring them.  You could find out whether
> > it's right by trying it with enable_seqscan=off.
>
> My point is that this is just a problem with inherited tables.  It
> should be obvious to postgres that few rows are being returned, but in
> the inherited tables case it doesn't use indexes.  This was just an
> example.  In a 52 gig table I have a "select id from table limit 1 order
> by id desc" returns instantly, but as soon as you declare a child table
> it tries to seq scan all the tables.
>
>
If I'm understanding correctly, this kind of obviates the utility of
partitioning if you structure a warehouse in a traditional manner.  Assuming
a fact table partitioned by time, but with foreign keys to a time dimension,
it is now not possible to gain any advantage from the partitioning if
selecting on columns in the time dimension.

"select * from fact_table f join time_dimension t on f.time_id = t.time_id
where t.quarter=3 and t.year = 2010" will scan all partitions of the fact
table despite the fact that all of the rows would come from 3 partitions,
assuming a partitioning schema that uses one partition for each month.

I use a time id that is calculable from the timestamp so it doesn't
need to be looked up, and partitioning on time_id directly is easy enough to
handle, but if I'm understanding the problem, it sounds like nothing short
of computing the appropriate time ids before issuing the query and then
including a 'where f.time_id between x and y' clause to the query will
result in the partitions being correctly excluded.  Is that what people are
doing to solve this problem?  The alternative is to leave a timestamp column
in the fact table (something I tend to do since it makes typing ad-hoc
queries in psql much easier) and partition on that column and then always
include a where clause for that column that is at least as large as the
requested row range.  Both result in fairly ugly queries, though I can
certainly see how I might structure my code to always build queries which
adhere to this.
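In other words, something like this sketch (the time_id encoding is
hypothetical), where the redundant range on the partitioned column is what
lets constraint exclusion prune the partitions:

SELECT f.*
  FROM fact_table f
  JOIN time_dimension t ON f.time_id = t.time_id
 WHERE t.quarter = 3 AND t.year = 2010
   -- bounds computed by the application from the requested quarter
   AND f.time_id BETWEEN 2010070100 AND 2010093023;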

I'm just in the process of designing a star schema for a project and was
intending to use exactly the structure I described at the top of the email.
Is there a postgres best-practices for solving this problem? There's no way
I can get away without partitioning.  I'm looking at a worst case table of
100,000 rows being written every 5 minutes, 24x7 - 29 million rows per day,
a billion rows per month - with most queries running over a single month or
comparing same months from differing years and quarters - so a month based
partitioning.  Normal case is closer to 10K rows per 5 minutes.

Suggestions?

--sam


Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan

2011-04-28 Thread Samuel Gendler
On Wed, Apr 27, 2011 at 5:19 PM, Sok Ann Yap  wrote:

> On Thu, Apr 28, 2011 at 7:23 AM, Kevin Grittner
>
>
> I understand the need to tune PostgreSQL properly for my use case.
> What I am curious about is, for the data set I have, under what
> circumstances (hardware/workload/cache status/etc) would a sequential
> scan really be faster than an index scan for that particular query?
>

Possibly none on your hardware - if the index is likely to be in memory
along with the actual table rows.  In which case, the cost for index scan
(random page cost) should be made much closer to the cost for sequential
access.  It looks like the planner must use the same strategy on each
iteration of the loop - it can't do index scan for some values and
sequential scan for others, so it must be computing the cost as
sequential_cost * (number of entries(44)) versus random_cost * (number of
entries).  If random page cost is unreasonably high, it's not hard to see
how it could wind up looking more expensive to the planner, causing it to
choose the sequential scan for each loop iteration.  If it were able to
change strategy on each iteration, it would be able to accurately assess
cost for each iteration and choose the correct strategy for that value.  As
soon as you set the costs closer to actual cost for your system, postgres
does make the correct choice.  If there weren't enough memory that postgres
could be 'sure' that the index would remain in cache at least for the
duration of all 44 iterations due to high workload, it is easy to see how
the index scan might become significantly more expensive than the sequential
scan, since the index scan must also load the referenced page from the table
- postgres cannot get values directly from the index.


> To simulate a scenario when nothing is cached, I stopped PostgreSQL,
> dropped all system cache (sync; echo 3 > /proc/sys/vm/drop_caches),
> restarted PostgreSQL, and ran the query. A sequential scan run took
> 13.70 seconds, while an index scan run took 0.34 seconds, which is
> still 40 times faster.
>
> Also, I tried increasing effective_cache_size from 512MB to 3GB (the
> database size is 2+GB), and it still favor sequential scan. The
> estimated costs did not change at all.
>

Greg Smith had this to say in another thread on this same subject:

effective_cache_size probably doesn't do as much as you suspect.  It is used
for one of the computations for whether an index is small enough that it can
likely be read into memory efficiently.  It has no impact on caching
decisions outside of that.


This is why the cost for random page access must be fairly accurate.  Even if
the index is in memory, *it still needs to access the page of data in the
table referenced by the index*, which is why the cost of random access must
be accurate.  That cost is a factor of both the performance of your storage
infrastructure and the cache hit rate and can't really be computed by the
database on the fly.  You seem to be looking at the data which exposes the
fact that random page access is fast and wondering why postgres isn't doing
the right thing when postgres isn't doing the right thing precisely because
it doesn't know that random page access is fast.  Since you don't have
particularly fast storage infrastructure, this is likely a function of cache
hit rate, so you must factor in eventual load on the db when setting this
value.  While it may be fast in a lightly loaded test environment, those
random page accesses will get much more expensive when competing with other
concurrent disk access.
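As a concrete sketch of that tuning for a single session (the query itself is
hypothetical):

-- bring random_page_cost down toward seq_page_cost to reflect a high cache
-- hit rate, then compare the plans
SET random_page_cost = 2;   -- server default is 4
EXPLAIN SELECT * FROM contacts WHERE company_id = 42;
RESET random_page_cost;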

There's another thread currently active on this list (it started on April
12) with subject "Performance" which contains this explanation of what is
going on and why you need to tune these parameters independently of
effective_cache_size:

When the planner decides what execution plan to use,
it computes a 'virtual cost' for different plans and then chooses the
cheapest one.

Decreasing 'random_page_cost' decreases the expected cost of plans
involving index scans, so that at a certain point it seems cheaper than
a plan using sequential scans etc.

You can see this when using EXPLAIN - do it with the original cost
values, then change the values (for that session only) and do the
EXPLAIN only. You'll see how the execution plan suddenly changes and
starts to use index scans.

The problem with random I/O is that it's usually much more expensive
than sequential I/O as the drives need to seek etc. The only case when
random I/O is just as cheap as sequential I/O is when all the data is
cached in memory, because within RAM there's no difference between
random and sequential access (right, that's why it's called Random
Access Memory).

So in the previous post setting both random_page_cost and seq_page_cost
to the same value makes sense, because when the whole database fits into
the memory, there's no difference and index scans are favorable.

In this case (the database is much bigger than the available RAM) this
no longer 

Re: [PERFORM] stored proc and inserting hundreds of thousands of rows

2011-04-30 Thread Samuel Gendler
On Sat, Apr 30, 2011 at 5:12 PM, Jeff Janes  wrote:

>
>
> gist indices are designed to make this type of thing fast, by using
> techniques to rule out most of those comparisons without actually
> performing them.  I don't know enough about the
> guts of either your distance function or the gist indexes to know if
> you can do it this way, but if you can it would certainly be the way
> to go.
>

It is definitely a good idea to consider a gist index for eliminating most
of a large dataset, if applicable.  Do a little reading on the topic and,
hopefully, it's applicability (or not) will become apparent.

However, as someone who has built a number of ad servers over the years, for
several of the larger ad networks, the first thing I'd do is separate your
ad serving from real-time interaction with your database, no matter what the
underlying technology.  If your ad serving is dependent on your database, it
means that hiccups in the database will have an impact on ad serving, which
is rarely tolerable.  And god forbid you should need to take the db down for
a period of maintenance.

The reliability and performance required of most ad
servers is often greater than what should reasonably be expected of a
relational database, particularly if there are other processes accessing the
database, as is the case with your system. The first rule of ad serving is
that no outage of backend systems should ever be able to prevent or impact
front end ad serving. Some kind of in-memory cache of doc/ad mappings which
the ad server interacts with will serve you in good stead and will be much
easier to scale horizontally than most relational db architectures lend
themselves to.  If you have an ever increasing set of documents and ads,
you'll inevitably wind up 'sharding' your dataset across multiple db hosts
in order to maintain performance - which creates a number of maintenance
complexities. Easier to maintain a single database and do analytics over a
single data source, but insulate it from the real-time performance
requirements of your ad serving. Even something as simple as a process that
pushes the most recent doc/ad mappings into a memcache instance could be
sufficient - and you can scale your memcache across as many hosts as is
necessary to deliver the lookup latencies that you require no matter how
large the dataset.

Similarly, if you are updating the database from the ad
server with each ad served in order to record an impression or click, you'll
be far better off logging those and then processing the logs in bulk on a
periodic basis.  If subsequent impressions are dependent upon what has
already been served historically, then use your memcache instance (or
whatever structure you eventually choose to utilize) to handle those
lookups.  This gives you the convenience and flexibility of a relational
system with SQL for access, but without the constraints of the capabilities
of a single host limiting real-time performance of the system as a whole.


[PERFORM] setting configuration values inside a stored proc

2011-05-12 Thread Samuel Gendler
I've got a stored proc that constructs some aggregation queries as strings
and then executes them.  I'd like to be able to increase work_mem before
running those queries.  If I set a new value for work_mem within the stored
proc prior to executing my query string, will that actually have an impact
on the query or is work_mem basically a constant once the outer statement
that calls the stored proc has begun?  I'd just test, but it will take hours
for me to grab a copy of production data and import into a new db host for
testing. I've already started that process, but I'm betting I'll have an
answer by the time it completes.  It's just the difference between modifying
the application which calls the procs (and doing a full software release in
order to do so or else waiting a month to go in the next release) vs
modifying the procs themselves, which requires only a db update.

--sam


Re: [PERFORM] setting configuration values inside a stored proc

2011-05-13 Thread Samuel Gendler
On Fri, May 13, 2011 at 1:28 AM, Andres Freund  wrote:

> Hi,
>
> On Friday, May 13, 2011 01:10:19 AM Samuel Gendler wrote:
>
> I would suggest doing ALTER FUNCTION blub(blarg) SET work_mem = '512MB';
>
>
Ah!  That's perfect and very convenient.  Thanks.
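For the archives, a minimal sketch (function name and signature are
hypothetical):

-- attach the setting to the function; it applies for the duration of each call
ALTER FUNCTION build_aggregates(date) SET work_mem = '512MB';

-- or, from inside the function body, scoped to the current transaction:
-- SELECT set_config('work_mem', '512MB', true);  -- third arg = local to transaction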

--sam


Re: [PERFORM] SORT performance - slow?

2011-05-19 Thread Samuel Gendler
Paste the entire explain analyze output into the form at
http://explain.depesz.com/ and you'll get a nicely human readable output
which shows both the inclusive and exclusive time spent on each step of the
query.  It also highlights any steps which show inaccurate statistics.  It
will also give you a perma-link which you can use in emails so that everyone
else can see the pretty version, too.



On Thu, May 19, 2011 at 2:13 PM, Strange, John W <
john.w.stra...@jpmchase.com> wrote:

> Am I reading this right in that the sort is taking almost 8 seconds?
>
>
>
> *"GroupAggregate  (cost=95808.09..95808.14 rows=1 width=142) (actual
> time=14186.999..14694.524 rows=315635 loops=1)"*
>
> "  Output: sq.tag, sq.instrument, s.d1, s.d2, s.d3, s.d4, s.d5, s.d6, s.d7,
> s.d8, s.d9, s.d10, sum(sq.v)"
>
> "  Buffers: shared hit=9763"
>
> *"  ->  Sort  (cost=95808.09..95808.09 rows=1 width=142) (actual
> time=14186.977..14287.068 rows=315635 loops=1)"*
>
> "Output: sq.tag, sq.instrument, s.d1, s.d2, s.d3, s.d4, s.d5, s.d6,
> s.d7, s.d8, s.d9, s.d10, sq.v"
>
> "Sort Key: sq.tag, sq.instrument, s.d1, s.d2, s.d3, s.d4, s.d5,
> s.d6, s.d7, s.d8, s.d9, s.d10"
>
> "Sort Method:  quicksort  Memory: 79808kB"
>
> "Buffers: shared hit=9763"
>
> *"->  Hash Join  (cost=87341.48..95808.08 rows=1 width=142)
> (actual time=6000.728..12037.492 rows=315635 loops=1)"*
>
> "  Output: sq.tag, sq.instrument, s.d1, s.d2, s.d3, s.d4, s.d5,
> s.d6, s.d7, s.d8, s.d9, s.d10, sq.v"
>
> "  Hash Cond: (s.scenarioid = sq.scenarioid)"
>
> "  Buffers: shared hit=9763"
>
>
>
>
>
>
> ___
> |* John W. Strange* | Vice President | Global Commodities Technology
> | J.P. Morgan | 700 Louisiana, 11th Floor | T: 713-236-4122 | C:
> 281-744-6476 | F: 713 236-
> | john.w.stra...@jpmchase.com | jpmorgan.com
>
>
>


Re: [PERFORM] The shared buffers challenge

2011-05-26 Thread Samuel Gendler
On Thu, May 26, 2011 at 4:10 PM, Greg Smith  wrote:

>
> As for figuring out how this impacts more complicated cases, I hear
> somebody wrote a book or something that went into pages and pages of detail
> about all this.  You might want to check it out.
>
>
I was just going to suggest that there was significant and detailed
documentation of this stuff in a certain book, a well-thumbed copy of which
should be sitting on the desk of anyone attempting any kind of postgres
performance tuning.


Re: [PERFORM] picking a filesystem

2011-05-31 Thread Samuel Gendler
On Tue, May 31, 2011 at 8:35 AM, Robert Haas  wrote:

>
> So if you're running a RHEL5.4 or RHEL5.5 system, are you basically
> stuck with ext3?  I'm not sure if I'm remembering correctly, but ISTM
> that you've been uncomfortable with BOTH ext4 and XFS prior to RHEL6;
> but OK with both beginning with RHEL6.
>
> Also, any tips on mount options for XFS/ext4/ext3?
>

Greg's book has a whole chapter that goes through the pros and cons of each
type of fs and offers suggestions for configuring most of them for postgres.
 I haven't actually read the chapter in detail yet, so I won't try to
summarize its content here. It appeared to be pretty comprehensive during my
quick scan of the chapter.


Re: [PERFORM] Oracle v. Postgres 9.0 query performance

2011-06-08 Thread Samuel Gendler
On Wed, Jun 8, 2011 at 12:03 PM, Tony Capobianco  wrote:

> My current setting is 22G.  According to some documentation, I want to
> set effective_cache_size to my OS disk cache + shared_buffers.  In this
> case, I have 4 quad-core processors with 512K cache (8G) and my
> shared_buffers is 7680M.  Therefore my effective_cache_size should be
> approximately 16G?  Most of our other etl processes are running fine,
> however I'm curious if I could see a significant performance boost by
> reducing the effective_cache_size.
>
>
disk cache, not CPU memory cache.  It will be some significant fraction of
total RAM on the host.  Incidentally, 16 * 512K cache = 8MB, not 8GB.

http://en.wikipedia.org/wiki/CPU_cache


Re: [PERFORM] poor performance when recreating constraints on large tables

2011-06-08 Thread Samuel Gendler
On Wed, Jun 8, 2011 at 12:28 PM, Robert Haas  wrote:

> On Mon, Jun 6, 2011 at 6:10 PM, Mike Broers  wrote:
> > Thanks for the suggestion, maintenance_work_mem is set to the default of
> > 16MB on the host that was taking over an hour as well as on the host that
> > was taking less than 10 minutes.  I tried setting it to 1GB on the faster
> > test server and it reduced the time from around 6-7 minutes to about
> 3:30.
> >  this is a good start, if there are any other suggestions please let me
> know
> > - is there any query to check estimated time remaining on long running
> > transactions?
>
> Sadly, no.  I suspect that coming up with a good algorithm for that is
> a suitable topic for a PhD thesis.  :-(
>
>
The planner knows how many rows are expected for each step of the query
plan, so it would be theoretically possible to compute how far along it is
in processing a query based on those estimates, wouldn't it?  Combine
percentage complete with time elapsed and you could get somewhat close if
the stats are accurate, couldn't you?  Of course, I have no clue as to the
internals of the planner and query executor which might or might not make
such tracking of query execution possible.


Re: [PERFORM] poor performance when recreating constraints on large tables

2011-06-08 Thread Samuel Gendler
On Wed, Jun 8, 2011 at 12:53 PM, Kevin Grittner  wrote:

> Samuel Gendler  wrote:
>
> > The planner knows how many rows are expected for each step of the
> > query plan, so it would be theoretically possible to compute how
> > far along it is in processing a query based on those estimates,
> > wouldn't it?
>
> And it is sometimes off by orders of magnitude.  How much remaining
> time do you report when the number of rows actually processed so far
> is five times the estimated rows that the step would process?  How
> about after it chugs on from there to 20 times the estimated row
> count?  Of course, on your next query it might finish after
> processing only 5% of the estimated rows
>

Sure, but if it is a query that is slow enough for a time estimate to be
useful, odds are good that stats that are that far out of whack would
actually be interesting to whoever is looking at the time estimate, so
showing some kind of 'N/A' response once things have gotten out of whack
wouldn't be unwarranted.  Not that I'm suggesting that any of this is a
particularly useful exercise.  I'm just playing with the original thought
experiment suggestion.


>
> -Kevin
>


Re: [PERFORM] poor performance when recreating constraints on large tables

2011-06-09 Thread Samuel Gendler
On Wed, Jun 8, 2011 at 10:57 PM, Greg Smith  wrote:

> Samuel Gendler wrote:
>
>> Sure, but if it is a query that is slow enough for a time estimate to be
>> useful, odds are good that stats that are that far out of whack would
>> actually be interesting to whoever is looking at the time estimate, so
>> showing some kind of 'N/A' response once things have gotten out of whack
>> wouldn't be unwarranted.
>>
>
> The next question is what are you then going to do with that information?
>
> The ability to track some measure of "progress" relative to expectations is
> mainly proposed as something helpful when a query has gone out of control.
>  When that's happened, the progress meter normally turns out to be
> fundamentally broken; the plan isn't happening at all as expected.  So, as
> you say, you will get an "N/A" response that says the query is out of
> control, when in the cases where this sort of thing is expected to be the
> most useful.
>

Well, in my case, the use I'd put it to is a query that is necessarily long
running (aggregations over large quantities of data that take a minute or
two to complete), and the stats are accurate enough that it would
potentially let me show a progress meter of some kind in the few places
where such queries are run interactively rather than on a schedule.  Not
that I'm really thinking seriously about doing so, but there are places in
code I maintain where such a thing could prove useful if its accuracy is
reasonable for the queries in question.  Enough to at least toy with the
suggested sequence method and see what happens when I've got some spare time
to play.


Re: [PERFORM] Large rows number, and large objects

2011-06-19 Thread Samuel Gendler
On Sat, Jun 18, 2011 at 9:06 PM, Jose Ildefonso Camargo Tolosa <
ildefonso.cama...@gmail.com> wrote:

> Greetings,
>
> I have been thinking a lot about pgsql performance when it is dealing
> with tables with lots of rows on one table (several millions, maybe
> thousands of millions).  Say, the Large Object use case:
>
> one table has large objects (have a pointer to one object).
> The large object table stores the large object in 2000 bytes chunks
> (iirc), so, if we have something like 1TB of data stored in large
> objects, the large objects table would have something like 550M rows,
> if we get to 8TB, we will have 4400M rows (or so).
>
> I have read at several places that huge tables should be partitioned,
> to improve performance now, my first question comes: does the
> large objects system automatically partitions itself? if no: will
> Large Objects system performance degrade as we add more data? (I guess
> it would).
>
> Now... I can't fully understand this: why does the performance
> actually goes lower? I mean, when we do partitioning, we take a
> certain parameter to "divide" the data,and then use the same parameter
> to issue the request against the correct table... shouldn't the DB
> actually do something similar with the indexes? I mean, I have always
> thought about the indexes, well, exactly like that: approximation
> search, I know I'm looking for, say, a date that is less than
> 2010-03-02, and the system should just position itself on the index
> around that date, and scan from that point backward... as far as my
> understanding goes, the partitioning only adds like this "auxiliary"
> index, making the system, for example, go to a certain table if the
> query goes toward one particular year (assuming we partitioned by
> year), what if the DB actually implemented something like an Index for
> the Index (so that the first search on huge tables scan on an smaller
> index that points to a position on the larger index, thus avoiding the
> scan of the large index initially).
>
> Well I'm writing all of this half-sleep now, so... I'll re-read it
> tomorrow... in the meantime, just ignore anything that doesn't make a
> lot of sense :) .
>

Partitioning helps in a number of ways.  First, if running a query which
must scan an entire table, if the table is very large, that scan will be
expensive.  Partitioning can allow the query planner to do a sequential scan
over just some of the data and skip the rest (or process it via some other
manner, such as an index lookup).  Also, the larger an index is, the more
expensive the index is to maintain.  Inserts and lookups will both take
longer.  Partitioning will give you n indexes, each with m/n entries
(assuming fairly even distribution of data among partitions), so any given
index will be smaller, which means inserts into a partition will potentially
be much faster.  Since large tables often also have very high insert rates,
this can be a big win.

You can also gain better resource utilization by
moving less frequently used partitions onto slower media (via a tablespace),
freeing up space on your fastest i/o devices for the most important data.  A
lot of partitioning tends to happen by time, and the most frequently run
queries are often on the partitions containing the most recent data, so it
often can be very beneficial to keep only the most recent partitions on
fastest storage.

Then there is caching.  Indexes and tables are cached by
page.  Without clustering a table on a particular index, the contents of a
single page may be quite arbitrary.  Even with clustering, depending upon
the usage patterns of the table in question, it is entirely possible that
any given page may only have some fairly small percentage of highly relevant
data if the table is very large.  By partitioning, you can (potentially)
ensure that any given page in cache will have a higher density of highly
relevant entries, so you'll get better efficiency out of the caching layers.
 And with smaller indexes, it is less likely that loading an index into
shared buffers will push some other useful chunk of data out of the cache.
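For illustration, a sketch of what that looks like with inheritance-based
partitioning (table, column, and tablespace names hypothetical):

-- one child per month, with a CHECK constraint so constraint exclusion can prune it
CREATE TABLE fact_2011_06 (
    CHECK (time >= DATE '2011-06-01' AND time < DATE '2011-07-01')
) INHERITS (fact_table);
CREATE INDEX fact_2011_06_time_idx ON fact_2011_06 (time);

-- retire a cold partition onto cheaper storage
ALTER TABLE fact_2010_01 SET TABLESPACE slow_sata;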

As for the large object tables, I'm not sure about the internals.  Assuming
that each table gets its own table for large objects, partitioning the main
table will have the effect of partitioning the large object table, too -
keeping index maintenance more reasonable and ensuring that lookups are as
fast as possible.  There's probably a debate to be had on the benefit of
storing very large numbers of large objects in the db, too (as opposed to
keeping references to them in the db and actually accessing them via some
other mechanism.  Both product requirements and performance are significant
factors in that discussion).

As for your suggestion that the db maintain an index on an index, how would
the database do so in an intelligent manner? It would have to maintain such
indexes on every index and guess as to which values to use as boundaries for
each bucket. P

Re: [PERFORM] generating a large XML document

2011-06-20 Thread Samuel Gendler
On Sun, Jun 19, 2011 at 11:36 PM, Julius Tuskenis  wrote:

> Hello,
>
> I'm sorry to write again, but as I received no answer I wonder if there is
> a better mailing list to address concerning this question? Or is there
> nothing to be done about the speed of xmlagg ?. Please let me as no answer
> is the worst answer to get


I played around a little after your initial post, but I'm not a postgresql
developer and I have no familiarity with the internals, so everything I'm
about to write is pure conjecture.

Programming languages that treat strings as immutable often suffer from this
kind of problem.  With each string concatenation, both strings have to be
copied to a new location in memory.  I suspect that this is happening in
this case.  The only viable fix would be to use a buffer that is mutable and
append into it rather than doing raw string concatenation.  If this is truly
the problem, I don't see that you have much choice but to re-implement
xmlagg in one of the available languages such that it uses a buffer instead
of immutable string concatenation.  It is probable that the xmlelement
function doesn't need to be rewritten, since it is only concatenating
relatively short strings.  It is less efficient than appending to a buffer,
but shouldn't get catastrophically slow.  But xmlagg is concatenating many
rows.  If each concatenation results in a full copy of the already
concatenated rows, you can see where performance would drop
catastrophically.

Here's the first google result for 'mutable string python' that I found,
which addresses this problem in python.
http://www.skymind.com/~ocrow/python_string/ You could rewrite the aggregate
function in plpython using one of the techniques in that file. I just
attempted to find the source to xm_agg in the postgresql source code and it
is pretty well obfuscated, so I don't see that being much help.  I wasn't
even able to determine if the problem actually is immutable string
concatenation.  So we don't know if xmlagg is building a DOM tree and then
serializing it once (which would imply that XMLELEMENT returns a single DOM
node, or if it is all workign with strings.  Barring answers from someone
who actually knows, I can only suggest that you read through the
documentation on writing an aggregate function and then do some
experimentation to see what you get when you use your own aggregate function
instead of xml_agg.  Hopefully, such experimentation won't take long to
determine if re-implementing xml_agg with a mutable buffer is a viable
option.
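
For what it's worth, here is the shape of workaround I have in mind, purely
as a hedged sketch (table and column names are invented, and it assumes the
slowness really is repeated copying of an ever-growing string): accumulate
the per-row fragments with array_agg, which appends to an internal array
state, and concatenate once at the end with array_to_string.

-- sketch only: build each fragment with xmlelement as before, but aggregate
-- the fragments as text[] and do a single concatenation at the end, instead
-- of re-copying the whole document for every row
select array_to_string(
         array_agg(xmlelement(name item,
                              xmlattributes(t.id as id),
                              t.payload)::text),
         '') as xml_body
from some_table t;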

--sam


[PERFORM] near identical queries have vastly different plans

2011-06-30 Thread Samuel Gendler
Here's the setup:

I'm cross joining two dimensions before left outer joining to a fact table
so that I can throw a default value into the resultset wherever a value is
missing from the fact table.  I have a time dimension and another dimension.
 I want the cross join to only cross a subset of rows from each dimension,
so I have some filters in the ON clause of the inner join.

I've got 2 nearly identical queries that perform incredibly differently.
 The only difference between them is that the fast query uses two text
columns from the sensor dimension when filtering the inner join while the
slow query uses bigint id values, where the ids correlate to the text
strings in the fast query.  The string columns are device_name and
dpoint_name, where device_name is unique but many devices have dpoints with
the same name.  The bigint columns are device_id and dpoint_id, where both
device_id and dpoint_id map to a single row.  There are indexes on all of
them, so the only real difference is that an index on dpoint_name will have
more rows for a given value than the index on dpoint_id because dpoints with
the same name will still have different ids if attached to different
devices.  In both queries, exactly 35 rows from the sensor dimension will be
returned.  Note also that I'm aggregating fact table rows via avg() because
I have duplicate rows in the fact table, but I want to extract only a single
row for each time and sensor row.  The left outer join allows me to populate
any missing rows with a default value and the aggregation via avg() combines
duplicate rows so that they appear only once.

I can easily just use the fast query, but my concern is that without
understanding why the queries are executing differently, I might suddenly
discover my code using the slow query plan instead of the fast one at some
point in the future, even when using the varchar columns instead of the
bigint ids for filtering.  They differ in execution speed by about 5x (which
translates to many minutes), so that would be a big problem.  If I could
figure out either a query structure or an index structure which will force
the fast query plan, I'd be much happier.  So that is what I am looking for
- an explanation of how I might convince the planner to always use the fast
plan.

It's a CentOS host - Quad core AMD Opteron 1.6GHz, 2GB of RAM, single 75GB
disk with everything on it.  I'm not looking for outright performance, just
relative performance between the 2 queries.  DB config was taken wholesale
from pg_tune with no changes, IIRC.  It isn't a production box.  I have yet
to spec out production hardware for this application, so I don't know what
it will eventually be.

# select version();
 version

--
 PostgreSQL 8.4.5 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20080704 (Red Hat 4.1.2-48), 64-bit

 name | current_setting

--+-
 checkpoint_completion_target | 0.9
 checkpoint_segments  | 64
 default_statistics_target| 100
 effective_cache_size | 1408MB
 lc_collate   | en_US.UTF-8
 lc_ctype | en_US.UTF-8
 log_directory| pg_log
 log_filename | postgresql-%a.log
 log_rotation_age | 1d
 log_rotation_size| 0
 log_truncate_on_rotation | on
 logging_collector| on
 maintenance_work_mem | 240MB
 max_connections  | 20
 max_stack_depth  | 2MB
 port | 5432
 server_encoding  | UTF8
 shared_buffers   | 480MB
 TimeZone | UTC
 wal_buffers  | 32MB
 work_mem | 48MB


time dimension looks like this:

  Column  |Type | Modifiers
--+-+---
 time_zone| character varying(64)   |
 tstamp   | timestamp without time zone |
 tstamptz | timestamp with time zone|
 local_key| bigint  |
 utc_key  | bigint  |
Indexes:
"idx_time_ny_local_key" btree (local_key)
"idx_time_ny_tstamp" btree (tstamp) CLUSTER
"idx_time_ny_tstamptz" btree (tstamptz)
"idx_time_ny_utc_key" btree (utc_key)

plus lots of other columns (approx 25 columns, mostly integer)  that aren't
relevant to this query.  It has 420,480 rows where each row is 300 seconds
after the previous row.  local_key and utc_key are bigint columns in the
form MMddHHmm (utc_key in UTC time and the other in local time for the
time zone represented by the table) and tstamp is the same value as an
actual timestamp type. tstamptz is jus

Re: [PERFORM] near identical queries have vastly different plans

2011-06-30 Thread Samuel Gendler
On Thu, Jun 30, 2011 at 1:53 AM, Samuel Gendler
wrote:

> If I could figure out either a query structure or an index structure which
> will force the fast query plan, I'd be much happier.  So that is what I am
> looking for - an explanation of how I might convince the planner to always
> use the fast plan.
>
>
For the record, "set enable_nestloop=false" does force a more effective plan
when using the 'slow' query.  It is not quite identical in structure - it
materializes the other side of the query, resulting in about 10% less
performance - but it is close enough that I'm tempted to disable nestloop
whenever I run the query in the hope that it will prevent the planner from
switching to the really awful plan.  I know that's kind of a drastic
measure, so hopefully someone out there will suggest a config fix which
accomplishes the same thing without requiring special handling for this
query, but at least it works (for now).
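
To keep the blast radius small, I'm scoping the setting to just the one
query rather than the whole session - a minimal sketch, with the actual
report query elided:

begin;
set local enable_nestloop = off;  -- only affects the current transaction
-- run the problematic report query here
commit;                           -- the setting reverts automatically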

Incidentally, upgrading to 9.0.x is not out of the question if it is
believed that doing so might help here.  I'm only running 8.4 because I've
got another project in production on 8.4 and I don't want to have to deal
with running both versions on my development laptop.  But that's a pretty
weak reason for not upgrading, I know.

--sam


Re: [PERFORM] near identical queries have vastly different plans

2011-06-30 Thread Samuel Gendler
On Thu, Jun 30, 2011 at 1:53 AM, Samuel Gendler
wrote:

> If I could figure out either a query structure or an index structure which
> will force the fast query plan, I'd be much happier.  So that is what I am
> looking for - an explanation of how I might convince the planner to always
> use the fast plan.
>
>
I eventually noticed that constraint_exclusion didn't seem to be working and
remembered that it only works when the filter is on the partitioned table
itself, not when the table is being filtered via a join.  Adding a where
clause which limits f.time_fk to the appropriate range not only fixed
constraint_exclusion behaviour, but also caused the query planner to produce
the same plan for both versions of the query - a plan that was an order of
magnitude faster than the previous fastest plan.   It went from 20 seconds
to just less than 2 seconds.
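
For anyone who hits the same thing: the fix was nothing more exotic than
repeating the time-range bounds directly against the fact table's
partition-key column, rather than relying on the join to the time dimension
to imply them.  Very roughly, as a sketch (this is a simplified version, not
the real query, and the literal key values are just for illustration):

-- simplified sketch (the real query cross joins two dimensions and then
-- left joins to the fact table)
select t.tstamp, avg(f.value)
from time_dim t
join fact f on f.time_fk = t.utc_key
where t.utc_key between 201106010000 and 201106302355
  -- the added clause: same bounds, but written against the partitioned
  -- table's own column, so constraint_exclusion can prune fact partitions
  and f.time_fk between 201106010000 and 201106302355
group by t.tstamp;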


Re: [PERFORM] near identical queries have vastly different plans

2011-07-01 Thread Samuel Gendler
On Fri, Jul 1, 2011 at 3:46 PM, Tom Lane  wrote:

> Samuel Gendler  writes:
> > I've got 2 nearly identical queries that perform incredibly differently.
>
> The reason the slow query sucks is that the planner is estimating at
> most one "s" row will match that complicated AND/OR condition, so it
> goes for a nestloop.  In the "fast" query there is another complicated
> AND/OR filter condition, but it's not so far off on the number of
> matching rows, so you get a better plan choice.  Can't tell from the
> given information whether the better guess is pure luck, or there's some
> difference in the column statistics that makes it able to get a better
> estimate for that.
>
> In general, though, you're skating on thin ice anytime you ask the
> planner to derive statistical estimates about combinations of correlated
> columns --- and these evidently are correlated.  Think about refactoring
> the table definitions so that you're only testing a single column, which
> ANALYZE will be able to provide stats about.  Or maybe you can express
> it as a test on a computed expression, which you could then keep an
> index on, prompting ANALYZE to gather stats about that.
>

Thanks.  There is actually already a column in s which is a primary key for
the 2 columns that are currently being tested for.  I didn't write the
application code which generates the query, so can't say for sure why it is
being generated as it is, but I'll ask the engineer in question to try the
primary key column instead and see what happens.


Re: [PERFORM] Query in 9.0.2 not using index in 9.0.0 works fine

2011-07-06 Thread Samuel Gendler
On Tue, Jul 5, 2011 at 1:50 PM, Matthias Howell
wrote:

>  I've just copied a database from one linux machine to another.
>
> "Fast" machine is CentOS 5.5, running postgres 9.0.0 64 bit
>
>
> "Slow" machine is Red Hat 5.5 running postgres 9.0.2 64 bit.
>
>
> Here's the query:
>
> explain analyze select sentenceid from sentences where sentenceid = any (
> array(select sentenceid from sentences where docid =
> any(array[696374,696377])))
>
>
> on the fast machine this is the explain:
>
> "Bitmap Heap Scan on sentences  (cost=924.41..964.47 rows=10 width=8)
> (actual time=0.748..0.800 rows=41 loops=1)"
>
> "  Recheck Cond: (sentenceid = ANY ($0))"
>
> "  InitPlan 1 (returns $0)"
>
> "->  Bitmap Heap Scan on sentences  (cost=12.93..879.27 rows=220
> width=8) (actual time=0.199..0.446 rows=41 loops=1)"
>
> "  Recheck Cond: (docid = ANY ('{696374,696377}'::bigint[]))"
>
> "  ->  Bitmap Index Scan on sentdocs  (cost=0.00..12.87 rows=220
> width=0) (actual time=0.134..0.134 rows=41 loops=1)"
>
> "Index Cond: (docid = ANY ('{696374,696377}'::bigint[]))"*
> ***
>
> "  ->  Bitmap Index Scan on pk_sentences  (cost=0.00..45.14 rows=10
> width=0) (actual time=0.741..0.741 rows=41 loops=1)"
>
> "Index Cond: (sentenceid = ANY ($0))"
>
> "Total runtime: 0.925 ms"
>
>
> And on the slow machine:
>
> "Seq Scan on sentences  (cost=1000608.90..1445893.60 rows=10
> width=8) (actual time=2679.412..6372.393 rows=41 loops=1)"
>
> "  Filter: (sentenceid = ANY ($0))"
>
> "  InitPlan 1 (returns $0)"
>
> "->  Bitmap Heap Scan on sentences  (cost=10.73..608.90 rows=152
> width=8) (actual time=0.044..0.076 rows=41 loops=1)"
>
> "  Recheck Cond: (docid = ANY ('{696374,696377}'::integer[]))"
>
> "  ->  Bitmap Index Scan on sentdocs  (cost=0.00..10.69 rows=152
> width=0) (actual time=0.037..0.037 rows=41 loops=1)"
>
> "Index Cond: (docid = ANY ('{696374,696377}'::integer[]))"
> 
>
> "Total runtime: 6372.468 ms"
>
>
> The configurations were identical initially, I've changed those on the slow
> machine but to no avail.
>
>
> there is an index on sentences on the docid in both systems.
>
>
> I'm at quite a loss as to how/why this is occurring and what to do about
> it.
>
>
> I tried disabling seqscan on the slow machine but that also made no
> difference.
>
>
> Any help/ideas much appreciated.
>

Have you done a vacuum analyze since loading the data on the slow db?  Are
statistics settings the same between the two hosts?  It's interesting that
one version coerces the docid values to bigint and the other coerces to
integer, but that shouldn't impact the sentenceid comparison, which has to
be a consistent type since it is comparing sentenceid to sentenceid. Any
reason why this isn't collapsed down to 'select distinct sentenceid from
sentences where docid = any(array[696374,696377])' - is there a benefit to
the more complex structure?  For that matter, why not 'where docid in
(696374,696377)'?

I didn't see anything in the docs about distinct or any(array) that would
indicate that that form should be preferred over IN ().


Re: [PERFORM] Query in 9.0.2 not using index in 9.0.0 works fine

2011-07-06 Thread Samuel Gendler
On Wed, Jul 6, 2011 at 5:50 AM, Matthias Howell
wrote:

>
>
> However, in the end, it was user brain damage.
>
>
> It does use the doc id index for the subquery, but for some reason, the
> primary key on sentences - the sentenceid - was not set.  So in fact, there
> is no index.
>
>
> Machines vindicated once again.
>

For the record, if you follow the instructions for submitting slow query
questions, we'd likely have spotted it very quickly if you hadn't spotted it
yourself while doing the cut and paste.  The instructions ask for table
definitions, so you'd likely have noticed the missing index when you copied
those into your email.  The link (
http://wiki.postgresql.org/wiki/SlowQueryQuestions ) is right there on the
mailing list page at postgresql.org.  It's always a toss-up whether to
attempt to answer a question like yours or just respond with a link to that
page when the relevant info is missing ;-)

--sam


Re: [PERFORM] query total time im milliseconds

2011-07-10 Thread Samuel Gendler
On Sun, Jul 10, 2011 at 4:41 AM, Radhya sahal  wrote:

> Dear all,
> could anyone help me?
> when I use pgadmin to execute a query it shows the total time for the query,
> such as
> (select * from table_name) - query total time is for example 100 ms.
> I want to know the command that can retrieve the query total time
> in milliseconds
> if I connect with postgresql from java using JDBC.
> I need the query total time to use it in my project.
> I don't want to run explain, just the query.
> thanks
>


long startTime = System.currentTimeMillis();
// execute the query here, e.g. ResultSet rs = stmt.executeQuery(sql);
long executionTime = System.currentTimeMillis() - startTime;


Re: [PERFORM] Statistics and Multi-Column indexes

2011-07-10 Thread Samuel Gendler
On Sun, Jul 10, 2011 at 2:16 PM, lars  wrote:

> I know this has been discussed various times...
>
> We are maintaining a large multi tenant database where *all* tables have a
> tenant-id and all indexes and PKs lead with the tenant-id.
> Statistics and counts for the all other columns are only really meaningful
> within the context of the tenant they belong to.
>
> There appear to be five options for me:
> 1. Using single column indexes on all interesting columns and rely on
> PostgreSQLs bitmap indexes to combine them (which are pretty cool).
> 2. Use multi column indexes and accept that sometimes Postgres pick the
> wrong index (because a non-tenant-id
> column might seem highly selective over the table, but it is not for a
> particular tenant - or vice versa).
> 3. Use a functional index that combines multiple columns and only query via
> these, that causes statistics
> gathering for the expression.
> I.e. create index i on t((tenantid||column1)) and SELECT ... FROM t WHERE
> tenantid||column1 = '...'
> 4. Play with n_distinct and/or set the statistics for the inner columns to
> some fixed values that lead to the plans that we want.
> 5. Have a completely different schema and maybe a database per tenant.
>
>
What about partitioning tables by tenant id and then maintaining indexes on
each partition independent of tenant id, since constraint exclusion should
handle filtering by tenant id for you?  That seems like a potentially more
tolerable variant of #5.  How many tenants are we talking about?  I gather
partitioning starts to become problematic when the number of partitions gets
large.
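
A rough sketch of what I mean, using ordinary inheritance partitioning with
a CHECK constraint per tenant so constraint exclusion can do the tenant
filtering (all names invented):

-- parent holds no rows; one child per tenant
create table widgets (
    tenant_id  integer not null,
    created_at timestamptz not null,
    payload    text
);

create table widgets_tenant_42 (
    check (tenant_id = 42)
) inherits (widgets);

-- per-partition indexes no longer need to lead with tenant_id
create index widgets_tenant_42_created_at on widgets_tenant_42 (created_at);

-- with constraint_exclusion = partition (the default), this should scan only
-- the tenant 42 child even though it queries the parent
select * from widgets
where tenant_id = 42
  and created_at >= now() - interval '7 days';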


Re: [PERFORM] cpu comparison

2011-07-18 Thread Samuel Gendler
I'm just top posting this because this whole thread needs a reset before it
goes any farther.

Start with a real description of these hosts - Number and types of disks,
filesystem configs, processors, memory, OS, etc.  If your db is small enough
to fit into RAM, please show us the db config you are using which ensures
that you are making best use of available RAM, etc.

Then we need to know what your test looks like - showing us a query and an
explain plan without any info about the table structure, indexes, number of
rows, and table usage patterns doesn't provide anywhere near enough info to
diagnose inefficiency.

There are several documents linked right from the page for this mailing list
that describe exactly how to go about providing enough info to get help from
the list.  Please read through them, then update us with the necessary
information, and I'm sure we'll be able to offer you some insight into what
is going on.

And for the record, your app probably doesn't need to change to use table
partitioning, at least when selecting data.  Depending upon how data is
loaded, you may need to change how you do inserts.  But it is impossible to
comment on whether partitioning might help you without knowing table
structure, value distributions, query patterns, and number of rows in the
table.  If you are always selecting over the whole range of data,
partitioning isn't likely to buy you anything, for example.


Re: [PERFORM] Need to tune for Heavy Write

2011-08-04 Thread Samuel Gendler
On Wed, Aug 3, 2011 at 9:56 PM, Adarsh Sharma wrote:

> Dear all,
>
> From the last few days, I researched a lot on Postgresql Performance Tuning
> due to slow speed of my server.
> My application selects data from mysql database about 10 rows , process
> it & insert into postgres 2 tables by making about 45 connections.


It's already been mentioned, but is worth reinforcing, that if you are
inserting 100,000 rows in 100,000 transactions, you'll see a huge
performance improvement by doing many more inserts per transaction.  Try
doing at least 500 inserts in each transaction (though you can possibly go
quite a bit higher than that without any issues, depending upon what other
traffic the database is handling in parallel).  You almost certainly don't
need 45 connections in order to insert only 100,000 rows.  I've got a crappy
VM with 2GB of RAM in which inserting 100,000 relatively narrow rows
requires less than 10 seconds if I do it in a single transaction on a single
connection.  Probably much less than 10 seconds, but the code I just tested
with does other work while doing the inserts, so I don't have a pure test at
hand.
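
Just to make the suggestion concrete, the difference is essentially between
100,000 single-row transactions and something shaped like this (hypothetical
table, not your schema):

begin;
-- one statement can carry many rows; repeat statements like this until the
-- batch (say 500-1000 rows) is done, then commit once
insert into measurements (sensor_id, ts, value) values
    (1, '2011-08-04 00:00:00', 12.3),
    (1, '2011-08-04 00:05:00', 12.9),
    (2, '2011-08-04 00:00:00',  7.4);
commit;

-- or, for pure loading, stream the whole batch in a single COPY:
-- copy measurements (sensor_id, ts, value) from stdin;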


[PERFORM] raid array seek performance

2011-09-13 Thread Samuel Gendler
I'm just beginning the process of benchmarking and tuning a new server.
 Something I really haven't done before.  I'm using Greg's book as a guide.
 I started with bonnie++ (1.96) and immediately got anomalous results (I
think).

Hardware is as follows:

2x quad core xeon 5504 2.0Ghz, 2x4MB cache
192GB DDR3 1066 RAM
24x600GB 15K rpm SAS drives
adaptec 52445 controller

The default config, being tested at the moment, has 2 volumes, one 100GB and
one 3.2TB, both are built from a stripe across all 24 disks, rather than
splitting some spindles out for one volume and another set for the other
volume.  At the moment, I'm only testing against the single 3.2TB volume.

The smaller volume is partitioned into /boot (ext2 and tiny) and / (ext4 and
91GB).  The larger volume is mounted as xfs with the following options
(cribbed from an email to the list earlier this week, I
think): logbufs=8,noatime,nodiratime,nobarrier,inode64,allocsize=16m

Bonnie++ delivered the expected huge throughput for sequential read and
write.  It seems in line with other benchmarks I found online.  However, we
are only seeing 180 seeks/sec, which seems quite low.  I'm hoping someone
might be able to confirm that and, hopefully, make some suggestions for
tracking down the problem if there is one.

Results are as follows:

1.96,1.96,newbox,1,1315935572,379G,,1561,99,552277,46,363872,34,3005,90,981924,49,179.1,56,16,19107,69,+,+++,20006,69,19571,72,+,+++,20336,63,7111us,10666ms,14067ms,65528us,592ms,170ms,949us,107us,160us,383us,31us,130us


Version  1.96   --Sequential Output-- --Sequential Input-
--Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
--Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec
%CP
newzonedb.z1.p 379G  1561  99 552277  46 363872  34  3005  90 981924  49
179.1  56
Latency  7111us   10666ms   14067ms   65528us 592ms
170ms
--Sequential Create-- Random
Create
-Create-- --Read--- -Delete-- -Create-- --Read---
-Delete--
files:max:min/sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec
%CP
newbox16 19107  69 + +++ 20006  69 19571  72 + +++ 20336
 63
Latency   949us 107us 160us 383us  31us
130us

Also, my inclination is to default to the following volume layout:

2 disks in RAID 1 for system
4 disks in RAID 10 for WAL (xfs)
18 disks in RAID 10 for data (xfs)

Use case is minimal OLTP traffic, plus a fair amount of data warehouse style
traffic - low connection count, queries over sizeable fact tables (100s of
millions of rows) partitioned over time, insert-only data loading, via COPY,
plus some tables are populated via aggregation queries over other tables.
 Basically, based on performance of our current hardware, I'm not concerned
about being able to handle the data-loading load, with the 4 drive raid 10
volume, so emphasis is on warehouse query speed.  I'm not best pleased by
the 2 Ghz CPUs, in that context, but I wasn't given a choice on the
hardware.

Any comments on that proposal are welcome.  I've got only a week to settle
on a config and ready the box for production, so the number of iterations I
can go through is limited.


Re: [PERFORM] raid array seek performance

2011-09-13 Thread Samuel Gendler
On Tue, Sep 13, 2011 at 12:13 PM, Samuel Gendler
wrote:

> I'm just beginning the process of benchmarking and tuning a new server.
>  Something I really haven't done before.  I'm using Greg's book as a guide.
>  I started with bonnie++ (1.96) and immediately got anomalous results (I
> think).
>
> Hardware is as follows:
>
> 2x quad core xeon 5504 2.0Ghz, 2x4MB cache
> 192GB DDR3 1066 RAM
> 24x600GB 15K rpm SAS drives
> adaptec 52445 controller
>
> The default config, being tested at the moment, has 2 volumes, one 100GB
> and one 3.2TB, both are built from a stripe across all 24 disks, rather than
> splitting some spindles out for one volume and another set for the other
> volume.  At the moment, I'm only testing against the single 3.2TB volume.
>
> The smaller volume is partitioned into /boot (ext2 and tiny) and / (ext4
> and 91GB).  The larger volume is mounted as xfs with the following options
> (cribbed from an email to the list earlier this week, I
> think): logbufs=8,noatime,nodiratime,nobarrier,inode64,allocsize=16m
>
> Bonnie++ delivered the expected huge throughput for sequential read and
> write.  It seems in line with other benchmarks I found online.  However, we
> are only seeing 180 seeks/sec, but seems quite low.  I'm hoping someone
> might be able to confirm that and. hopefully, make some suggestions for
> tracking down the problem if there is one.
>
> Results are as follows:
>
>
> 1.96,1.96,newbox,1,1315935572,379G,,1561,99,552277,46,363872,34,3005,90,981924,49,179.1,56,16,19107,69,+,+++,20006,69,19571,72,+,+++,20336,63,7111us,10666ms,14067ms,65528us,592ms,170ms,949us,107us,160us,383us,31us,130us
>
>
> Version  1.96   --Sequential Output-- --Sequential Input-
> --Random-
> -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
> --Seeks--
> MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec
> %CP
> newzonedb.z1.p 379G  1561  99 552277  46 363872  34  3005  90 981924  49
> 179.1  56
> Latency  7111us   10666ms   14067ms   65528us 592ms
> 170ms
> --Sequential Create-- Random
> Create
> -Create-- --Read--- -Delete-- -Create-- --Read---
> -Delete--
> files:max:min/sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec
> %CP
> newbox16 19107  69 + +++ 20006  69 19571  72 + +++
> 20336  63
> Latency   949us 107us 160us 383us  31us
> 130us
>
>
My seeks/sec numbers increase when I reduce the size of the file, which isn't
surprising, since once everything fits into cache, seeks aren't dependent on
mechanical movement.  However, I am seeing lots of bonnie++ results on
google which appear to be for a file size that is 2x RAM and which show
numbers closer to 1000 seeks/sec (compared to my 180).  Usually, I am seeing
a 16GB file for 8GB hosts.  So what is an acceptable random seeks/sec number
for a file that is 2x memory?  And does file size make a difference
independent of available RAM, such that the enormous 379GB file that is
created on my host is skewing the results to the low end?


Re: [PERFORM] Performance Anomaly with "col in (A,B)" vs. "col = A OR col = B" ver. 9.0.3

2011-09-26 Thread Samuel Gendler
On Mon, Sep 26, 2011 at 2:11 PM, Timothy Garnett wrote:

>
> Though maybe in a lot of common use situations people only supply values
> that are known present so maybe this would make things worse more often then
> better (maybe limit 1 or better EXISTS would be a hint the value is not
> known present). Experimenting a bit it doesn't seem to matter which values
> are selected so it's not taking into account any kind of distribution over
> the histogram boundaries.


If I'm not mistaken, the problem here is actually the LIMIT 1, yes?  The
planner is opting for the sequential scan because it assumes it will
interrupt the scan relatively quickly when a row is matched?  So in the case
where you are really looking for existence, perhaps the better solution is
to select a count of the number of matching rows (perhaps grouped by id so
you know which ones match)? That would emulate the behaviour of select
without a limit, which would be more likely to use the index. It all depends
on just what you are actually doing with the row you are returning, of
course, and if there is some other way to get it once you know of its
existence.

SELECT count(1), exp_detls.id FROM exp_detls WHERE (exp_detls.hts_code_id IN
(12,654)) GROUP BY exp_detls.id

might work, depending upon how many different values of exp_detls.id you are
likely to see for any given set of hts_code_ids.  Actually, I know little
about the query planner, but it seems to me that the aggregation and
grouping might be sufficient to force it away from preferring the sequential
scan, even if you leave a 'limit 1' on the query, since it will have to find
more than 1 row in order to return a single row, since that single row
contains an aggregate.  So if your concern is about the potential of
transferring millions of rows across the network, I think that might fix it,
though it is definitely a kludge.  Of course, the downside is that the index
won't be as fast as a sequential scan in the cases where the scan does get
interrupted quickly, but you've clearly already considered that for your use
patterns.


Re: [PERFORM] Adding more memory = hugh cpu load

2011-10-10 Thread Samuel Gendler
On Mon, Oct 10, 2011 at 1:52 PM, Kevin Grittner  wrote:

> alexandre - aldeia digital  wrote:
>
> > I came to the list to see if anyone else has experienced the same
> > problem
>
> A high load average or low idle CPU isn't a problem, it's a
> potentially useful bit of information in diagnosing a problem.  I
> was hoping to hear what the actual problem was, since I've had a few
> problems in high RAM situations, but the solutions depend on what
> the actual problems are.  I don't suppose you saw periods where
> queries which normally run very quickly (say in a millisecond or
> less) were suddenly taking tens of seconds to run -- "stalling" and
> then returning to normal?  Because if I knew you were having a
> problem like *that* I might have been able to help.  Same for other
> set of symptoms; it's just the suggestions would have been
> different.  And the suggestions would have depended on what your
> system looked like besides the RAM.
>
> If you're satisfied with how things are running with less RAM,
> though, there's no need.
>

The original question doesn't actually say that performance has gone down,
only that cpu utilization has gone up. Presumably, with lots more RAM, it is
blocking on I/O a lot less, so it isn't necessarily surprising that CPU
utilization has gone up.  The only problem would be if db performance has
gotten worse. Maybe I missed a message where that was covered?  I don't see
it in the original query to the list.


[PERFORM] backups blocking everything

2011-10-27 Thread Samuel Gendler
I've got a large mixed-used database, with the data warehouse side of things
consisting of several tables at hundreds of millions of rows, plus a number
of tables with tens of millions.  There is partitioning, but as the volume
of data has risen, individual partitions have gotten quite large.  Hardware
is 2x4 core 2.0Ghz Xeon processors, 176GB of RAM, 4 drives in raid 10 for
WAL logs and 16 or 20 spindles for data, also in RAID 10.  Total database
size is currently 399GB - via pg_database_size().  It's also worth noting
that we switched from 8.4 to 9.0.4 only about a month ago, and we were not
seeing this problem on 8.4.x.  The database is growing, but not at some kind
of exponential rate. full backup, compressed, on the old hardware was 6.3GB
and took about 1:45:00 to be written.  Recent backups are 8.3GB and taking 3
or 4 hours.  We were not seeing all queries stall out during the backups on
8.4, so far as I am aware.

The time it takes for pg_dump to run has grown from 1 hour to 3 and even 4
hours over the last 6 months, with more than half of that increase occurring
since we upgrade to 9.0.x.  In the last several weeks (possibly since the
upgrade to 9.0.4), we are seeing all connections getting used up (our main
apps use connection pools, but monitoring and some utilities are making
direct connections for each query, and some of them don't check for the
prior query to complete before sending another, which slowly eats up
available connections).  Even the connection pool apps cease functioning
during the backup, however, as all of the connections wind up in parse
waiting state.  I also see lots of sockets in close wait state for what
seems to be an indefinite period while the backup is running and all
connections are used up.  I assume all of this is the result of pg_dump
starting a transaction or otherwise blocking other access.  I can get
everything using a pool, that's not a huge problem to solve, but that won't
fix the fundamental problem of no queries being able to finish while the
backup is happening.

I know I'm not the only one running a database of this size.  How do others
handle backups?  At the moment, I don't have replication happening.  I can
use the old hardware to replicate to.  It doesn't have quite the i/o
capacity and nowhere near as much RAM, but I wouldn't be looking to use it
for querying unless I lost the primary, and it is definitely capable of
handling the insert load, at least when the inserts are being done directly.
 I'm not sure if it is easier or harder for it to handle the same inserts
via streaming replication.  My question is, what are the performance
repercussions of running a pg_dump backup off the replicated server.  If it
experiences the same kind of lockup, will SR get so far behind that it can't
catch up?  Is there some other preferred way to get a backup of a large db?

And finally, is the lockout I'm experiencing actually the result of a bug or
misuse of pg_dump in some way?


Re: [PERFORM] backups blocking everything

2011-10-27 Thread Samuel Gendler
On Thu, Oct 27, 2011 at 1:45 PM, Nicholson, Brad (Toronto, ON, CA) <
bnichol...@hp.com> wrote:

> >From: pgsql-performance-ow...@postgresql.org [mailto:
> pgsql-performance-ow...@postgresql.org] On Behalf Of Samuel Gendler
> >Sent: Thursday, October 27, 2011 12:47 PM
> >To: pgsql-performance@postgresql.org
> >Subject: [PERFORM] backups blocking everything
> >
> >I've got a large mixed-used database, with the data warehouse side of
> things consisting of several tables at hundreds of millions of rows, plus a
> number of tables with tens of >millions.  There is partitioning, but as the
> volume of data has risen, individual partitions have gotten quite large.
>  Hardware is 2x4 core 2.0Ghz Xeon processors, 176GB of RAM, 4 drives in
> >raid 10 for WAL logs and 16 or 20 spindles for data, also in RAID 10.
>  Total database size is currently 399GB - via pg_database_size().  It's also
> worth noting that we switched from 8.4 to >9.0.4 only about a month ago, and
> we were not seeing this problem on 8.4.x.  The database is growing, but not
> at some kind of exponential rate. full backup, compressed, on the old
> hardware >was 6.3GB and took about 1:45:00 to be written.  Recent backups
> are 8.3GB and taking 3 or 4 hours.  We were not seeing al queries stall out
> during the backups on 8.4, so far as I am aware.
> >
> >The time it takes for pg_dump to run has grown from 1 hour to 3 and even 4
> hours over the last 6 months, with more than half of that increase occurring
> since we upgrade to 9.0.x.  In the >last several weeks (possibly since the
> upgrade to 9.0.4), we are seeing all connections getting used up (our main
> apps use connection pools, but monitoring and some utilities are making
> >direct connections for each query, and some of them don't check for the
> prior query to complete before sending another, which slowly eats up
> available connections).  Even the connection >pool apps cease functioning
> during the backup, however, as all of the connections wind up in parse
> waiting state.  I also see lots of sockets in close wait state for what
> seems to be an >indefinite period while the backup is running and all
> connections are used up.  I assume all of this is the result of pg_dump
> starting a transaction or otherwise blocking other access.  I >can get
> everything using a pool, that's not a huge problem to solve, but that won't
> fix the fundamental problem of no queries being able to finish while the
> backup is happening.
>
> What is the I/O utilization like during the dump?  I've seen this situation
> in the past and it was caused be excessively bloated tables causing I/O
> starvation while they are getting dumped.
>

There are definitely no bloated tables.  The large tables are all
insert-only, and old data is aggregated up and then removed by dropping
whole partitions.  There should be no bloat whatsoever.  The OLTP side of
things is pretty minimal, and I can pg_dump those schemas in seconds, so
they aren't the problem, either.  I don't know what the I/O utilization is
during the dump, offhand. I'll be doing a more thorough investigation
tonight, though I suppose I could go look at the monitoring graphs if I
weren't in the middle of 6 other things at the moment.  the joys of startup
life.


Re: [PERFORM] backups blocking everything

2011-10-27 Thread Samuel Gendler
On Thu, Oct 27, 2011 at 2:15 PM, Samuel Gendler
wrote:
>
>
> There are definitely no bloated tables.  The large tables are all
> insert-only, and old data is aggregated up and then removed by dropping
> whole partitions.  There should be no bloat whatsoever.  The OLTP side of
> things is pretty minimal, and I can pg_dump those schemas in seconds, so
> they aren't the problem, either.  I don't know what the I/O utilization is
> during the dump, offhand. I'll be doing a more thorough investigation
> tonight, though I suppose I could go look at the monitoring graphs if I
> weren't in the middle of 6 other things at the moment.  the joys of startup
> life.
>
>
Does pg_dump use work_mem, maintenance_work_mem, or both?  I'm seeing a huge
spike in swap-in during the period when I can't get into the db, then a
little bit of swap out toward the end.  We've got very little OLTP traffic -
like one or two users logged in and interacting with the system at a time,
at most, so I've got work_mem set pretty high, as most of our reporting
queries do large aggregations that grind to a halt if they go to disk.
Besides, we've got nearly 200GB of RAM.  But it would seem that pg_dump is
allocating a large number of work_mem (or maintenance_work_mem) segments.

# show work_mem;
 work_mem
--
 512MB
(1 row)

# show maintenance_work_mem;
 maintenance_work_mem
--
 2GB

To be honest, I'm not entirely certain how to interpret some of the graphs
I'm looking at in this context.

here are some pictures of what is going on. The db monitoring itself goes
away when it eats all of the connections, but you can see what direction
they are headed and the values when it finally manages to get a connection
again at the end of the period.  All of the other numbers are just host
monitoring, so they are continuous through the shutout.

Memory usage on the host (shared buffers is set to 8GB):

http://photos.smugmug.com/photos/i-sQ4hVCz/0/L/i-sQ4hVCz-L.png

Swap Usage:

http://photos.smugmug.com/photos/i-T25vcZ2/0/L/i-T25vcZ2-L.png

Swap rate:

http://photos.smugmug.com/photos/i-WDDcN9W/0/L/i-WDDcN9W-L.png

CPU utilization:

http://photos.smugmug.com/photos/i-4xkGqjB/0/L/i-4xkGqjB-L.png

Load Average:

http://photos.smugmug.com/photos/i-p4n94X4/0/L/i-p4n94X4-L.png

disk IO for system disk (where the backup is being written to):

http://photos.smugmug.com/photos/i-gbCxrnq/0/M/i-gbCxrnq-M.png

disk IO for WAL volume:

http://photos.smugmug.com/photos/i-5wNwrDX/0/M/i-5wNwrDX-M.png

disk IO for data volume:

http://photos.smugmug.com/photos/i-r7QGngG/0/M/i-r7QGngG-M.png

Various postgres monitors - the graph names are self explanatory:

http://photos.smugmug.com/photos/i-23sTvLP/0/M/i-23sTvLP-M.png
http://photos.smugmug.com/photos/i-73rphrf/0/M/i-73rphrf-M.png
http://photos.smugmug.com/photos/i-rpKvrVJ/0/L/i-rpKvrVJ-L.png
http://photos.smugmug.com/photos/i-QbNQFJM/0/L/i-QbNQFJM-L.png


Re: [PERFORM] backups blocking everything

2011-10-28 Thread Samuel Gendler
On Fri, Oct 28, 2011 at 2:20 PM, Merlin Moncure  wrote:

>
> hrm -- it doesn't look like you are i/o bound -- postgres is
> definitely the bottleneck.  taking a dump off of production is
> throwing something else out of whack which is affecting your other
> processes.
>
> band aid solutions might be:
> *) as noted above, implement hot standby and move dumps to the standby
> *) consider adding connection pooling so your system doesn't
> accumulate N processes during dump
>
>
Both are in the works.  The 1st one is more involved, but I'm going to move
our monitoring to a pool next week, so I at least stop getting locked out.
We'll be moving to a non-superuser user, as well.  That's an artifact of the
very early days that we never got around to correcting.



> a better diagnosis might involve:
> *) strace of one of your non-dump proceses to see where the blocking
> is happening
> *) profiling one of your user processes and compare good vs bad time
>

This only happens at a particularly anti-social time, so we're taking the
easy way out up front and just killing various suspected processes each
night in order to narrow things down.  It looks like it is actually an
interaction between a process that runs a bunch of fairly poorly architected
queries running on a machine set up with the wrong time zone, which was
causing it to run at exactly the same time as the backups.  We fixed the
time zone problem last night and didn't have symptoms, so that's the
fundamental problem, but the report generation process has a lot of room for
improvement, regardless.  There's definitely lots of room for improvement,
so it's now really about picking the resolutions that offer the most bang
for the buck.  I think a hot standby for backups and report generation is
the biggest win, and I can work on tuning the report generation at a later
date.


> Is there anything out of the ordinary about your application that's
> worth mentioning?  using lots of subtransactions? prepared
> transactions? tablespaces? huge amounts of tables? etc?
>

Nope.  Pretty normal.


>
> Have you checked syslogs/dmesg/etc for out of the ordinary system events?
>

nothing.

Thanks for taking the time to go through my information and offer up
suggestions, everyone!

--sam


Re: [PERFORM] Error while vacuuming

2011-11-07 Thread Samuel Gendler
On Mon, Nov 7, 2011 at 10:33 PM, Bhakti Ghatkar  wrote:

> Tom,
>
> Currently we are using version 9.0.1.
>
> Which version shall we update to? 9.0.5 or 9.1?
>

9.0.5 should be compatible with your installed db and contain any bug fixes
that have been released.  Which isn't to say that you shouldn't test and
make a backup before upgrading the binaries on your production server, of
course.

--sam


Re: [PERFORM] Performance question 83 GB Table 150 million rows, distinct select

2011-11-16 Thread Samuel Gendler
On Wed, Nov 16, 2011 at 3:32 PM, Scott Marlowe wrote:

>
> If the OP's considering partitioning, they should really consider
> upgrading to 9.1 which has much better performance of things like
> aggregates against partition tables.
>
>
Could you elaborate on this a bit, or point me at some docs?  I manage a
600GB db which does almost nothing but aggregates on partitioned tables -
the largest of which has approx 600 million rows across all partitions.
 grouping in the aggregates tends to be on the partition column and rarely,
if ever, would a group cross multiple partitions. We're on 9.0 and could
definitely use some performance gains.


Re: [PERFORM] Duplicate deletion optimizations

2012-01-06 Thread Samuel Gendler
On Fri, Jan 6, 2012 at 6:35 AM,  wrote:

> Hello,
>
> I've a table with approximately 50 million rows with a schema like this:
>
>id bigint NOT NULL DEFAULT nextval('stats_5mn'::regclass)**,
>t_value integer NOT NULL DEFAULT 0,
>t_record integer NOT NULL DEFAULT 0,
>output_id integer NOT NULL DEFAULT 0,
>count bigint NOT NULL DEFAULT 0,
>CONSTRAINT stats_mcs_5min_pkey PRIMARY KEY (id)
>
> Every 5 minutes, a process has to insert a few thousand rows in this
> table,
> but sometimes, the process has to insert an already existing row (based on
> values in the triplet (t_value, t_record, output_id)). In this case, the row
> must be updated with the new count value. I've tried some solutions given
> on this
> stackoverflow question [1] but the insertion rate is always too low for my
> needs.
>
> So, I've decided to do it in two times:
>
>  - I insert all my new data with a COPY command
>  - When it's done, I run a delete query to remove oldest duplicates
>
> Right now, my delete query looks like this:
>
>SELECT min(id) FROM stats_5mn
>GROUP BY t_value, t_record, output_id
>HAVING count(*) > 1;
>
> The duration of the query on my test machine with approx. 16 million rows
> is ~18s.
>

Have you considered doing the insert by doing a bulk insert into a temp
table and then pulling rows that don't exist across to the final table in
one query and updating rows that do exist in another query?  I did a very
brief scan of the SO thread and didn't see it suggested.  Something like
this:

update stats_5mn set count = count + t.count
from temp_table t
where stats_5mn.t_value = t.t_value and stats_5mn.t_record = t.t_record and
stats_5mn.output_id = t.output_id;

insert into stats_5mn
select * from temp_table t
where not exists (
select 1 from stats_5mn s
where s.t_value = t.t_value and s.t_record = t.t_record and s.output_id =
t.output_id
);

drop table temp_table;

Note - you must do the update before the insert because doing it the other
way around will cause every row you just inserted to also be updated.

I'm not sure it'd be markedly faster, but you'd at least be able to retain
a unique constraint on the triplet, if desired.  And, to my eye, the logic
is easier to comprehend.  The different query structure may make better use
of your index, but I imagine that it is not using it currently because your
db isn't configured to accurately reflect the real cost of index use vs
sequential scan, so it is incorrectly determining the cost of looking up
7.5 million rows.  Its estimate of the row count is correct, so the
estimate of the cost must be the problem.  We'd need to know more about
your current config and hardware specs to be able to even start making
suggestions about config changes to correct the problem.


Re: [PERFORM] Duplicate deletion optimizations

2012-01-06 Thread Samuel Gendler
On Fri, Jan 6, 2012 at 12:22 PM, Marc Eberhard wrote:

> Hi Samuel!
>
> On 6 January 2012 20:02, Samuel Gendler  wrote:
> > Have you considered doing the insert by doing a bulk insert into a temp
> > table and then pulling rows that don't exist across to the final table in
> > one query and updating rows that do exist in another query?  I did a very
> > brief scan of the SO thread and didn't see it suggested.  Something like
> > this:
> >
> > update stats_5mn set count = count + t.count
> > from temp_table t
> > where stats_5mn.t_value = t.t_value and stats_5mn.t_record = t.t_record and
> > stats_5mn.output_id = t.output_id;
> >
> > insert into stats_5mn
> > select * from temp_table t
> > where not exists (
> > select 1 from stats_5mn s
> > where s.t_value = t.t_value and s.t_record = t.t_record and s.output_id =
> > t.output_id
> > );
> >
> > drop table temp_table;
>
> Am I right to assume that the update/insert needs to be placed into a
> begin / end transaction block if such batch uploads might happen
> concurrently? Doesn't seem to be the case for this question here, but
> I like the solution and wonder if it works under more general
> circumstances.
>

yes, assuming you are concerned about making the insertion atomic.
 Obviously, a failure in the second query after success in the 1st query
would be problematic outside of a transaction, since any attempt to repeat
the entire operation would result in repeated updates.


> What's the overhead of creating and dropping a temporary table? Is it
> only worth doing this for a large number of inserted/updated elements?
> What if the number of inserts/updates is only a dozen at a time for a
> large table (>10M entries)?
>

pretty minimal, but enough that doing a handful of rows at a time probably
wouldn't be worth it.  You'd surely get index usage on a plain insert in
such a case, so I'd probably just use an upsert stored proc for doing small
numbers of rows - unless you are doing large numbers of inserts, just a few
at a time.  In that case, I'd try to accumulate them and then do them in
bulk.  Those are tough questions to answer without a specific context.  My
real answer is 'try it and see.'  You'll always get an answer that is
specific to your exact circumstance that way.

By the way, there is definitely a difference between creating a temp table
and creating a table temporarily.  See the postgres docs about temp tables
for specifics, but many databases treat temp tables differently from
ordinary tables, so it is worth understanding what those differences are.
 Temp tables are automatically dropped when a connection (or transaction)
is closed.  Temp table names are local to the connection, so multiple
connections can each create a temp table with the same name without
conflict, which is convenient. I believe they are also created in a
specific tablespace on disk, etc.
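
A tiny sketch of the pattern, just to make those semantics concrete (column
names borrowed from the table earlier in the thread):

begin;
-- visible only to this connection; other sessions can reuse the same name
create temp table stats_incoming (
    t_value   integer,
    t_record  integer,
    output_id integer,
    count     bigint
) on commit drop;          -- dropped automatically when the transaction ends

copy stats_incoming from stdin;

-- ... the update-then-insert pair from earlier in the thread goes here ...

commit;                    -- temp table disappears, changes are atomic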


Re: [PERFORM] Indexes and Primary Keys on Rapidly Growing Tables

2012-02-21 Thread Samuel Gendler
On Tue, Feb 21, 2012 at 9:59 AM, Alessandro Gagliardi
wrote:

> I was thinking about that (as per your presentation last week) but my
> problem is that when I'm building up a series of inserts, if one of them
> fails (very likely in this case due to a unique_violation) I have to
> rollback the entire commit. I asked about this in the novice forum
> and was advised to use
> SAVEPOINTs. That seems a little clunky to me but may be the best way.
> Would it be realistic to expect this to increase performance by ten-fold?
>
>
if you insert into a different table before doing a bulk insert later, you
can de-dupe before doing the insertion, eliminating the issue entirely.
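
Something along these lines, as a sketch only (the staging and target names
are hypothetical, and the key is whatever your unique constraint covers):

-- load everything into an unconstrained staging table first (COPY or batched
-- inserts), then move exactly one row per key into the real table
insert into events (event_key, payload, created_at)
select distinct on (s.event_key)
       s.event_key, s.payload, s.created_at
from events_staging s
where not exists (
        select 1 from events e where e.event_key = s.event_key
      )
order by s.event_key, s.created_at desc;   -- keep the newest duplicate

truncate events_staging;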


Re: [PERFORM] Very long deletion time on a 200 GB database

2012-02-24 Thread Samuel Gendler
On Thu, Feb 23, 2012 at 10:39 PM, Reuven M. Lerner wrote:

> Hi, everyone.
>
> So it turns out that we're not using 25 GB of virtual memory.  (That's
> what I had been shown yesterday, and it was a bit surprising, to say the
> least...)
>
> A few statistics that I managed to get from the Windows developers/system
> administrators:
>
> - The machine has a total of 3.5 GB of RAM
> - shared_buffers was set to 256 MB (yes, MB!)
> - Virtual memory usage by our process is 3 MB (yes, MB)
> - CPU is virtually idle when running the deletes, using about 1% of CPU
> - No other processes are accessing the database when we're running the
> maintenance; there are a total of three server processes, but two are idle.
>

What is work_mem set to?  If all the other values were set so low, I'd
expect work_mem to also be small, which could be causing all kinds of disk
activity when steps don't fit into a work_mem segment.


Re: [PERFORM] Very long deletion time on a 200 GB database

2012-02-27 Thread Samuel Gendler
On Mon, Feb 27, 2012 at 6:59 AM, Reuven M. Lerner wrote:

>
> So for now, we'll just try to DELETE faster than we INSERT, and combined
> with autovacuum, I'm hoping that this crisis will be averted.  That said,
> the current state of affairs with these machines is pretty fragile, and I
> think that we might want to head off such problems in the future, rather
> than be surprised by them.
>
>
>
For the record, one very effective long term solution for doing this and
continuing to be able to do this no matter how many rows have accumulated
is to partition the data tables over time so that you can just drop older
partitions.  It does require code changes since relying on a trigger on the
parent table to distribute the inserts to the correct partition is much
slower than simply modifying your code to insert/copy into the correct
partition directly.  But it is well worth doing if you are accumulating
large volumes of data.  You can even leave old partitions around if you
don't need the disk space, since well-constructed queries will simply
ignore their existence, anyway, if you are only ever going back 30 days or
less.  Indexes are on individual partitions, so you needn't worry about
indexes getting too large, either.
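
A condensed sketch of the pattern, assuming monthly partitions and inserts
going directly into the current child (names are purely illustrative):

-- one child per month, with a CHECK constraint so constraint exclusion can
-- prune partitions for date-bounded queries
-- (a parent table 'events' with a created_at column is assumed to exist)
create table events_201202 (
    check (created_at >= date '2012-02-01' and created_at < date '2012-03-01')
) inherits (events);

create index events_201202_created_at on events_201202 (created_at);

-- the application inserts/copies directly into events_201202 during February

-- retiring old data is then effectively instantaneous and creates no bloat
drop table events_201112;
-- (or keep it around; well-constrained queries will skip it anyway)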


Re: [PERFORM] TCP Overhead on Local Loopback

2012-04-02 Thread Samuel Gendler
On Sun, Apr 1, 2012 at 6:11 PM, Andrew Dunstan  wrote:

>
>
> On 04/01/2012 08:29 PM, Claudio Freire wrote:
>
>> On Sun, Apr 1, 2012 at 8:54 PM, Andrew Dunstan
>>  wrote:
>>
>>> You could try using Unix domain socket and see if the performance
 improves. A relevant link:

>>>
>>> He said Windows. There are no Unix domain sockets on Windows. (And please
>>> don't top-post)
>>>
>> Windows supports named pipes, which are functionally similar, but I
>> don't think pg supports them.
>>
>>
> Correct, so telling the OP to have a look at them isn't at all helpful.
> And they are not supported on all Windows platforms we support either
> (specifically not on XP, AIUI).
>

But suggesting moving away from TCP/IP with no actual evidence that it is
network overhead that is the problem is a little premature, regardless.
 What, exactly, is the set of operations that each update is performing,
and is there any way to batch them into fewer statements within the
transaction?  For example, could you insert all 60,000 records into a
temporary table via COPY, then run just a couple of queries to do bulk
inserts and bulk updates into the destination table via joins to the temp
table?  60,000 rows updated with 25 columns, 1 indexed in 3ms is not
exactly slow.  That's a not insignificant quantity of data which must be
transferred from client to server, parsed, and then written to disk,
regardless of TCP overhead.  That is happening via at least 60,000
individual SQL statements that are not even prepared statements.  I don't
imagine that TCP overhead is really the problem here.  Regardless, you can
reduce both statement parse time and TCP overhead by doing bulk inserts
(COPY) followed by multi-row selects/updates into the final table.  I don't
know how much below 3ms you are going to get, but that's going to be as
fast as you can possibly do it on your hardware, assuming the rest of your
configuration is as efficient as possible.


Re: [PERFORM] Parallel Scaling of a pgplsql problem

2012-04-25 Thread Samuel Gendler
On Wed, Apr 25, 2012 at 11:52 AM, Venki Ramachandran <
venki_ramachand...@yahoo.com> wrote:

> Hi all:
> Can someone please guide me as to how to solve this problem? If this is
> the wrong forum, please let me know which one to post this one in. I am new
> to Postgres (about 3 months into it)
>
> I have PostGres 9.0 database in a AWS server (x-large) and a pgplsql
> program that does some computation. It takes in a date range and for one
> pair of personnel (two employees in a company) it calculates some values
> over the time period. It takes about 40ms (milli seconds) to complete and
> give me the answer. All good so far.
>
> Now I have to run the same pgplsql on all possible combinations of
> employees and with 542 employees that is about say 300,000 unique pairs.
>
> So (300,000 * 40)/(1000 * 60 * 60) = 3.33 hours, and I have to rank them and
> show it on a screen. No user wants to wait for 3 hours, they can probably
> wait for 10 minutes (even that is too much for a UI application). How do I
> solve this scaling problem? Can I have multiple parallel sessions, with each
> session having multiple processes that do a pair each at 40 ms, and then
> collate the results? Does PostGres or pgplsql have any parallel computing
> capability?
>

The question is, how much of that 40ms is spent performing the calculation,
how much is spent querying, and how much is function call overhead, and how
much is round trip between the client and server with the query and
results?  Depending upon the breakdown, it is entirely possible that the
actual per-record multiplier can be kept down to a couple of milliseconds
if you restructure things to query data in bulk and only call a single
function to do the work.  If you get it down to 4ms, that's a 20 minute
query.  Get it down to 1ms and you're looking at only 5 minutes for what
would appear to be a fairly compute-intensive report over a relatively
large dataset.


Re: [PERFORM] Expected performance of querying 5k records from 4 million records?

2012-06-18 Thread Samuel Gendler
On Mon, Jun 18, 2012 at 9:39 AM, Anish Kejariwal  wrote:

>
> So, it looks like clustering the index appropriately fixes things!  Also,
> I'll recreate the index switching the order to (dataset_id, stat_id,data_id)
>
Just keep in mind that clustering is a one-time operation.  Inserts and
updates will change the order of records in the table, so you'll need to
re-cluster periodically to keep performance high if there are a lot of
inserts and updates into the tables. I didn't re-read the thread, but I
seem to recall a partitioned table, so assuming you are partitioning in a
manner which keeps the number of partitions that are actively being
inserted into or updated to a minimum, you only need to cluster the active
partitions, which isn't usually terribly painful.  Also, if you are bulk
loading data (and not creating random spaces in the table by deleting and
updating), you can potentially order the data on the way into the table to
avoid the need to cluster repeatedly.
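
For reference, re-clustering an individual partition is just the following
(index name invented); it takes an exclusive lock for the duration, so it's
best aimed at quiet partitions:

-- one-off physical reordering of a single partition
cluster data_fact_201206 using data_fact_201206_dataset_stat_data_idx;

-- refresh planner statistics afterwards, since cluster rewrites the table
analyze data_fact_201206;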

--sam


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Samuel Gendler
On Thu, Jul 5, 2012 at 12:13 PM, Reza Taheri  wrote:

> Hi Robert,
> Yes, the same concept. Oracle's IOT feature is used often with TPC
> benchmarks.
>
>
Reza, it would be very helpful if you were to provide the list with a lot
more information about your current software and hardware configuration
before coming to the conclusion that the only possible way forward is with
a significant architectural change to the db engine itself.  Not only is it
not at all clear that you are extracting maximum performance from your
current hardware and software, but I doubt anyone is particularly
interested in doing a bunch of development purely to game a benchmark.
 There has been significant discussion of the necessity and viability of
the feature you are requesting in the past, so you should probably start
where those discussions left off rather than starting the discussion all
over again from the beginning.  Of course, if vmware were to sponsor
development of the feature in question, it probably wouldn't require nearly
as much buy-in from the wider community.

Getting back to the current performance issues -  I have little doubt that
the MS SQL benchmark was set up and run by people who were intimately
familiar with MS SQL performance tuning.  You stated in your earlier email
that your team doesn't have significant postgresql-specific experience, so
it isn't necessarily surprising that your first attempt at tuning didn't
get the results that you are looking for. You stated that you have 14 SSDs
and 90 spinning drives, but you don't specify how they are combined and how
the database is laid out on top of them.  There is no mention of how much
memory is available to the system. We don't know how you've configured
postgresql's memory allocation or how your config weights the relative
costs of index lookups, sequential scans, etc.  The guidelines for this
mailing list include instructions for what information should be provided
when asking about performance improvements.
http://archives.postgresql.org/pgsql-performance/  Let's start by
ascertaining how your benchmark results can be improved without engaging in
a significant development effort on the db engine itself.


Re: [PERFORM] The need for clustered indexes to boost TPC-V performance

2012-07-05 Thread Samuel Gendler
On Thu, Jul 5, 2012 at 1:37 PM, Reza Taheri  wrote:

>
> I provided more config details in an earlier email.
>
>
>
>
I hate to disagree, but unless I didn't get a message sent to the list, you
haven't provided any details about your postgresql config or otherwise
adhered to the guidelines for starting a discussion of a performance
problem around here.  I just searched my mailbox and no email from you has
any such details.  Several people have asked for them, including myself.
 You say you will give any details we want, but this is at least the 3rd or
4th request for such details and they have not yet been forthcoming.


Re: [PERFORM] very very slow inserts into very large table

2012-07-16 Thread Samuel Gendler
On Mon, Jul 16, 2012 at 7:06 AM, Mark Thornton  wrote:

>
> Every insert updates four indexes, so at least 3 of those will be in
> random order. The indexes don't fit in memory, so all those updates will
> involve reading most of the relevant b-tree pages from disk (or at least
> the leaf level). A total of 10ms of random read from disk (per inserted
> row) wouldn't surprise me ... which adds up to more than 10 days for your
> 93 million rows.


Which is the long way of saying that you will likely benefit from
partitioning that table into a number of smaller tables, especially if
queries on that table tend to access only a subset of the data that can be
defined to always fit into a smaller number of partitions than the total.
 At the very least, inserts will be faster because individual indexes will
be smaller.  And unless none of your queries can be constrained to fit within
a subset of partitions, you'll also see improved performance on selects.
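
For reference, a minimal sketch of inheritance-based range partitioning (the
built-in option in this era), with hypothetical table and column names:

CREATE TABLE events (
    id      bigint    NOT NULL,
    ts      timestamp NOT NULL,
    payload text
);

-- One child per month, constrained so the planner can exclude it:
CREATE TABLE events_2012_07 (
    CHECK (ts >= DATE '2012-07-01' AND ts < DATE '2012-08-01')
) INHERITS (events);

CREATE INDEX events_2012_07_ts ON events_2012_07 (ts);

With constraint_exclusion = partition, queries filtered on ts only touch the
matching children, and each child's indexes are small enough to have a
fighting chance of staying in cache during inserts.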

--sam


Re: [PERFORM] Is drop/restore trigger transactional?

2012-08-07 Thread Samuel Gendler
On Tue, Aug 7, 2012 at 2:39 PM, Craig James  wrote:

>
> Obviously this is a very expensive trigger, but one that we can drop
> in a very specific circumstance.  But we NEVER want to drop it for
> everyone.  It seems like a very reasonable use-case to me.
>
>
Sounds like you should try doing the work inside the trigger conditionally
and see if that improves performance enough, since you aren't likely to get
anything that better suits your needs without patching postgres.
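
A minimal sketch of that, with hypothetical names; note it assumes a release
where the two-argument current_setting(name, missing_ok) exists, otherwise
the variable has to be declared via custom_variable_classes and SET in every
session:

CREATE OR REPLACE FUNCTION expensive_audit_fn() RETURNS trigger AS $$
BEGIN
    -- Skip the costly work only for sessions that explicitly opt out.
    IF current_setting('myapp.skip_expensive_audit', true) = 'on' THEN
        RETURN NEW;
    END IF;
    -- ... the expensive per-row work goes here ...
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- In the one session doing the special operation:
SET myapp.skip_expensive_audit = 'on';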


Re: [PERFORM] Inaccurate Explain Cost

2012-09-26 Thread Samuel Gendler
On Wed, Sep 26, 2012 at 1:21 PM, hubert depesz lubaczewski <
dep...@depesz.com> wrote:

> On Wed, Sep 26, 2012 at 02:38:09PM -0400, Robert Sosinski wrote:
> > The first query shows a cost of 190,169.55 and runs in 199,806.951 ms.
> > When I disable nested loop, I get a cost of 2,535,992.34 which runs in
> > only 133,447.790 ms.  We have run queries on our database with a cost
> > of 200K cost before and they ran less then a few seconds, which makes
> > me wonder if the first query plan is inaccurate.  The other issue is
> > understanding why a query plan with a much higher cost is taking less
> > time to run.
>
> Are you under impression that cost should be somehow related to actual
> time?
> If yes - that's not true, and afaik never was.
> the fact that you got similar time and cost is just a coincidence.
>

Well...only sort of.  In a well-tuned db with accurate statistics, relative
cost between 2 plans should be reflected in relative execution time between
those 2 queries (assuming the data in memory is similar for both runs,
anyway), and that's what he seems to be complaining about.  The plan with
higher cost had lower execution time, which resulted in the planner picking
the slower query.  But the reason for the execution time discrepancy would
appear to be, at least in part, inaccurate statistics resulting in an
incorrect estimate of number of rows in a loop iteration.  More info about
the db config would help to identify other things contributing to the
inaccurate cost estimate - as mentioned earlier, please refer to
http://wiki.postgresql.org/wiki/Slow_Query_Questions when asking
performance questions.

And yes, I know you know all of this, Hubert.  I wrote it for the benefit
of the original questioner.
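
For the original poster, a quick way (with hypothetical table names) to check
whether bad row estimates are what is steering the planner toward the nested
loop:

-- Compare the planner's estimated rows with the actual rows at each node:
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders o
JOIN order_items i ON i.order_id = o.id
WHERE o.created_at >= '2012-09-01';

-- If estimates and actuals diverge by orders of magnitude, refresh the stats
-- and consider raising the statistics target on the join/filter columns:
ANALYZE orders;
ANALYZE order_items;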

--sam


Re: [PERFORM] Scaling 10 million records in PostgreSQL table

2012-10-08 Thread Samuel Gendler
On Mon, Oct 8, 2012 at 1:27 PM, Craig Ringer  wrote:

>
> If you already have appropriate indexes and have used `explain analyze` to
> verify that the query isn't doing anything slow and expensive, it's
> possible the easiest way to improve performance is to set up async
> replication or log shipping to a local hot standby on real physical
> hardware, then do the query there.
>

I've run postgresql on medium instances using elastic block store for the
storage and had no difficulty running queries like this one on tables of
comparable (and larger) size. It might not come back in 10ms, but such
queries weren't so slow that I would describe the wait as "a lot of time"
either.  My guess is that this is a sequential scan on a 10 million record
table with lots of bloat due to updates.  Without more info about table
structure and explain analyze output, we are all just guessing, though.
 Please read the wiki page which describes how to submit performance
problems and restate your question.
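
As a starting point, a rough check for update bloat (this just reads the
stats collector's counters, so it assumes those are reasonably current):

SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS pct_dead
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

A high dead-tuple percentage on the table in question would point at
vacuum/bloat rather than the query plan itself.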


Re: [PERFORM] Scaling 10 million records in PostgreSQL table

2012-10-09 Thread Samuel Gendler
On Mon, Oct 8, 2012 at 1:25 PM, Navaneethan R  wrote:

>
>After created the index for WHERE clause "WHERE dealer_id = 270001"..It
> is performing better.I have more dealer ids Should I do it for each
> dealer_id?
>
>
All you've really done is confuse the issue.  Please read the wiki page on
how to submit performance questions and actually follow the directions.
 Show us the table structure when the query is performing poorly ALONG WITH
explain analyze output, so we can see how the query is being handled by the
db.  Adding indexes for just one particular value isn't likely a great
solution unless there's a reason why that value is special or performance
for that value needs to be particularly good.  Far better to get at the
root problem of performance issues on that table, whether it is table
bloat, insufficient indexes, invalid statistics, or something else.
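
To illustrate the difference (hypothetical table name, since you haven't
shown your schema): a plain index on the column serves every dealer, whereas
an index built around one literal value only helps that one value:

-- Helps every dealer_id lookup:
CREATE INDEX idx_sales_dealer_id ON sales (dealer_id);

-- A partial index like this only helps WHERE dealer_id = 270001:
CREATE INDEX idx_sales_dealer_270001 ON sales (dealer_id)
    WHERE dealer_id = 270001;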


Re: [PERFORM] have: seq scan - want: index scan

2012-10-17 Thread Samuel Gendler
On Tue, Oct 16, 2012 at 4:45 PM, Chris Ruprecht  wrote:

> Hi guys,
>
> PG = 9.1.5
> OS = winDOS 2008R8
>
> I have a table that currently has 207 million rows.
> there is a timestamp field that contains data.
> more data gets copied from another database into this database.
> How do I make this do an index scan instead?
> I did an "analyze audittrailclinical" to no avail.
> I tested different indexes - no same behavior.
>
>
> The query plan says:
>
> "  ->  Seq Scan on audittrailclinical  (cost=0.00..8637598.76
> rows=203856829 width=62)"
> "Filter: (("timestamp" >= '2008-01-01
> 00:00:00'::timestamp without time zone) AND ("timestamp" <= '2012-10-05
> 00:00:00'::timestamp without time zone))"
>
> which takes forever.
>
> How do I make this do an index scan instead?
> I did an "analyze audittrailclinical" to no avail.
>

The planner estimates that 203 million of your 207 million rows match the
timestamp filter, so it is always going to favour a sequential scan.  An
index scan that matches that many rows would inevitably be slower than simply
scanning the table: it still has to fetch essentially every heap page to
determine each row's visibility to you, but it turns sequential access of the
table pages into random access and adds the index lookups on top.  You can
possibly verify this by setting enable_seqscan to false and re-running your
explain analyze to see how the plan changes, though that setting doesn't
remove sequential scans entirely, it just makes them much less likely, so you
may see that nothing changes. If the estimate for the number of matching rows is incorrect,
you'll want to increase the statistics gathering for that table or just
that column.

ALTER TABLE <table> ALTER COLUMN <column> SET STATISTICS <number>

where number is between 10 and 1000 and I think the default is 100.  Then
re-analyze the table and see if the query plan shows better estimates.  I
think 9.2 also supports "index only scans" which eliminate the need to load
the matched records in certain circumstances. However, all of the columns
used by the query would need to be in the index, and you are using an awful
lot of columns between the select clause and the table joins.
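
For instance, using the table and column from your plan (the target value is
just a starting guess):

ALTER TABLE audittrailclinical ALTER COLUMN "timestamp" SET STATISTICS 500;
ANALYZE audittrailclinical;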

Are you lacking indexes on the columns used for joins that would allow more
selective index scans on those columns which could then just filter by
timestamp?  I'm not much of an expert on the query planner, so I'm not sure
what exactly will cause that behaviour, but I'd think that good statistics
and useful indexes should allow the rest of the where clause to be more
selective of the rows from audittrailclinical unless
patientaccount.defaultaccount
= 'Y' and patient.dnsortpersonnumber = '347450'  are similarly
non-selective, though patient.dnsortpersonnumber would seem like it is
probably the strong filter, so make sure you've got indexes and accurate
stats on all of the foreign keys that connect patient table and
audittrailclinical table.  It'd be useful to see the rest of the explain
analyze output so we could see how it is handling the joins and why.  Note
that because you have multiple composite foreign keys joining tables in
your query, you almost certainly won't have those composite keys covered by a
single index.  If you have indexes on those columns but they are single-column
indexes, that may be what is causing the planner to try to filter the atc
table on the timestamp rather than via the joins.  I'm sure someone more
knowledgable than I will be along eventually to correct any misinformation
I may have passed along.  Without knowing anything about your schema or the
rest of the explain analyze output, I'm mostly just guessing.  There is an
entire page devoted to formulating useful mailing list questions,
incidentally.  Yours really isn't.  Or if the atc table definition is
complete, you are definitely missing potentially useful indexes, since you
are joining to that table via encountersid and you don't show an index on
that column - yet that is the column that eventually joins out to the
patient and patientaccount tables, which have the stronger filters on them.
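
As a sketch of the kind of index that might let the planner drive the scan
from the selective side of the joins instead (assuming encountersid really is
the join column on audittrailclinical):

CREATE INDEX atc_encountersid_timestamp
    ON audittrailclinical (encountersid, "timestamp");

That would let the join supply the encountersid values and the timestamp
condition be applied within the index, rather than filtering 200 million rows
by timestamp first.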

Incidentally, why the join to the entity table via entitysid?  No columns
from that table appear to be used anywhere else in the query.

--sam


Re: [PERFORM] Invalid memory alloc request size

2012-10-31 Thread Samuel Gendler
This was answered on the list last time you asked it. You are exceeding a
maximum buffer size. There was an implication that it was related to
converting a string from one encoding to another that could maybe be
alleviated by using the same encoding in both client and server, but a more
reliable solution is probably breaking your sql file into smaller pieces
(or, perhaps even better, bulk-loading the data via COPY, assuming
that isn't subject to the same buffer size limitation). I suppose you
could investigate recompiling postgresql with a larger buffer, though that
is likely to have side effects that I certainly can't predict.
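
A hedged sketch of the COPY route, with hypothetical table and file names:

-- Server-side, reading a file the server can see:
COPY my_table FROM '/var/lib/pgsql/load/data.csv' WITH CSV;

-- Or from the client via psql, which streams the file instead of building
-- one enormous statement:
\copy my_table from 'data.csv' with csv

If the data really has to stay as INSERT statements, splitting the file into
chunks well under the failing allocation size (e.g. with the split utility)
should also get you past the error.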



On Wed, Oct 31, 2012 at 3:24 AM, Mahavir Trivedi
wrote:

> dear friends
>
> i have - sql file of size more than 1 gb
> when i execute it  then after some time "Invalid memory alloc request size
> 100234023 byte"   occcured
> what ' s problem that i don't know ?
>
>
>
> with thanks
> mahavir
>


Re: [PERFORM] Thousands databases or schemas

2012-11-08 Thread Samuel Gendler
On Thu, Nov 8, 2012 at 1:36 AM, Denis  wrote:

>
> P.S.
> Not to start a holywar, but FYI: in a similar project where we used MySQL
> now we have about 6000 DBs and everything works like a charm.
>

You seem to have answered your own question here.  If my recollection of a
previous discussion about many schemas and pg_dump performance is accurate,
I suspect you are going to be told that you've got a data architecture that
is fairly incompatible with postgresql's architecture and you've
specifically ruled out a solution that would play to postgresql's strengths.


Re: [PERFORM] General key issues when comparing performance between PostgreSQL and oracle

2013-07-16 Thread Samuel Gendler
On Tue, Jul 16, 2013 at 9:51 AM, Brian Fehrle wrote:

>
> Are there any known differences between the database systems in terms of
> query planners or general operations (sorting, unions) that are notable
> different between the systems that would make postgres slow down when
> executing the exact same queries?
>
>
yes.

But unless you provide more detail, it's impossible to say what those
differences might be.  In all probability, postgresql can be configured to
provide comparable performance, but without details, who can really say?