Re: [PERFORM] query plan question, nested loop vs hash join
On Fri, Oct 3, 2014 at 6:38 PM, Andrey Lizenko <lizenk...@gmail.com> wrote:
> Is it possible to force optimizer choose the second plan without doing
> set enable_hashjoin = off; ?
> Increasing of 'effective_cache_size' leads to similar thing with mergejoin,
> other options (work_mem, shared_buffers, etc.) do not change anything.

Have you tried changing random_page_cost?

In small databases where most of the data is cached anyway, lowering random_page_cost to somewhere between 1 and 2 usually leads to better planner decisions.

Regards,
Marti

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
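A minimal sketch of that experiment -- the setting names are real, the value is only a starting point, and the query itself is stood in by a placeholder:

```sql
-- Applies only to the current session, so it is safe to experiment:
SET random_page_cost = 1.5;

-- Re-run the query and see whether the planner now picks the merge join:
EXPLAIN ANALYZE SELECT ...;

-- Revert to the server-wide default (persist via postgresql.conf if it helps):
RESET random_page_cost;
```

Testing per-session first avoids destabilizing plans for every other query on the server.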
Re: [PERFORM] performance of SELECT * much faster than SELECT colname with large offset
On Fri, Oct 3, 2014 at 5:39 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Marc Slemko <ma...@znep.com> writes:
>> I ran into this oddity lately that goes against everything I thought I
>> understood and was wondering if anyone had any insight.
> SELECT * avoids a projection step ... see ExecAssignScanProjectionInfo.

It would be cool if OFFSET could somehow signal the child nodes "don't bother constructing the actual tuple". Not sure if that could work in more complex queries.

But this is just one of many performance problems with large OFFSETs. Of course you can always work around this using a subquery:

    select description from (
      select * from ccrimes offset 514 limit 1
    ) subq;

But most of the time it's better to use scalable paging techniques:
http://use-the-index-luke.com/sql/partial-results/fetch-next-page

Regards,
Marti
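A sketch of the scalable-paging alternative, assuming ccrimes has a unique, indexed id column (the actual key column isn't shown in the thread):

```sql
-- OFFSET forces Postgres to generate and then discard every skipped row:
SELECT description FROM ccrimes ORDER BY id LIMIT 20 OFFSET ?;

-- Keyset paging instead seeks straight to where the last page ended,
-- using an index on (id); response time stays constant for any page:
SELECT description FROM ccrimes WHERE id > ? ORDER BY id LIMIT 20;
```

The application remembers the last id it served and passes it as the parameter for the next page.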
Re: [PERFORM] Performance issue: index not used on GROUP BY...
On Thu, Aug 28, 2014 at 11:50 AM, gmb <gmbou...@gmail.com> wrote:
> Can somebody please confirm whether aggregate functions such as GROUP BY
> should use indexes ?

Yes, if the planner deems it faster than other approaches. It can make wrong choices for many reasons, but usually when planner tunables like random_page_cost and effective_cache_size aren't set appropriately. There's some advice here:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

Just for the purpose of testing, you could try "set enable_sort=false" in your session and see if that makes it faster.

On Thu, Aug 28, 2014 at 12:08 PM, gmb <gmbou...@gmail.com> wrote:
> Sort Key: co_id, client_id, doc_no,

Something went missing from this line...

> Sort Method: external merge  Disk: 80304kB

Depending on your hardware and workloads, more work_mem may also improve queries, by letting sorts and hashes avoid spilling to disk. But beware: setting it too high may result in your server running out of memory.

Regards,
Marti
Re: [PERFORM] Window functions, partitioning, and sorting performance
On Thu, Aug 21, 2014 at 4:29 PM, Eli Naeher <enae...@gmail.com> wrote:
> Clearly the bulk of the time is spent sorting the rows in the original
> table, and then again sorting the results of the subselect. But I'm afraid
> I don't really know what to do with this information. Is there any way I
> can speed this up?

> Sort Method: external merge  Disk: 120976kB

The obvious first step is to bump up work_mem to avoid disk-based sort. Try setting it to something like 256MB in your session and see how it performs then. This may also allow the planner to choose HashAggregate instead of sort.

It's not always straightforward how to tune correctly; it depends on your hardware, concurrency and query complexity. Here's some advice:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#work_mem_maintainance_work_mem

Also, you could create an index on (route, direction, stop, stop_time) to avoid the inner sort entirely.

And it seems that you can move the "INNER JOIN stop" to the outer query as well; not sure if that will change much.

Try these, and if it's still problematic, report back with a new EXPLAIN ANALYZE.

Regards,
Marti
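The session-level experiment described above can be sketched like this (the query itself is stood in by a placeholder):

```sql
SET work_mem = '256MB';       -- affects only this session

EXPLAIN ANALYZE SELECT ...;   -- re-run the window-function query here; look for
                              -- "Sort Method: quicksort  Memory: ..." instead of
                              -- "Sort Method: external merge  Disk: ..."

RESET work_mem;
```

If the in-memory sort is what helps, work_mem can then be raised more conservatively in postgresql.conf, keeping in mind it is allocated per sort node, per backend.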
Re: [PERFORM] Window functions, partitioning, and sorting performance
On Thu, Aug 21, 2014 at 7:19 PM, Eli Naeher <enae...@gmail.com> wrote:
> However, when I try to do a test self-join using it, Postgres does two seq
> scans across the whole table, even though I have indexes on both id and
> previous_stop_event: http://explain.depesz.com/s/ctck. Any idea why those
> indexes are not being used?

Because the planner thinks seq scan + hash join is going to be faster than incurring the overhead of index scans for other kinds of plans.

You can try out alternative plan types by running "set enable_hashjoin=off" in your session. If it does turn out to be faster, then it usually means you haven't set planner tunables right (random_page_cost, effective_cache_size and possibly cpu_tuple_cost).

Regards,
Marti
Re: [PERFORM] how does the planer to estimate row when i use order by and group by
On Tue, Aug 12, 2014 at 5:59 AM, 楊新波 <silent0...@gmail.com> wrote:
> why does the planner estimate 200 rows when i use order by and group by .
> env: postgresql 8.4 and 9.3
> Can anybody suggest something or explain this behavior?

Because the table is empty, analyze doesn't store any stats for it, so the planner uses some default guesses.

This is actually beneficial for cases where you have done some inserts to a new table and autovacuum hasn't gotten around to analyzing it yet. And it rarely hurts, because any query plan will be fast when there's no data.

Regards,
Marti
Re: [PERFORM] two table join with order by on both tables attributes
On Fri, Aug 8, 2014 at 4:05 AM, Evgeniy Shishkin <itparan...@gmail.com> wrote:
> select * from users join notifications on users.id=notifications.user_id
> ORDER BY users.priority desc, notifications.priority desc limit 10;
>
> In my understanding, i need to have two indexes on
> users(priority desc, id) and notifications(user_id, priority desc)
>
> And actually with this kind of query we really want the most wanted
> notifications, by the user. So we really can rewrite to
> order by users.priority desc, id asc, notifications.priority desc
> according to business logic.

You can rewrite it with LATERAL to trick the planner into sorting each user's notifications separately. This should give you the nestloop plan you expect:

    SELECT *
    FROM users,
    LATERAL (
      SELECT * FROM notifications
      WHERE notifications.user_id = users.id
      ORDER BY notifications.priority DESC
    ) AS notifications
    ORDER BY users.priority DESC, users.id;

It would be great if Postgres could do this transformation automatically.

There's a "partial sort" patch in the current CommitFest, which would solve the problem partially (it could use the index on users, but the notifications sort would still have to be done in memory):
https://commitfest.postgresql.org/action/patch_view?id=1368

Regards,
Marti
Re: [PERFORM] Evaluating query performance with caching in PostgreSQL 9.1.6
On Fri, May 31, 2013 at 7:32 PM, <fburg...@radiantblue.com> wrote:
> 1.) Is there any way to clear the cache so that we can ensure that when we
> run explain analyze on a query and make some minor adjustments to that
> query and re-execute, the plan is not cached.

PostgreSQL doesn't cache query plans if you do a normal SELECT or EXPLAIN ANALYZE SELECT query. Plans are cached only if you use prepared queries:
1. Embedded queries within PL/pgSQL procedures
2. Explicit PREPARE/EXECUTE commands
3. PQprepare in the libpq library (or other client library)

If you don't use these, then you are experiencing something else, not plan caching. Maybe you're referring to the disk cache. The only way to clear PostgreSQL's own cache (shared buffers) is to restart it, but there is another level of caching done by the operating system. On Linux you can drop the OS cache using:

    echo 1 > /proc/sys/vm/drop_caches

> 2.) I am noticing that when I look at pg_stat_activities: autovacuum is
> re-processing some old Partition tables way back in 2007, which are static
> and are essentially read-only partitions. The line item in pg_stat reads
> as follows: autovacuum: VACUUM public.digi_sas_y2007m07 (to prevent
> wraparound). Is there a way to have autovacuum skip these static type
> partition tables?

No. This is a necessary and critical operation. PostgreSQL stores row visibility information based on 32-bit transaction IDs (xids). This value is small enough that it can wrap around, so very old tables need to be "frozen". Details here:
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

If this is a problem for you, then you may want to schedule manual VACUUM FREEZE on old tables during low-usage periods.

Regards,
Marti
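To see which tables are next in line for an anti-wraparound vacuum, one can check relfrozenxid age from the catalogs (a sketch; autovacuum forces a freeze once age exceeds autovacuum_freeze_max_age, 200 million by default):

```sql
-- Tables with the oldest relfrozenxid will be frozen first; once age()
-- exceeds autovacuum_freeze_max_age, autovacuum launches the
-- "(to prevent wraparound)" VACUUM seen in pg_stat_activity.
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY age(relfrozenxid) DESC
LIMIT 10;
```

Running VACUUM FREEZE manually on the top entries during a quiet period resets their xid_age and keeps autovacuum from picking them at an inconvenient time.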
Re: [PERFORM] statistics target for columns in unique constraint?
On Mon, May 13, 2013 at 6:01 PM, ach <alanchi...@gmail.com> wrote:
> what I'm wondering is, since the unique constraint already covers the whole
> table and all rows in entirety, is it really necessary for statistics to be
> set that high on those?

AFAIK if there are exact-matching unique constraints/indexes for a query's WHERE clause, the planner will deduce that the query only returns 1 row and won't consult statistics at all.

> Or does that only serve to slow down inserts to that table?

It doesn't slow down inserts directly. Tables are analyzed in the background by autovacuum. However, I/O traffic from autovacuum analyze may slow down inserts running concurrently.

Regards,
Marti
Re: [PERFORM] limit order by performance issue
On Tue, Oct 16, 2012 at 10:47 PM, Karl Denninger <k...@denninger.net> wrote:
> Put an index on time_stamp (I assume there is one on id_signal already)

Well, the optimal index for this particular query would include both columns: (id_signal, time_stamp) -- in this order.

Additionally, if you want to take advantage of the index-only scans feature, add the SELECTed column too: (id_signal, time_stamp, var_value)

Regards,
Marti
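For illustration, the two suggestions might look like this (the table name is assumed, since the thread only shows the column names):

```sql
-- Optimal index for: WHERE id_signal = ? ORDER BY time_stamp DESC LIMIT n
-- The equality column goes first, the sort column second.
CREATE INDEX ON signal_values (id_signal, time_stamp);

-- Covering variant: with var_value included, PostgreSQL 9.2+ can answer
-- the query from the index alone (an index-only scan), skipping the heap.
CREATE INDEX ON signal_values (id_signal, time_stamp, var_value);
```

Column order matters: an index on (time_stamp, id_signal) could not satisfy both the filter and the ordering in one pass.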
Re: [PERFORM] timing != log duration
On Wed, Mar 21, 2012 at 13:42, Rafael Martinez <r.m.guerr...@usit.uio.no> wrote:
> I am wondering why the time reported by \timing in psql is not the same as
> the time reported by duration in the log file when log_duration or
> log_min_duration_statement are on?

psql's \timing measures time on the client -- which includes the network communication time (time to send the query to the server, and receive back the results).

log_min_duration_statement measures time on the server, so it doesn't know how long network transmission takes.

Regards,
Marti
Re: [PERFORM] Large insert and delete batches
On Thu, Mar 1, 2012 at 21:06, Kääriäinen Anssi <anssi.kaariai...@thl.fi> wrote:
> The queries are "select * from the_table where id = ANY(ARRAY[list_of_numbers])"
> and the similar delete, too.
[...]
> However, once you go into millions of items in the list, the query will OOM
> my Postgres server.

The problem with IN() and ARRAY[] is that the whole list of numbers has to be parsed by the SQL syntax parser, which has significant memory and CPU overhead (it has to accept arbitrary expressions in the list).

But there's a shortcut around the parser: you can pass in the list as an array literal string, e.g.:

    select * from the_table where id = ANY('{1,2,3,4,5}')

The SQL parser considers the value one long string and passes it to the array input function, which is a much simpler routine. This should scale up much better.

Even better if you could pass in the array as a query parameter, so the SQL parser doesn't even see the long string -- but I think you have to jump through some hoops to do that in psycopg2.

Regards,
Marti
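A small sketch of building that array literal on the client side. The helper name is mine, and the_table comes from the thread; the commented cur.execute line assumes a psycopg2-style cursor:

```python
def int_array_literal(ids):
    """Format integers as a PostgreSQL array literal string, e.g. "{1,2,3}".

    Passing this one string as a single query parameter keeps the SQL
    parser from having to parse millions of individual expressions:

        cur.execute("SELECT * FROM the_table WHERE id = ANY(%s::int[])",
                    (int_array_literal(ids),))

    int() guards against injection; this simple version handles plain
    integers only (no NULLs, no quoting of text elements).
    """
    return "{" + ",".join(str(int(i)) for i in ids) + "}"

print(int_array_literal([1, 2, 3, 4, 5]))  # -> {1,2,3,4,5}
```

Since the literal travels as a bound parameter, its size no longer affects parser memory use, only the array-input routine on the server.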
Re: [PERFORM] UPDATE on NOT JOIN
On Wed, Feb 15, 2012 at 20:33, Gabriel Biberian <ad...@beemotechnologie.com> wrote:
> Currently, i use the following query to update the filesystem table with
> the missing files :
> UPDATE filesystem SET dead=some_value WHERE dead=0 AND (SELECT 1 FROM
> temporary AS t WHERE t.hash=filesystem.hash LIMIT 1) IS NULL

I don't know if this solves your problem entirely, but an obvious improvement would be using the NOT EXISTS (SELECT ...) construct:

    UPDATE filesystem SET dead=some_value WHERE dead=0 AND
      NOT EXISTS (SELECT 1 FROM temporary AS t WHERE t.hash=filesystem.hash);

PostgreSQL 8.4+ can optimize this into an "anti join" query (you didn't mention what version you are using).

Also, if your hardware isn't very limited, you should increase the work_mem setting from the default (1MB).

If the above doesn't help significantly, please post the full EXPLAIN ANALYZE output.

Regards,
Marti
Re: [PERFORM] Performance on large, append-only tables
On Wed, Feb 8, 2012 at 20:03, David Yeu <david@skype.net> wrote:
> * WHERE group_id = ? ORDER BY created_at DESC LIMIT 20 OFFSET ?;
> * Pages of twenty rows.

A good improvement for this sort of queries is the "scalable paging" trick. Instead of increasing the OFFSET argument -- which means that Postgres has to scan more and more rows -- you should remember an index key where the last page ended.

In other words, you get the first page using:

    WHERE group_id = ? ORDER BY created_at DESC LIMIT 20

Say, this page returns created_at values between 2012-01-01 and 2012-01-10. If the user clicks "next page", you run a query like this instead:

    WHERE group_id = ? AND created_at < '2012-01-10'
    ORDER BY created_at DESC LIMIT 20

Thus, every "next page" fetch always takes a constant time. Of course there's a small problem when two rows have equal times. Then, you can add the primary key to the sort key to disambiguate those rows:

    WHERE group_id = ? AND (created_at, pkey_col) < ('2012-01-10', 712)
    ORDER BY created_at DESC, pkey_col DESC LIMIT 20

Of course an index on (group_id, created_at) or (group_id, created_at, pkey_col) is necessary for these to work well.

Regards,
Marti
Re: [PERFORM] Postgress is taking lot of CPU on our embedded hardware.
On Sat, Jan 28, 2012 at 19:11, Jayashankar K B <jayashankar...@lnties.com> wrote:
> But we are stumped by the amount of CPU Postgres is eating up.

You still haven't told us *how* slow it actually is and how fast you need it to be. What's your database layout like (tables, columns, indexes, foreign keys)? What do the queries look like that you have problems with?

> Our database file is located on a class 2 SD Card. So it is understandable
> if there is lot of IO activity and speed is less.

Beware that most SD cards are unfit for database write workloads, since they only perform very basic wear levelling (in my experience anyway -- things might have changed, but I'm doubtful). It's a matter of time before you wear out some frequently-written blocks and they start returning I/O errors or corrupted data.

If you can spare the disk space, increase checkpoint_segments, as that means at least WAL writes are spread out over a larger number of blocks. (But heap/index writes are still a problem.)

SD cards can also corrupt your data if you lose power in the middle of a write -- since they use much larger physical block sizes than regular hard drives, they can lose a whole block, which file systems and Postgres are not designed to handle. They also tend to not respect the flush/barrier requests that are required for database consistency. Certainly you should do such power-loss tests before you release your product.

I've built an embedded platform with a database. Due to disk corruptions, in the end I opted for mounting all file systems read-only and keeping the database only in RAM.

> Any configuration settings we could check up?

For one, you should reduce max_connections to a more reasonable number -- I'd guess you don't need more than 5 or 10 concurrent connections. Also set synchronous_commit=off; this means that you may lose some committed transactions after power loss, but I think with SD cards all bets are off anyway.

Regards,
Marti
Re: [PERFORM] PostgreSQL Parallel Processing !
On Fri, Jan 27, 2012 at 06:31, sridhar bamandlapally <sridhar@gmail.com> wrote:
> --------------------------------------------------------------------------
> | Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
> --------------------------------------------------------------------------
> |   0 | SELECT STATEMENT  |      | 7444K |  944M | 16077   (4)| 00:03:13 |
> |   1 |  TABLE ACCESS FULL| EMP  | 7444K |  944M | 16077   (4)| 00:03:13 |
> --------------------------------------------------------------------------

Sorry to take this off topic, but...

Seriously, over 3 minutes to read 944 MB of data? That's less than 5 MB/s. What's wrong with your database? :)

Regards,
Marti
Re: [PERFORM] Dramatic change in memory usage with version 9.1
On Mon, Dec 19, 2011 at 17:04, Rafael Martinez <r.m.guerr...@usit.uio.no> wrote:
> * Sudden decrease of swap when running backup/vacuum+analyze jobs

Do you know for certain that this memory use is attributed to vacuum/analyze/backup, or are you just guessing? You should isolate whether it's the vacuum or a backup process/backend that takes this memory.

Do you launch vacuum/analyze manually or are you just relying on autovacuum? How many parallel vacuum jobs are there? What's your autovacuum_max_workers set to? How large is your database?

How did you perform the upgrade -- via pg_upgrade or pg_dump?

> Any ideas about why this dramatic change in memory usage when the only
> thing apparently changed from our side is the postgres version?

Well, for one, there have been many planner changes that make it use memory more aggressively, these probably being the most significant:

* Materialize for nested loop queries in 9.0:
  http://rhaas.blogspot.com/2010/04/materialization-in-postgresql-90.html
* Hash join usage for RIGHT and FULL OUTER JOINs in 9.0

However, none of these would apply to vacuum, analyze or backups.

Regards,
Marti
Re: [PERFORM] will the planner ever use an index when the condition is ?
On Sun, Dec 18, 2011 at 16:52, Roxanne Reid-Bennett <r...@tara-lu.com> wrote:
> Is there an index type that can check "not equal"? This specific column has
> a limited number of possible values - it is essentially an enumerated list.

Instead of writing WHERE foo <> 3, you could rewrite it as WHERE foo IN (1,2,4,...) or WHERE foo < 3 OR foo > 3. Both of these are indexable queries, but obviously the planner may choose not to use an index scan if it's not worth it.

Regards,
Marti
Re: [PERFORM] Is it possible to use index on column for regexp match operator '~'?
2011/12/14 Rural Hunter <ruralhun...@gmail.com>:
> for example, the where condition is: where '' ~ col1. I created a normal
> index on col1 but seems it is not used.

I assume you want to search values that match one particular pattern, that would be: col1 ~ ''

The answer is: only very simple patterns that start with '^'. Note that you MUST use the text_pattern_ops index opclass:

    # create table words (word text);
    # copy words from '/usr/share/dict/words';
    # create index on words (word text_pattern_ops);
    # explain select * from words where word ~ '^post';
     Index Scan using words_word_idx on words  (cost=0.00..8.28 rows=10 width=9)
       Index Cond: ((word ~>=~ 'post'::text) AND (word ~<~ 'posu'::text))
       Filter: (word ~ '^post'::text)

If you just want to search for arbitrary strings, in PostgreSQL 9.1+ you can use the pg_trgm extension with a LIKE expression:

    # create extension pg_trgm;
    # create index on words using gist (word gist_trgm_ops);
    # explain select * from words where word like '%post%';
     Bitmap Heap Scan on words  (cost=4.36..40.23 rows=10 width=9)
       Recheck Cond: (word ~~ '%post%'::text)
       ->  Bitmap Index Scan on words_word_idx1  (cost=0.00..4.36 rows=10 width=0)
             Index Cond: (word ~~ '%post%'::text)

There's also the "wildspeed" external module which is somewhat faster at this:
http://www.sai.msu.su/~megera/wiki/wildspeed

And someone is working to get pg_trgm support for arbitrary regular expression searches. This *may* become part of the next major PostgreSQL release (9.2):
http://archives.postgresql.org/message-id/CAPpHfduD6EGNise5codBz0KcdDahp7--MhFz_JDD_FRPC7-i=a...@mail.gmail.com

Regards,
Marti
Re: [PERFORM] Response time increases over time
On Thu, Dec 8, 2011 at 06:37, Aidan Van Dyk <ai...@highrise.ca> wrote:
> Let me guess, debian squeeze, with data and xlog both on a single ext3
> filesystem, and the fsync done by your commit (xlog) is flushing all the
> dirty data of the entire filesystem (including PG data writes) out before
> it can return...

This is fixed with the data=writeback mount option, right? (If it's the root file system, you need to add rootfsflags=data=writeback to your kernel boot flags.)

While this setting is safe and recommended for PostgreSQL and other transactional databases, it can cause garbage to appear in recently written files after a crash/power loss -- for applications that don't correctly fsync data to disk.

Regards,
Marti
Re: [PERFORM] Intersect/Union X AND/OR
On Mon, Dec 5, 2011 at 14:14, Thiago Godoi <thiagogodo...@gmail.com> wrote:
> My original query :
>
> select table1.id
> from table1, (select function(12345) id) table2
> where table1.kind = 1234
> and table1.id = table2.id
>
> Nested Loop  (cost=0.00..6.68 rows=1 width=12)
>   Join Filter: ()
>   ->  Seq Scan on recorte  (cost=0.00..6.39 rows=1 width=159)
>         Filter: (id = 616)
>   ->  Result  (cost=0.00..0.26 rows=1 width=0)

Note that this EXPLAIN output is quite different from your query. Instead of a kind=1234 clause, there's id=616. Also, please post EXPLAIN ANALYZE results instead whenever possible.

> When I changed the query to use intersect : [...]
> The second plan is about 10 times faster than the first one.

Judging by these plans, the 1st one should not be slower.

Note that just running the query once and comparing times is often misleading, especially for short queries, since noise often dominates the query time -- depending on how busy the server was at the moment, what kind of data was cached, CPU power management/frequency scaling, etc.

ESPECIALLY don't compare pgAdmin timings, since those also include network variance, the time taken to render results on your screen, and who knows what else.

A simple way to benchmark is with pgbench. Just write the query to a text file (it needs to be a single line and not more than ~4000 characters). Then run 'pgbench -n -f pgbench_script -T 5' to run it for 5 seconds.

These results are still not entirely reliable, but much better than pgAdmin timings.

Regards,
Marti
Re: [PERFORM] SSL encryption makes bytea transfer slow
On Tue, Nov 8, 2011 at 12:25, Albe Laurenz <laurenz.a...@wien.gv.at> wrote:
> I can't get oprofile to run on this RHEL6 box, it doesn't record anything,
> so all I can test is total query duration.

Maybe this helps you with OProfile?
http://people.planetpostgresql.org/andrew/index.php?/archives/224-The-joy-of-Vx.html

Regards,
Marti
Re: [PERFORM] SSL encryption makes bytea transfer slow
On Fri, Oct 28, 2011 at 14:02, Albe Laurenz <laurenz.a...@wien.gv.at> wrote:
> Without SSL the SELECT finished in about a second, with SSL it took over 23
> seconds (measured with \timing in psql).

When you query with psql, it requests columns in text format. Since bytea hex-encodes its value if output is text, this means it's transmitting 60 MB for a 30 MB bytea value.

If you could make sure that your app is requesting binary output, then you could cut 50% off this time.

As others mentioned, most of the overhead is in SSL compression (not encryption), which can be disabled, but is not very easy to do.

But 23 seconds for 60 MB is still *very* slow; something else could be going wrong. What kind of CPU is this?

On Thu, Nov 3, 2011 at 16:48, Albe Laurenz <laurenz.a...@wien.gv.at> wrote:
> Disabling OpenSSL compression in the source (which is possible since
> OpenSSL 1.0.0) does not give me any performance improvement.

If it doesn't give you any performance improvement, then you haven't disabled compression. Modern CPUs can easily saturate 1 GbitE with AES256-encrypted connections. Compression is usually the bottleneck, at 20-30 MB/s.

Regards,
Marti
Re: [PERFORM] Ineffective autovacuum
1. First things first: vacuum cannot delete tuples that are still visible to any old running transactions. You might have some very long queries or transactions that prevent it from cleaning properly:

    select * from pg_stat_activity
    where xact_start < now() - interval '10 minutes';

2. On 8.3 and earlier servers with large tables, it's critical that your max_fsm_pages and max_fsm_relations are tuned properly. Failing that, autovacuum will permanently "leak" space that can only be fixed with a VACUUM FULL (which will take an exclusive lock and run for a very long time).

PostgreSQL version 8.4 addressed this problem, but for the unfortunate, you have to follow the tuning advice here:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#autovacuum_max_fsm_pages.2C_max_fsm_relations

On Tue, Sep 27, 2011 at 08:08, Royce Ausburn <royce...@inomial.com> wrote:
> I've noticed that the same tables seem to be auto vacuum'd over and over
> again… Some of the tables are a bit surprising in that they're updated
> semi-regularly, but not enough (I'd think) to warrant an autovacuum every
> few minutes… Is this unusual?

Maybe they're just auto-analyze processes? Those get triggered on insert-only tables too, when vacuum normally wouldn't run.

> Perhaps unrelated: I've done some digging around and happened across a
> nightly task doing: select pg_stat_reset()

AFAIK (but I could be wrong), vacuum uses a separate set of statistics not affected by pg_stat_reset.

Regards,
Marti
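To check what autovacuum has actually been doing per table, the standard statistics views can help (a sketch; last_autovacuum/last_autoanalyze exist in pg_stat_user_tables on 8.3 and later):

```sql
-- Which tables were autovacuumed/autoanalyzed most recently?
SELECT relname, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_autovacuum DESC NULLS LAST
LIMIT 20;
```

Note that a nightly pg_stat_reset() would wipe these timestamps along with the rest of the statistics counters, so run the check before the reset job fires.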
Re: [PERFORM] [PERFORMANCE] Insights: fseek OR read_cluster?
On Mon, Sep 26, 2011 at 15:51, Antonio Rodriges <antonio@gmail.com> wrote:
>> What is read_cluster() ? Are you talking about some kind of async and/or
> I meant that if you want to read a chunk of data from file you (1) might
> not call traditional fseek but rather memorize hard drive cluster numbers
> to boost disk seeks and, (2) perform the read of disk cluster directly.

PostgreSQL accesses regular files on a file system via lseek(), read() and write() calls, no magic.

In modern extent-based file systems, mapping a file offset to a physical disk sector is very fast -- compared to the time of actually accessing the disk.

I can't see how direct cluster access would even work, unless you'd give the database direct access to a raw partition, in which case Postgres would effectively have to implement its own file system. The gains are simply not worth it for Postgres; our developer resources are better spent elsewhere.

Regards,
Marti
Re: [PERFORM] Odd misprediction
On Fri, Sep 16, 2011 at 17:50, Claudio Freire <klaussfre...@gmail.com> wrote:
> It's not an issue for me (it's not really impacting performance), but since
> it was odd I thought I might ask.
>
> I have this supermegaquery:
>
> SELECT
>     t.date AS status_date, lu.id AS memberid, lu.username AS username,
>     u.url AS url, ub.url_pattern AS urlpattern, lu.email AS email,
>     lu.birth_date AS birthdate, lu.creation_date AS creationdate,
>     s.name AS state, co.name AS country, opd.survey_id AS originalSurvey,
>     c.name AS city, lu.confirmed AS confirmed, pd.name AS action,
>     sd.duration AS loi
> FROM tracks t
> LEFT JOIN surveyduration_v sd
>     ON sd.member_id = t.member_id AND sd.survey_id = 5936
> INNER JOIN all_users_v lu
>     ON lu.id = t.member_id AND lu.panel_source_id = 1
> LEFT JOIN track_status ts ON ts.id = t.track_status_id
> LEFT JOIN partners p ON p.id = t.partner_id
> LEFT JOIN urls u ON u.id = t.url_id AND u.survey_id = 5936
> LEFT JOIN url_batchs ub ON u.url_batch_id = ub.id
> LEFT JOIN states s ON lu.state_id = s.id
> LEFT JOIN cities c ON lu.city_id = c.id
> LEFT JOIN countries co ON lu.country_id = co.id
> LEFT JOIN partner_deliveries pd
>     ON pd.id = t.partner_delivery_id AND t.partner_id IS NOT NULL
> LEFT JOIN partner_deliveries opd ON opd.id = pd.originator_id
> WHERE t.survey_id = 5936 AND t.track_status_id IN (5)
>
> With the views:
>
> CREATE OR REPLACE VIEW surveyduration_v AS
> SELECT date_part('epoch'::text, t.date - tl2.date) / 60::double precision AS duration,
>     t.member_id, t.survey_id
> FROM tracks t
> JOIN track_logs tl2 ON t.id = tl2.track_id
> WHERE tl2.track_status_id = 8 AND t.track_status_id = 7;
>
> CREATE OR REPLACE VIEW all_users_v AS
> SELECT 1 AS panel_source_id, livra_users.id, livra_users.birth_date,
>     livra_users.creation_date, livra_users.email, livra_users.first_name,
>     livra_users.last_name, livra_users.username, livra_users.image_link,
>     livra_users.confirmed, livra_users.is_panelist, livra_users.unregistered,
>     livra_users.reason, livra_users.privacy, livra_users.sex, livra_users.site,
>     livra_users.country_id, livra_users.state_id, livra_users.city_id,
>     livra_users.last_activity_date, livra_users.partner_id, livra_users.survey_id,
>     livra_users.panelist_update, livra_users.panelist_percentage
> FROM livra_users
> UNION ALL
> SELECT 2 AS panel_source_id, - external_users.id AS id,
>     NULL::timestamp without time zone AS birth_date, external_users.creation_date,
>     external_users.email, NULL::character varying AS first_name,
>     NULL::character varying AS last_name, external_users.username,
>     NULL::character varying AS image_link, true AS confirmed,
>     external_users.is_panelist, false AS unregistered, NULL::integer AS reason,
>     0 AS privacy, NULL::integer AS sex, external_users.site,
>     external_users.country_id, NULL::integer AS state_id, NULL::integer AS city_id,
>     NULL::timestamp without time zone AS last_activity_date,
>     NULL::integer AS partner_id, external_users.survey_id,
>     NULL::bigint AS panelist_update, NULL::smallint AS panelist_percentage
> FROM external_users;
>
> Server is 9.0.3 running on linux
>
> The BIG tables are tracks, track_logs and urls, all 30M rows. One detail
> that could be related is that tracks.member_id is an undeclared
> (denormalized) foreign key to livra_users.
The resulting plan is:
Hash Left Join (cost=51417.93..974563.27 rows=2241518 width=1276)
  Hash Cond: (*SELECT* 1.country_id = co.id)
  -> Hash Left Join (cost=51415.40..941722.50 rows=2241518 width=1271)
       Hash Cond: (*SELECT* 1.state_id = s.id)
       -> Hash Left Join (cost=51373.45..910859.68 rows=2241518 width=1263)
            Hash Cond: (t.partner_delivery_id = pd.id)
            Join Filter: (t.partner_id IS NOT NULL)
            -> Hash Left Join (cost=32280.78..854175.26 rows=2241518 width=1256)
                 Hash Cond: (*SELECT* 1.city_id = c.id)
                 -> Hash Join (cost=24183.20..792841.63 rows=2241518 width=1249)
                      Hash Cond: (*SELECT* 1.id = t.member_id)
                      -> Append (cost=0.00..148254.38 rows=3008749 width=168)
                           -> Subquery Scan on *SELECT* 1 (cost=0.00..140223.96 rows=3008748 width=168)
                                -> Seq Scan on livra_users (cost=0.00..110136.48 rows=3008748 width=168)
                           -> Subquery Scan on *SELECT* 2 (cost=0.00..8030.42 rows=1 width=60)
                                -> Result (cost=0.00..8030.41 rows=1 width=60)
                                     One-Time Filter: false
                                     -> Seq Scan on external_users (cost=0.00..8030.41 rows=1 width=60)
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Tue, Sep 13, 2011 at 00:26, Robert Klemme shortcut...@googlemail.com wrote: In the case of PG this particular example will work:
1. TX inserts new PK row
2. TX tries to insert same PK row => blocks
1. TX commits
2. TX fails with PK violation
2. TX does the update (if the error is caught)
That goes against the point I was making in my earlier comment. In order to implement this error-catching logic, you'll have to allocate a new subtransaction (transaction ID) for EVERY ROW you insert. If you're going to be loading billions of rows this way, you will invoke the wrath of the vacuum freeze process, which will seq-scan all older tables and re-write every row that it hasn't touched yet. You'll survive it if your database is a few GB in size, but in terabyte land that's unacceptable. Transaction IDs are a scarce resource there. In addition, such blocking will limit the parallelism you will get from multiple inserters. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
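[For reference, the error-catching pattern under discussion looks roughly like this in PL/pgSQL. This is only a sketch with made-up table and column names; the point is that the block with an EXCEPTION clause sets an implicit savepoint, i.e. a subtransaction, on every call:]

```sql
-- Hypothetical upsert via error catching. Entering a BEGIN block that
-- has an EXCEPTION clause establishes a subtransaction for each call,
-- which is what consumes an extra transaction ID per row.
CREATE FUNCTION upsert_row(_id integer, _value text) RETURNS void AS $$
BEGIN
    INSERT INTO foo (id, value) VALUES (_id, _value);
EXCEPTION WHEN unique_violation THEN
    UPDATE foo SET value = _value WHERE id = _id;
END;
$$ LANGUAGE plpgsql;
```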
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Tue, Sep 13, 2011 at 19:34, Robert Klemme shortcut...@googlemail.com wrote: I don't think so. You only need to catch the error (see attachment). Or does this create a sub transaction? Yes, every BEGIN/EXCEPTION block creates a subtransaction -- like a SAVEPOINT it can roll back to in case of an error. Yes, I mentioned the speed issue. But regardless of the solution for MySQL's INSERT..ON DUPLICATE KEY UPDATE which Igor mentioned you will have the locking problem anyhow if you plan to insert concurrently into the same table and be robust. In a mass-loading application you can often divide the work between threads in a manner that doesn't cause conflicts. For example, if the unique key is foobar_id and you have 4 threads, thread 0 will handle rows where (foobar_id%4)=0, thread 1 takes (foobar_id%4)=1 etc. Or potentially hash foobar_id before dividing the work. I already suggested this in my original post. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
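[A sketch of the kind of split described above, with hypothetical table names. Each loader thread runs only its own slice, so no two threads ever insert the same key:]

```sql
-- thread 0 of 4
INSERT INTO target SELECT * FROM staging WHERE foobar_id % 4 = 0;
-- thread 1 of 4
INSERT INTO target SELECT * FROM staging WHERE foobar_id % 4 = 1;
-- ... threads 2 and 3 likewise. If the raw modulo gives a skewed
-- distribution, hash the key first, e.g.:
--   WHERE abs(hashtext(foobar_id::text)) % 4 = 0
```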
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Mon, Sep 12, 2011 at 23:04, Shaun Thomas stho...@peak6.com wrote: I was alluding to the fact that if a DBA had his system running for a week at our transaction level, and PG didn't have forced auto vacuum, and their maintenance lapsed even slightly, they could end up with a corrupt database. It doesn't actually corrupt your database. If you manage to hit the wraparound age, PostgreSQL disallows new connections and tells you to run a VACUUM from a standalone backend. (But that should never happen due to the forced vacuum freeze processes) Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres for a data warehouse, 5-10 TB
On Sun, Sep 11, 2011 at 17:23, Andy Colson a...@squeakycode.net wrote: On 09/11/2011 08:59 AM, Igor Chudov wrote: By the way, does that INSERT UPDATE functionality or something like this exist in Postgres? You have two options: 1) write a function like: create function doinsert(_id integer, _value text) returns void as 2) use two sql statements: Unfortunately both of these options have caveats. Depending on your I/O speed, you might need to use multiple loader threads to saturate the write bandwidth. However, neither option is safe from race conditions. If you need to load data from multiple threads at the same time, they won't see each other's inserts (until commit) and will thus cause unique violations. If you could somehow partition their operation by some key, so threads are guaranteed not to conflict with each other, then that would be perfect. The 2nd option given by Andy is probably faster. You *could* code a race-condition-safe function, but that would be a no-go on a data warehouse, since each call needs a separate subtransaction which involves allocating a transaction ID. Which brings me to another important point: don't do lots of small write transactions, SAVEPOINTs or PL/pgSQL subtransactions. Besides being inefficient, they introduce a big maintenance burden. In PostgreSQL's MVCC, each tuple contains a reference to the 32-bit transaction ID that inserted it (xmin). After hitting the maximum 32-bit transaction ID value, the number wraps around. To prevent old rows from appearing as new, a vacuum freeze process will run after passing autovacuum_freeze_max_age transactions (200 million by default) to update all old rows in your database. Using fewer transaction IDs means it runs less often. On small databases, this is usually not important. But on a 10TB data warehouse, rewriting a large part of your database totally kills performance for any other processes.
This is detailed in the documentation: http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
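[For completeness, the two-statement option mentioned above usually takes a shape like this (a sketch with hypothetical names); note it is only race-free when loaders are partitioned by key so that two sessions never touch the same row:]

```sql
-- try the update first
UPDATE foo SET value = 'new' WHERE id = 42;
-- if no row was affected, the key doesn't exist yet, so insert it
INSERT INTO foo (id, value)
SELECT 42, 'new'
WHERE NOT EXISTS (SELECT 1 FROM foo WHERE id = 42);
```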
Re: [PERFORM] seq scan in the case of max() on the primary key column
On Thu, Jun 16, 2011 at 21:36, Shaun Thomas stho...@peak6.com wrote: You can call that instead of max, and it'll be much faster. You can create an analog for min if you need it. So for this, you'd call: Cool, I've needed this function sometimes but never bothered enough to write it myself. Now I created a wiki snippet page for this handy feature here: https://wiki.postgresql.org/wiki/Efficient_min/max_over_partitioned_table With Jim Nasby's idea to use regclass instead of relation names, the function is now half its length and probably more reliable. There's no need to touch pg_class directly at all. I also changed it to return bigint instead of integer, as that's more versatile, and the performance loss is probably negligible. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 100% CPU Utilization when we run queries.
On Wed, Jun 8, 2011 at 07:19, bakkiya bakk...@gmail.com wrote: We have a postgresql 8.3.8 DB which consumes 100% of the CPU whenever we run any query. We got vmstat output Machine details are below: Any query? Does even SELECT 1 not work? Or SELECT * FROM sometable LIMIT 1? Or are you having problems with only certain kinds of queries? If so, please follow this for how to report it: https://wiki.postgresql.org/wiki/SlowQueryQuestions Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgresql on itanium server
On Thu, Jun 9, 2011 at 13:03, muthu krishnan muthu.krishnan.li...@gmail.com wrote: Thank you for suggesting the valuable URL, we are getting 3 floating point assist fault errors every second, will it impact the performance for postgresql? Probably. The kernel throttles these messages, so you're probably performing many more of these calculations than the number of messages. Is there any option to turn on flush to zero mode in itanium cpu while compiling postgresql from source? As the URL mentions, you can build with CFLAGS=-ffast-math, that should work for PostgreSQL too. But since you know you're operating with denormal numbers, you WILL get different results to queries. Whether that's a problem for you depends on your application. You could start getting division by zero errors, for instance. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: {Spam} [PERFORM] Will shared_buffers crash a server
Qiang Wang forest_qi...@yahoo.com wrote: We have PostgreSQL 8.3 running on Debian Linux server. We built an application using PHP programming language and Postgres database. There are approximately 150 users using the software constantly. We had some performance degradation before and after some studies we figured out we will need to tune PostgreSQL configurations. However we suffered 2 times server crashes after tuning the configuration. Does anyone have any idea how this can happen? Could you explain in more detail, *how* it crashed? On Linux, the first suspect for crashes is usually the OOM (out-of-memory) killer. When the kernel thinks it's run out of memory, it picks a task and kills it. Due to the way PostgreSQL uses shared memory, it's more likely to be killed than other processes. To figure out whether you've suffered an OOM kill, run dmesg; you would see something like:
[2961426.424851] postgres invoked oom-killer: gfp_mask=0x201da, order=0, oomkilladj=0
[2961426.424857] postgres cpuset=/ mems_allowed=0
[2961426.424861] Pid: 932, comm: postgres Not tainted 2.6.31-22-server #65-Ubuntu
[2961426.424863] Call Trace: ...
The first step in solving OOM kills is disabling memory overcommit; add 'vm.overcommit_memory = 2' to /etc/sysctl.conf and run the command 'echo 2 > /proc/sys/vm/overcommit_memory'. This doesn't prevent OOM kills entirely, but usually reduces them significantly; queries will now abort with an out of memory error if they're responsible for memory exhaustion. You can also reduce the chance that PostgreSQL is chosen for killing by changing its oom_adj, documented here: http://blog.credativ.com/en/2010/03/postgresql-and-linux-memory-management.html Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] multiple table scan performance
On Wed, Mar 30, 2011 at 01:16, Samuel Gendler sgend...@ideasculptor.com wrote: I've got some functionality that necessarily must scan a relatively large table Is there any performance benefit to revamping the workload such that it issues a single: insert into (...) select ... UNION select ... UNION select as opposed to 3 separate insert into () select ... statements. Apparently not, as explained by Claudio Freire. This seems like missed opportunity for the planner, however. If it scanned all three UNION subqueries in parallel, the synchronized seqscans feature would kick in and the physical table would only be read once, instead of 3 times. (I'm assuming that seqscan disk access is your bottleneck) You can trick Postgres (8.3.x and newer) into doing it in parallel anyway: open 3 separate database connections and issue each of these 'INSERT INTO ... SELECT' parts separately. This way all the queries should execute in about 1/3 the time, compared to running them in one session or with UNION ALL. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
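[Concretely, the trick above means that instead of one UNION query, each of the three sessions issues its own statement at roughly the same time (a sketch; table and column names hypothetical), letting synchronized seqscans share a single physical pass over the big table:]

```sql
-- session 1
INSERT INTO dest_a SELECT id, payload FROM big_table WHERE kind = 'a';
-- session 2, started concurrently in its own connection
INSERT INTO dest_b SELECT id, payload FROM big_table WHERE kind = 'b';
-- session 3, started concurrently in its own connection
INSERT INTO dest_c SELECT id, payload FROM big_table WHERE kind = 'c';
```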
Re: [PERFORM] pg9.0.3 explain analyze running very slow compared to a different box with much less configuration
On Thu, Mar 24, 2011 at 11:11, Achilleas Mantzios ach...@matrix.gatewaynet.com wrote: My problem had to do with the speed of gettimeofday. You might want to do some special setting regarding your box's way of reading time for the hw clock. Just for extra info, on x86, TSC is usually the fast timeofday implementation. On recent CPUs in single-socket configurations, TSC should always be available, regardless of any power management. I don't know about multi-socket. If you want to know whether your kernel is using tsc, run: cat /sys/devices/system/clocksource/clocksource0/current_clocksource On older CPUs, you often had to disable some sort of power management in order to get a stable TSC -- the ondemand scaling governor is the top suspect. Disabling this is distro-specific. You have to reboot to get the kernel to re-test TSC. Unfortunately disabling power management later at boot doesn't help you, you have to prevent it from activating at all. For debugging, grepping dmesg for tsc or clocksource is often helpful. On machines with unstable TSC you'll see output like this:
[0.00] Fast TSC calibration using PIT
[0.164068] checking TSC synchronization [CPU#0 - CPU#1]: passed.
[0.196730] Switching to clocksource tsc
[0.261347] Marking TSC unstable due to TSC halts in idle
[0.261536] Switching to clocksource acpi_pm
If you just want to get repeatable timings, you can force both machines to use the hpet clocksource: echo hpet > /sys/devices/system/clocksource/clocksource0/current_clocksource Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow query on CLUTER -ed tables
2011/3/23 Laszlo Nagy gand...@shopzeus.com:
GroupAggregate (cost=5553554.25..5644888.17 rows=2283348 width=50)
  -> Sort (cost=5553554.25..5559262.62 rows=2283348 width=50)
       Sort Key: pph.hid, ppoh.merchantid, pph.hdate
       -> Nested Loop (cost=0.00..5312401.66 rows=2283348 width=50)
            -> Index Scan using idx_product_price_history_id_hdate on product_price_history pph (cost=0.00..8279.80 rows=4588 width=16)
                 Index Cond: (id = 37632081)
            -> Index Scan using pk_product_price_offer_history on product_price_offer_history ppoh (cost=0.00..1149.86 rows=498 width=42)
                 Index Cond: (ppoh.hid = pph.hid)
                 Filter: (ppoh.isfeatured = 1)
I suspect that, since the matched hid's probably aren't sequential, many of those ~500 product_price_offer_history rows will be far apart on disk. Please show the EXPLAIN ANALYZE output in the slow case, not just EXPLAIN. Also, PostgreSQL version? What configuration options have you changed? (http://wiki.postgresql.org/wiki/SlowQueryQuestions) Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Tuning massive UPDATES and GROUP BY's?
On Sun, Mar 13, 2011 at 18:36, runner run...@winning.com wrote: Tried removing the indexes and other constraints just for the import but for a noob like me, this was too much to ask. Maybe when I get more experience. pgAdmin should make it pretty easy. Choose each index and constraint, save the code from the SQL pane for when you need to restore it, and do a right click -> Drop. Other than being very inefficient, and consuming more time than necessary, is there any other down side to importing into an indexed table? Doing so will result in somewhat larger (more bloated) indexes, but generally the performance impact of this is minimal. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Tuning massive UPDATES and GROUP BY's?
On Fri, Mar 11, 2011 at 21:06, fork forkandw...@gmail.com wrote: Like the following? Will it rebuild the indexes in a sensical way? Don't insert data into an indexed table. A very important point with bulk-loading is that you should load all the data first, then create the indexes. Running multiple (different) CREATE INDEX queries in parallel can additionally save a lot of time. Also don't move data back and forth between the tables, just drop the original when you're done. Doing this should give a significant performance win. Partitioning them to fit in cache should improve it further, but I'm not sure anymore that it's worthwhile considering the costs and extra maintenance. Is there a rule of thumb on tradeoffs in a partitioned table? The only certain thing is that you'll lose group aggregate and merge join query plans. If you only see HashAggregate plans when you EXPLAIN your GROUP BY queries then it probably won't make much of a difference. I would use the partition column whatever I am most likely to cluster by in a single big table, right? Yes. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
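[The load order described above, as a sketch with hypothetical table names and file path; the two CREATE INDEX statements can be issued from two separate sessions to build them in parallel:]

```sql
-- 1. create the table bare and load all the data first
CREATE TABLE new_data (id bigint, payload text);
COPY new_data FROM '/path/to/dump.csv' (FORMAT csv);

-- 2. only now build the indexes, ideally from concurrent sessions
CREATE INDEX new_data_id_idx ON new_data (id);
CREATE INDEX new_data_payload_idx ON new_data (payload);

-- 3. swap in the new table instead of copying rows back
DROP TABLE old_data;
ALTER TABLE new_data RENAME TO old_data;
```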
Re: [PERFORM] Tuning massive UPDATES and GROUP BY's?
On Thu, Mar 10, 2011 at 17:40, fork forkandw...@gmail.com wrote: The data is not particularly sensitive; if something happened and it rolled back, that wouldnt be the end of the world. So I don't know if I can use dangerous setting for WAL checkpoints etc. There are also aren't a lot of concurrent hits on the DB, though a few. If you don't mind long recovery times in case of a crash, set checkpoint_segments to ~100 and checkpoint_completion_target=0.9; this will improve write throughput significantly. Also, if you don't mind CORRUPTing your database after a crash, setting fsync=off and full_page_writes=off gives another significant boost. I am loathe to create a new table from a select, since the indexes themselves take a really long time to build. UPDATE on a table with many indexes will probably be slower. If you want to speed up this part, use INSERT INTO x SELECT and take this chance to partition your table, such that each individual partition and most indexes will fit in your cache. Index builds from a warm cache are very fast in PostgreSQL. You can create several indexes at once in separate sessions, and the table will only be scanned once. Don't forget to bump up maintenance_work_mem for index builds, 256MB might be a reasonable arbitrary value. The downside is that partitioning can interfere with your read queries if they expect the data in a sorted order. But then, HashAggregate tends to be faster than GroupAggregate in many cases, so this might not matter for your queries. Alternatively you can experiment with PostgreSQL 9.1 alpha, which has mostly fixed this shortcoming with the merge append plan node. As the title alludes, I will also be doing GROUP BY's on the data, and would love to speed these up, mostly just for my own impatience... I think regular tuning is the best you can do here. 
Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
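[The bulk-load settings mentioned above, collected in one place. The values are the ones from this thread, not universal recommendations, and the last two can CORRUPT the database after a crash, as noted:]

```
# postgresql.conf fragment for a one-off bulk load
checkpoint_segments = 100
checkpoint_completion_target = 0.9
maintenance_work_mem = 256MB     # for the index builds

# dangerous pair: only if losing the database on a crash is acceptable
fsync = off
full_page_writes = off
```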
Re: [PERFORM] Perl Binding affects speed?
On Fri, Feb 25, 2011 at 05:02, Sam Wong s...@hellosam.net wrote: * But if I do this - using binding: $dbh->selectall_arrayref(SELECT * from shipment_lookup WHERE (UPPER(lookup) LIKE ?), undef, '0GURG5YGVQA9%'); It took 10 seconds to finish the query, just like it was using a full table scan instead! Even though the 'explain' shows the same query plan. This is a pretty common shortcoming with placeholders. Since planning of parameterized queries is done *before* binding parameters, the planner has no knowledge of what the ? placeholder actually is. Thus it often gets the selectivity statistics wrong and produces worse plans for your values. AFAIK the only workaround is to not use variable binding in these cases, but to escape and insert your variables straight into the SQL query. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
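[The same effect can be reproduced in psql with a server-side prepared statement (a sketch, reusing the table name from the post above):]

```sql
-- planned once, before the parameter value is known
PREPARE q(text) AS
    SELECT * FROM shipment_lookup WHERE upper(lookup) LIKE $1;

-- the plan cannot assume the pattern has a fixed prefix,
-- so it may differ from planning the literal query
EXPLAIN EXECUTE q('0GURG5YGVQA9%');
```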
Re: [PERFORM] high user cpu, massive SELECTs, no io waiting problem
On Tue, Feb 15, 2011 at 20:01, Scott Marlowe scott.marl...@gmail.com wrote: run htop and look for red. if you've got lots of red bar on each CPU but no io wait then it's waiting for memory access. I don't think this is true. AFAICT the red bar refers to system time, time that's spent in the kernel -- either in syscalls or kernel background threads. Operating systems don't generally account memory accesses (cache misses) for processes, if you don't specially ask for it. The closest thing I know of is using Linux perf tools, e.g. perf top -e cache-misses. OProfile, DTrace and SystemTap can probably do something similar. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] LIMIT on partitioned-table!?
On Tue, Feb 15, 2011 at 21:33, Kim A. Brandt kimabra...@gmx.de wrote: removing the ORDER BY worked. But I am afraid to ask this. How can I order by partition? It seems that the planner has picked a random(!?) order of partitions to select from. The returned records, from the selected partition, are correctly sorted by the index though. If a single query accesses more than one partition, PostgreSQL currently cannot read the values in index-sorted order. Hence with ORDER BY and LIMIT, PostgreSQL cannot return *any* results before it has read all matching rows and then sorted them; adding a LIMIT doesn't help much. Your only bet is to reduce the number of matched rows, or make sure that you only access a single partition. Increasing work_mem may speed up the sort step if you're hitting the disk (EXPLAIN ANALYZE VERBOSE will tell you whether that's the case). This will change in PostgreSQL 9.1, which has a new Merge Append plan node. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
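[If the partitioning key can be pinned down in the WHERE clause, constraint exclusion leaves only one partition in play and the index order becomes usable again. A sketch, with a hypothetical table partitioned by month on log_date:]

```sql
-- touches only the February partition, so the index can
-- return rows in order and LIMIT can stop early
SELECT *
FROM measurements
WHERE log_date >= DATE '2011-02-01'
  AND log_date <  DATE '2011-03-01'
ORDER BY log_date
LIMIT 100;
```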
Re: [PERFORM] Really really slow select count(*)
On Tue, Feb 8, 2011 at 18:36, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Yeah, current behavior with that shutdown option is the opposite of smart for any production environment I've seen. (I can see where it would be handy in development, though.) What's best in production is the equivalent of the fast option with escalation to immediate if necessary to ensure shutdown within the time limit. +1, we should call it dumb :) Not accepting new connections with a "database system is shutting down" error makes it even worse -- it means you can't log in to the server to inspect who's querying it or call pg_terminate_backend() on them. I couldn't find any past discussions about changing the default to fast. Are there any reasons why that cannot be done in a future release? Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Really really slow select count(*)
On Tue, Feb 8, 2011 at 22:09, Greg Smith g...@2ndquadrant.com wrote: Kevin and I both suggested a fast plus timeout then immediate behavior is what many users seem to want. My comments were at http://archives.postgresql.org/pgsql-hackers/2009-09/msg01145.php ; for an example of how fast shutdown can fail see http://archives.postgresql.org/pgsql-bugs/2009-03/msg00062.php True, I've hit that a few times too. Seems that a better solution would be implementing a new -m option that does this transparently? Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Really really slow select count(*)
On Mon, Feb 7, 2011 at 05:03, Craig Ringer cr...@postnewspapers.com.au wrote: What would possibly help would be if Pg could fall back to lower shared_buffers automatically, screaming about it in the logs but still launching. OTOH, many people don't check the logs, so they'd think their new setting had taken effect and it hadn't - you've traded one usability problem for another. Even if Pg issued WARNING messages to each client that connected, lots of (non-psql) clients don't display them, so many users would never know. Do you have a suggestion about how to do this better? The current approach is known to be rather unlovely, but nobody's come up with a better one that works reasonably and doesn't trample on other System V shared memory users that may exist on the system. We could do something similar to what Apache does -- provide distros with a binary to check the configuration file in advance. This check program is launched before the restart command, and if it fails, the server is not restarted. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Different execution plans for semantically equivalent queries
On Mon, Feb 7, 2011 at 00:03, Mikkel Lauritsen ren...@tala.dk wrote: SELECT * FROM table t1 WHERE 0 = (SELECT COUNT(*) FROM table t2 WHERE t2.type = t1.type AND t2.timestamp > t1.timestamp) I suspect that *any* database is going to have trouble optimizing that. Just out of curiosity I've been looking a bit at the optimizer code in PostgreSQL, and it seems as if it would be at least theoretically possible to add support for things like transforming the query at hand into the NOT EXISTS form; a bit like how = NULL is converted to IS NULL. Would a change like that be accepted, or would you rather try to indirectly educate people into writing better SQL? There are some reasonable and generic optimizations that could be done here. Being able to inline subqueries with aggregates into joins would be a good thing, e.g. transform your query into this: SELECT t1.* FROM table t1 LEFT JOIN table t2 ON (t2.type = t1.type AND t2.timestamp > t1.timestamp) GROUP BY t1.* HAVING COUNT(t2.*)=0 However, this is probably still worse than a NOT EXISTS query. I am less excited about turning a COUNT(x)=0 query into NOT EXISTS because that's just a bad way to write a query. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
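[The NOT EXISTS form referred to above would look like this, mirroring the tables and predicates of the quoted query:]

```sql
-- for each row, keep it only if no later row of the same type exists
SELECT *
FROM table t1
WHERE NOT EXISTS (
    SELECT 1
    FROM table t2
    WHERE t2.type = t1.type
      AND t2.timestamp > t1.timestamp
);
```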
Re: [PERFORM] COPY TO stdout statements occurrence in log files
On Fri, Jan 14, 2011 at 23:19, Chris Browne cbbro...@acm.org wrote: 2. In 9.1, there will be a new answer, as there's a GUC to indicate the application_name. Actually this was already introduced in PostgreSQL 9.0 :) You can add application_name to your log_line_prefix with %a. For pg_dump it will display pg_dump Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] encourging bitmap AND
On Thu, Dec 23, 2010 at 22:52, Tom Lane t...@sss.pgh.pa.us wrote: Ben midfi...@gmail.com writes: i have a schema similar to the following create index foo_s_idx on foo using btree (s); create index foo_e_idx on foo using btree (e); i want to do queries like select * from foo where 150 between s and e; That index structure is really entirely unsuited to what you want to do, so it's not surprising that the planner isn't impressed with the idea of a bitmap AND. Why is it unsuited for this query? It expands to (150 >= s AND 150 <= e), which should work nicely with bitmap AND as far as I can tell. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] MySQL HandlerSocket - Is this possible in PG?
On Tue, Dec 21, 2010 at 11:09, Michael Ben-Nes mich...@epoch.co.il wrote: Just stumbled on the following post: http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html The post claims that MySQL can do more qps than Memcached or any other NoSQL when doing simple queries like: SELECT * FROM table WHERE id=num; And I wonder if: 1. Currently, is it possible to achieve the same using PG 9.0.x? 2. Is it possible at all? I was curious what could be done currently, without any modifications to PostgreSQL itself, so I ran a simple benchmark.

Table:
create table usr (user_id int primary key not null, user_name text not null, user_email text not null, created timestamp not null);
insert into usr select generate_series(1, 100), 'Yukari Takeba', 'yukari.tak...@dena.jp', '2010-02-03 11:22:33';

<?php
$db = pg_connect('');
$res = pg_prepare($db, 'get_user', 'select user_name, user_email, created from usr where user_id=$1');
$res = pg_query($db, 'begin');

$args = array();
for ($i = 0; $i < 250000; $i++) {
  $args[0] = rand(1, 100);
  $res = pg_execute($db, 'get_user', $args);
  $row = pg_fetch_row($res);
}
?>

Each process does 250k queries, so when I run 4 in parallel it's 1M queries total. I'm running PostgreSQL 9.1alpha2, PHP 5.3.4, kernel 2.6.36.2 on Arch Linux; AMD Phenom II X4 955. The only tuning I did was setting shared_buffers=256M

Results:
% time php pg.php & time php pg.php & time php pg.php & time php pg.php & sleep 11
[1] 29792
[2] 29793
[3] 29795
[4] 29797
php pg.php 1,99s user 0,97s system 30% cpu 9,678 total
[2]   done       time php pg.php
php pg.php 1,94s user 1,06s system 30% cpu 9,731 total
[3] - done       time php pg.php
php pg.php 1,92s user 1,07s system 30% cpu 9,746 total
[1]   done       time php pg.php
php pg.php 2,00s user 1,04s system 31% cpu 9,777 total
[4] + done       time php pg.php
So around 10 seconds to run the test in total.
These numbers aren't directly comparable to their test -- I tested over a local UNIX socket, with PHP client on the same machine -- but it's a datapoint nevertheless. Bottom line, you can expect up to 100 000 QPS using pg_execute() on a cheap quad-core gamer CPU. You won't be beating memcached with current PostgreSQL, but I think it's a respectable result. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Auto-clustering?
2010/12/17 Filip Rembiałkowski filip.rembialkow...@gmail.com: regarding clustering: it does not help with index bloat. I'm almost sure it does, CLUSTER re-creates all indexes from scratch after copying the tuples. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] postgres performance tunning
On Thu, Dec 16, 2010 at 14:33, selvi88 selvi@gmail.com wrote: I have a requirement for running more than 15000 queries per second. Can you please tell what all postgres parameters need to be changed to achieve this. You have not told us anything about what sort of queries they are or what you're trying to do. PostgreSQL is not the solution to all database problems. If all you have is a dual-core machine then other software can possibly make better use of the available hardware. First of all, if they're mostly read-only queries, you should use a caching layer (like memcache) in front of PostgreSQL. And you can use replication to spread the load across multiple machines (but you will get some latency until the updates fully propagate to slaves). If they're write queries, memory databases (like Redis), or disk databases specifically optimized for writes (like Cassandra) might be more applicable. Alternatively, if you can tolerate some latency, use message queuing middleware like RabbitMQ to queue up a larger batch and send updates to PostgreSQL in bulk. As for optimizing PostgreSQL itself, if you have a high connection churn then you will need connection pooling middleware in front -- such as pgbouncer or pgpool. But avoiding reconnections is a better idea. Also, use prepared queries to avoid parsing overheads for every query. Obviously all of these choices involve tradeoffs and caveats, in terms of safety, consistency, latency and application complexity. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] only one index is using, why?
On Wed, Dec 15, 2010 at 08:56, AI Rumman rumman...@gmail.com wrote: My question is why crmentity_setype_idx index is being used only. crmentity_deleted_idx index is not being used. Any idea please. Because the planner determined that the cost of scanning *two* indexes and combining the results is more expensive than scanning one index and filtering the results afterwards. Looks like your query could use a composite index on both columns: (deleted, setype) Or a partial index: (setype) WHERE deleted=0 Regards, Marti
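The two suggested indexes would look roughly like this (index names are illustrative; the table and column names are taken from the quoted question):

```sql
-- Composite index covering both filter columns:
CREATE INDEX crmentity_deleted_setype_idx ON crmentity (deleted, setype);

-- Or a smaller partial index, if queries nearly always filter on deleted = 0:
CREATE INDEX crmentity_setype_live_idx ON crmentity (setype) WHERE deleted = 0;
```

The partial index is usually the better choice when the deleted=0 predicate matches the common query pattern, since it stays smaller and cheaper to maintain.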
Re: [PERFORM] Hardware recommendations
On Thu, Dec 9, 2010 at 04:28, Scott Marlowe scott.marl...@gmail.com wrote: On Wed, Dec 8, 2010 at 5:03 PM, Benjamin Krajmalnik k...@servoyant.com wrote: My biggest concern with SSD drives is their life expectancy, Generally that's not a big issue, especially as the SSDs get larger. Being able to survive a power loss without corruption is more of an issue, so if you go SSD get ones with a supercapacitor that can write out the data before power down. I agree with Benjamin here. Even if you put multiple SSD drives into a RAID array, all the drives get approximately the same write load and thus will likely wear out and fail at the same time! As for the Areca controllers, I haven't tested them with the latest drivers or firmware, but we would routinely get 180 to 460 days of uptime between lockups That sucks! But does a BBU even help with SSDs? The flash eraseblock is larger than the RAID cache unit size anyway, so as far as I can tell, it might not save you in the case of a power loss. Any thoughts whether software RAID on SSD is a good idea? Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hardware recommendations
On Thu, Dec 9, 2010 at 01:26, Andy angelf...@yahoo.com wrote: If you are IO-bound, you might want to consider using SSD. A single SSD could easily give you more IOPS than 16 15k SAS in RAID 10. Are there any that don't risk your data on power loss, AND are cheaper than SAS RAID 10? Regards, Marti
Re: [PERFORM] Slow query to get last created row using CURRVAL
On Sat, Dec 4, 2010 at 13:56, Mathieu De Zutter math...@dezutter.org wrote: I have no idea why in some cases the index scan is not considered. Does anyone have an idea? I guess that it's because the currval() function is volatile -- its value has to be tested again for each row. Try this instead: SELECT user_id FROM log_event WHERE id = (SELECT CURRVAL('log_event_id_seq')); This will ensure that there's only one call to currval(). Regards, Marti
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Wed, Nov 17, 2010 at 01:31, Tom Lane t...@sss.pgh.pa.us wrote: Well, we're not going to increase the default to gigabytes, but we could very probably increase it by a factor of 10 or so without anyone squawking. It's been awhile since I heard of anyone trying to run PG in 4MB shmmax. How much would a change of that size help? In my testing, when running a large bulk insert query with fdatasync on ext4, changing wal_buffers has very little effect: http://ompldr.org/vNjNiNQ/wal_sync_method1.png (More details at http://archives.postgresql.org/pgsql-performance/2010-11/msg00094.php ) It would take some more testing to say this conclusively, but looking at the raw data, there only seems to be an effect when moving from 8 to 16MB. Could be different on other file systems though. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] MVCC performance issue
On Sat, Nov 13, 2010 at 07:53, Craig Ringer cr...@postnewspapers.com.au wrote: Oracle's MVCC approach has its own costs. Like Pg's, those costs increase with update/delete frequency. Instead of table bloat, Oracle suffers from redo log growth (or redo log size management issues). Instead of increased table scan costs from dead rows, Oracle suffers from random I/O costs as it looks up the out-of-line redo log for old rows. Instead of long-running writer transactions causing table bloat, Oracle can have problems with long-running reader transactions aborting when the redo log runs out of space. Another advantage of Oracle's approach seems that they need much less tuple-level overhead. IMO the 23-byte tuple overhead is a much bigger drawback in Postgres than table fragmentation. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] MVCC performance issue
On Thu, Nov 11, 2010 at 20:25, Kyriacos Kyriacou kyriac...@prime-tel.com wrote: By definition of MVCC, when an UPDATE is performed, PostgreSQL creates a new copy of the row in a new location. result is to have huge fragmentation on table space, unnecessary updates in all affected indexes, unnecessary costly I/O operations, poor performance on SELECT that retrieves big record sets (i.e. reports etc) and slower updates. Have you tried reducing the table fillfactor and seeing if HOT update ratio increases? PostgreSQL 8.3 introduced HOT updates as kind of a middle ground -- if the update doesn't affect indexed columns and there's enough space in the same page that is being updated, then the new version will be written in the same page and indexes don't need to be touched at all. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
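A sketch of the fillfactor experiment suggested above (the table name is hypothetical); HOT update effectiveness can be checked from the pg_stat_user_tables view:

```sql
-- Leave 30% free space in each heap page so updated row versions
-- can land on the same page, enabling HOT updates:
ALTER TABLE mytable SET (fillfactor = 70);
-- Rewrite the table so existing pages honor the new fillfactor
-- (CLUSTER also works; both take an exclusive lock):
VACUUM FULL mytable;

-- Compare total updates vs. HOT updates over time:
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'mytable';
```

If n_tup_hot_upd rises to a large fraction of n_tup_upd, the index-bloat and fragmentation problems described above should be much reduced.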
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Sat, Nov 13, 2010 at 20:01, Tom Lane t...@sss.pgh.pa.us wrote: What's your basis for asserting he's uninterested? Please have a little patience. My apologies, I was under the impression that he hadn't answered your request, but he did in the -hackers thread. Regards, Marti
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Mon, Nov 8, 2010 at 20:40, Tom Lane t...@sss.pgh.pa.us wrote: The latter choice is the one that requires testing to prove that it is the proper and preferred default from the performance and data reliability POV. And, in fact, the game plan is to do that testing and see which default we want. I think it's premature to argue further about this until we have some test results. Who will be doing that testing? You said you're relying on Greg Smith to manage the testing, but he's obviously uninterested, so it seems unlikely that this will go anywhere. I posted my results with the simple INSERT test, but nobody cared. I could do some pgbench runs, but I have no idea what parameters would give useful results. Meanwhile, PostgreSQL performance is regressing and there's still no evidence that open_datasync is any safer. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Mon, Nov 8, 2010 at 02:05, Greg Smith g...@2ndquadrant.com wrote: Where's your benchmarks proving it then? If you're right about this, and I'm not saying you aren't, it should be obvious in simple benchmarks by stepping through various sizes for wal_buffers and seeing the throughput/latency situation improve. Since benchmarking is the easy part, I did that. I plotted the time taken by inserting 2 million rows to a table with a single integer column and no indexes (total 70MB). Entire script is attached. If you don't agree with something in this benchmark, please suggest improvements. Chart: http://ompldr.org/vNjNiNQ/wal_sync_method1.png Spreadsheet: http://ompldr.org/vNjNiNg/wal_sync_method1.ods (the 2nd worksheet has exact measurements) This is a different machine from the original post, but similar configuration. One 1TB 7200RPM Seagate Barracuda, no disk controller cache, 4G RAM, Phenom X4, Linux 2.6.36, PostgreSQL 9.0.1, Arch Linux. This time I created a separate 20GB ext4 partition specially for PostgreSQL, with all default settings (shared_buffers=32MB). The partition is near the end of the disk, so hdparm gives a sequential read throughput of ~72 MB/s. I'm getting frequent checkpoint warnings; should I try larger checkpoint_segments too? The partition is re-created and 'initdb' is re-run for each test, to prevent file system allocation from affecting results. I did two runs of all benchmarks. The points on the graph show a sum of INSERT time + COMMIT time in seconds. One surprising thing on the graph is a plateau, where open_datasync performs almost equally with wal_buffers=128kB and 256kB. Another noteworthy difference (not visible on the graph) is that with open_datasync -- but not fdatasync -- and wal_buffers=128M, INSERT time keeps shrinking, but COMMIT takes longer. The total INSERT+COMMIT time remains the same, however. I have a few expendable hard drives here so I can test reliability by pulling the SATA cable as well.
Is this kind of testing useful? What workloads do you suggest? Regards, Marti [Attachment: pgtest.sh (Bourne shell script)]
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Mon, Nov 8, 2010 at 01:35, Greg Smith g...@2ndquadrant.com wrote: Yes; it's supposed to, and that logic works fine on some other platforms. No, the logic was broken to begin with. Linux technically supported O_DSYNC all along. PostgreSQL used fdatasync as the default. Now, because Linux added proper O_SYNC support, PostgreSQL suddenly prefers O_DSYNC over fdatasync? Until you've quantified which of the cases do that--which is required for reliable operation of PostgreSQL--and which don't, you don't have any data that can be used to draw a conclusion from. If some setups are faster because they write less reliably, that doesn't automatically make them the better choice. I don't see your point. If fdatasync worked on Linux, AS THE DEFAULT, all the time until recently, then how does it all of a sudden need proof NOW? If anything, the new open_datasync should be scrutinized because it WASN'T the default before and it hasn't gotten as much testing on Linux. Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Running PostgreSQL as fast as possible no matter the consequences
On Fri, Nov 5, 2010 at 13:32, A B gentosa...@gmail.com wrote: I was just thinking about the case where I will have almost 100% selects, but still needs something better than a plain key-value storage so I can do some sql queries. The server will just boot, load data, run, hopefully not crash but if it would, just start over with load and run. If you want fast read queries then changing fsync/full_page_writes/synchronous_commit won't help you. Just follow the regular tuning guide. shared_buffers, effective_cache_size, work_mem, default_statistics_target can make a difference. http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server Regards, Marti
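A hedged starting point for the read-oriented settings mentioned above, as a postgresql.conf fragment (the values are purely illustrative, sized for a machine with roughly 8GB of RAM, not a recommendation for any particular workload):

```
shared_buffers = 2GB               # ~25% of RAM is a common starting point
effective_cache_size = 6GB         # roughly shared_buffers plus the OS page cache
work_mem = 32MB                    # per sort/hash, per backend -- raise with care
default_statistics_target = 100    # raise further for complex queries on skewed data
```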
Re: [PERFORM] Running PostgreSQL as fast as possible no matter the consequences
On Fri, Nov 5, 2010 at 13:11, Guillaume Cottenceau g...@mnc.ch wrote: Don't use PostgreSQL, just drop your data, you will end up with the same results and be even faster than any use of PostgreSQL. If anyone needs data, then just say you had data corruption, and that since 100% dataloss is accepted, then all's well. You're not helping. There are legitimate reasons for trading off safety for performance. Regards, Marti
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Fri, Nov 5, 2010 at 23:10, Greg Smith g...@2ndquadrant.com wrote: Not having a real O_DSYNC on linux until recently makes it even more dubious to have it as a default... If Linux is now defining O_DSYNC Well, Linux always defined both O_SYNC and O_DSYNC, but they used to have the same value. The defaults changed due to an unfortunate heuristic in PostgreSQL, which boils down to:

#if O_DSYNC != O_SYNC
#define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN_DSYNC
#else
#define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
#endif

(see src/include/access/xlogdefs.h for details) In fact, I was wrong in my earlier post. Linux always offered O_DSYNC behavior. What's new is POSIX-compliant O_SYNC, and the fact that these flags are now distinguished. Here's the change in Linux: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6b2f3d1f769be5779b479c37800229d9a4809fc3 Regards, Marti
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Sat, Nov 6, 2010 at 00:06, Greg Smith g...@2ndquadrant.com wrote: Please refrain from making changes to popular documents like the tuning guide on the wiki based on speculation about what's happening. I will grant you that the details were wrong, but I stand by the conclusion. I can state for a fact that PostgreSQL's default wal_sync_method varies depending on the fcntl.h header. I have two PostgreSQL 9.0.1 builds, one with older /usr/include/bits/fcntl.h and one with newer. When I run show wal_sync_method; on one instance, I get fdatasync. On the other one I get open_datasync. So let's get down to code. Older fcntl.h has:

#define O_SYNC 01
# define O_DSYNC O_SYNC /* Synchronize data. */

Newer has:

#define O_SYNC 0401
# define O_DSYNC 01 /* Synchronize data. */

So you can see that in the older header, O_DSYNC and O_SYNC are equal. src/include/access/xlogdefs.h does:

#if defined(O_SYNC)
#define OPEN_SYNC_FLAG O_SYNC
...
#if defined(OPEN_SYNC_FLAG)
/* O_DSYNC is distinct? */
#if O_DSYNC != OPEN_SYNC_FLAG
#define OPEN_DATASYNC_FLAG O_DSYNC

^ it's comparing O_DSYNC != O_SYNC

#if defined(OPEN_DATASYNC_FLAG)
#define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN_DSYNC
#elif defined(HAVE_FDATASYNC)
#define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC

^ depending on whether O_DSYNC and O_SYNC were equal, the default wal_sync_method will change. Regards, Marti
Re: [PERFORM] Simple (hopefully) throughput question?
Just some ideas that went through my mind when reading your post. On Wed, Nov 3, 2010 at 17:52, Nick Matheson nick.d.mathe...@noaa.gov wrote: than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is due to the storage overhead we have observed in Postgres. In the example below, it takes 1 GB to store 350 MB of nominal data. PostgreSQL 8.3 and later have 22 bytes of overhead per row, plus page-level overhead and internal fragmentation. You can't do anything about row overheads, but you can recompile the server with larger pages to reduce page overhead. Is there any way using stored procedures (maybe C code that calls SPI directly) or some other approach to get close to the expected 35 MB/s doing these bulk reads? Perhaps a simpler alternative would be writing your own aggregate function with four arguments. If you write this aggregate function in C, it should have similar performance to the sum() query. Regards, Marti
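Registering such a custom C aggregate would look roughly like this (a sketch only: the function and library names are hypothetical, and the C transition function itself -- compiled into a shared library -- is omitted):

```sql
-- Assumes a C transition function bulk_accum() compiled into my_agg.so.
-- The state is a float8 array; the four float8 inputs are folded into it
-- on each row, avoiding per-row executor overhead of four separate sum()s.
CREATE FUNCTION bulk_accum(float8[], float8, float8, float8, float8)
RETURNS float8[]
AS 'my_agg', 'bulk_accum'
LANGUAGE C IMMUTABLE;

CREATE AGGREGATE bulk_sums (float8, float8, float8, float8) (
    sfunc = bulk_accum,
    stype = float8[],
    initcond = '{0,0,0,0}'
);
```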
Re: [PERFORM] Insert performance with composite index
On Mon, Nov 1, 2010 at 14:49, Divakar Singh dpsma...@yahoo.com wrote: I am trying to tune my libpq program for insert performance. When I tried inserting 1M rows into a table with a Primary Key, it took almost 62 seconds. After adding a composite index of 2 columns, the performance degrades to 125 seconds. This sounds a lot like the bottleneck I was hitting. What Linux kernel version are you running? If it's 2.6.33 or later, see: http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#wal_sync_method_wal_buffers http://archives.postgresql.org/pgsql-performance/2010-10/msg00602.php Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Insert performance with composite index
On Mon, Nov 1, 2010 at 14:56, Divakar Singh dpsma...@yahoo.com wrote: Thanks for your tips. i will try those. I am on Solaris Sparc 5.10 Sorry, I assumed you were running Linux. But still it could be the same problem as I had. Be careful changing your wal_sync_method, as it has the potential to corrupt your database. I have no experience with Solaris. For what it's worth, Jignesh Shah recommends using wal_sync_method=fsync on Solaris: http://blogs.sun.com/jkshah/entry/postgresql_on_solaris_better_use http://blogs.sun.com/jkshah/entry/postgresql_wal_sync_method_and Regards, Marti -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Hi pgsql-performance, I was doing mass insertions on my desktop machine and getting at most 1 MB/s disk writes (apart from occasional bursts of 16MB). Inserting 1 million rows with a single integer (data+index 56 MB total) took over 2 MINUTES! The only tuning I had done was shared_buffers=256MB. So I got around to tuning the WAL writer and found that wal_buffers=16MB works MUCH better. wal_sync_method=fdatasync also got similar results. First of all, I'm running PostgreSQL 9.0.1 on Arch Linux * Linux kernel 2.6.36 (also tested with 2.6.35) * Quad-core Phenom II * a single Seagate 7200RPM SATA drive (write caching on) * ext4 FS over LVM, with noatime, data=writeback I am creating a table like: create table foo(id integer primary key); Then measuring performance with the query: insert into foo (id) select generate_series(1, 1000000);

130438,011 ms  wal_buffers=64kB, wal_sync_method=open_datasync (all defaults)
 29306,847 ms  wal_buffers=1MB, wal_sync_method=open_datasync
  4641,113 ms  wal_buffers=16MB, wal_sync_method=open_datasync
^ from 130s to 4.6 seconds by just changing wal_buffers.
  5528,534 ms  wal_buffers=64kB, wal_sync_method=fdatasync
  4856,712 ms  wal_buffers=16MB, wal_sync_method=fdatasync
^ fdatasync works well even with small wal_buffers
  2911,265 ms  wal_buffers=16MB, fsync=off
^ Not bad, getting 60% of ideal throughput

These defaults are not just hurting bulk-insert performance, but also everyone who uses synchronous_commit=off. Unless fdatasync is unsafe, I'd very much want to see it as the default for 9.1 on Linux (I don't know about other platforms). I can't see any reasons why each write would need to be sync-ed if I don't commit that often. Increasing wal_buffers probably has the same effect wrt data safety. Also, the tuning guide on wiki is understating the importance of these tunables.
Reading it I got the impression that some people change wal_sync_method but it's dangerous, and it even literally claims about wal_buffers that "1MB is enough for some large systems". But the truth is that if you want any write throughput AT ALL on a regular Linux desktop, you absolutely have to change one of these. If the defaults were better, it would be enough to set synchronous_commit=off to get all that your hardware has to offer. I was reading mailing list archives and didn't find anything against it either. Can anyone clarify the safety of wal_sync_method=fdatasync? Are there any reasons why it shouldn't be the default? Regards, Marti
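For reference, the two settings under discussion as a postgresql.conf fragment (a sketch of the configuration tested above, not a general recommendation; whether fdatasync is safe on a given setup is exactly the open question of this thread):

```
wal_sync_method = fdatasync   # instead of open_datasync, the compiled-in default on newer Linux
wal_buffers = 16MB            # default in 9.0 is only 64kB
```

Both take effect only after a server restart.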
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Sun, Oct 31, 2010 at 21:59, Greg Smith g...@2ndquadrant.com wrote: open_datasync support was just added to Linux itself very recently. Oh, I didn't realize it was a new feature. Indeed, O_DSYNC support was added in 2.6.33. It seems like bad behavior on PostgreSQL's part to default to new, untested features. I have updated the tuning wiki page with my understanding of the problem: http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server#wal_sync_method_wal_buffers Regards, Marti