Re: [PERFORM] Having some problems with concurrent COPY commands

2015-10-13 Thread Heikki Linnakangas

On 10/12/2015 11:14 PM, Shaun Thomas wrote:

On Mon, Oct 12, 2015 at 1:28 PM, Andres Freund  wrote:


Any chance
you could provide profiles of such a run?


This is as simple as I could make it reliably. With one copy running,
the thread finishes in about 1 second. With 2, it's 1.5s each, and
with all 4, it's a little over 3s for each according to the logs. I
have log_min_duration_statement set to 1000, so it's pretty obvious.
The scary part is that it's not even scaling linearly; performance is
actually getting *worse* with each subsequent thread.

Regarding performance, all of this fits in memory. The tables are only
100k rows with the COPY statement. The machine itself is 8 CPUs with
32GB of RAM, so it's not an issue of hardware. So far as I can tell,
it happens on every version I've tested on, from 9.2 to 9.4. I also
take back what I said about wal_level. Setting it to minimal does
nothing. Disabling archive_mode and setting max_wal_senders to 0 also
does nothing. With 4 concurrent processes, each takes 3 seconds, for a
total of 12 seconds to import 400k rows when it would take 4 seconds
to do sequentially. Sketchy.


I was not able to reproduce that behaviour on my laptop. I bumped the 
number of rows in your script 10x, to make it run a bit longer. 
Attached is the script I used. The total wallclock time the COPYs take 
on 9.4 is about 8 seconds for a single COPY, and 12 seconds for 4 
concurrent COPYs. So it's not scaling as well as you might hope, but 
it's certainly not worse-than-serial either, as you're seeing.


If you're seeing this on 9.2 and 9.4 alike, this can't be related to the 
XLogInsert scaling patch, although you might've found a case where that 
patch didn't help where it should've. I ran "perf" to profile the test 
case, and it looks like about 80% of the CPU time is spent in the b-tree 
comparison function. That doesn't leave much scope for XLogInsert 
scalability to matter one way or another.


I have no explanation for what you're seeing though. A bad spinlock 
implementation perhaps? Anything special about the hardware at all? Can 
you profile it on your system? Which collation?


- Heikki


launch4.sh
Description: Bourne shell script



Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Heikki Linnakangas

On 07/07/2015 10:59 PM, Graeme B. Bell wrote:

Cache flushing isn't an atomic operation though. Even if the ordering
is right, you are likely to have a partial fsync on the disk when the
lights go out - isn't your FS still corrupt?


If the filesystem is worth its salt, no. Journaling filesystems for 
example rely on the journal to work around that problem, and there are 
other mechanisms.


PostgreSQL has exactly the same problem and uses the WAL to solve it.

- Heikki





Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Heikki Linnakangas

On 07/07/2015 09:01 PM, Wes Vaske (wvaske) wrote:

Regarding:
“lie about their fsync status.”

This is mostly semantics but it might help google searches on the issue.

A drive doesn’t support fsync(); that’s a filesystem/kernel process. A drive will do 
a FLUSH CACHE. Before early 2.6.x kernels, the fsync() call wouldn’t send 
any ATA or SCSI command to flush the disk cache, whereas—AFAICT—modern kernels and 
file system versions *will* do this. When ‘sync’ is called, the filesystem will issue 
the appropriate command to the disk to flush the write cache.

For ATA, this is “FLUSH CACHE” (E7h). To check support for the command use:
[root@postgres ~]# smartctl --identify /dev/sdu | grep "FLUSH CACHE"
   83 13  1   FLUSH CACHE EXT supported
   83 12  1   FLUSH CACHE supported
   86 13  1   FLUSH CACHE EXT supported
   86 12  1   FLUSH CACHE supported

The 1s in the 3rd column represent SUPPORTED for the feature listed in the last 
column.


Right, to be precise, the problem isn't that the drive lies about fsync(); it 
lies about FLUSH CACHE instead. Search & replace fsync() with FLUSH 
CACHE, and the same question remains: when the drive breaks its promise 
wrt. FLUSH CACHE, does it nevertheless guarantee that the order in which the 
data is eventually flushed to disk is consistent with the order in which the 
data and FLUSH CACHE commands were sent to the drive? That's an important 
distinction, because it makes the difference between "the most recent 
data the application saved might be lost even though the FLUSH CACHE 
command returned" and "your filesystem is corrupt".


- Heikki





Re: [PERFORM] New server: SSD/RAID recommendations?

2015-07-07 Thread Heikki Linnakangas

On 07/07/2015 05:15 PM, Wes Vaske (wvaske) wrote:

The M500/M550/M600 are consumer class drives that don't have power
protection for all inflight data.* (like the Samsung 8x0 series and
the Intel 3x0 & 5x0 series).

The M500DC has full power protection for inflight data and is an
enterprise-class drive (like the Samsung 845DC or Intel S3500 & S3700
series).

So any drive without the capacitors to protect inflight data will
suffer from data loss if you're using disk write cache and you pull
the power.


Wow, I would be pretty angry if I installed an SSD in my desktop, and it 
lost a file that I saved just before pulling the power plug.



*Big addendum: There are two issues on powerloss that will mess with
Postgres. Data Loss and Data Corruption. The micron consumer drives
will have power loss protection against Data Corruption and the
enterprise drive will have power loss protection against BOTH.

https://www.micron.com/~/media/documents/products/white-paper/wp_ssd_power_loss_protection.pdf

 The Data Corruption problem is only an issue in non-SLC NAND but
it's industry wide. And even though some drives will protect against
that, the protection of inflight data that's been fsync'd is more
important and should disqualify *any* consumer drives from *any*
company from consideration for use with Postgres.


So it lies about fsync()... The next question is, does it nevertheless 
enforce the correct ordering of persisting fsync'd data? If you write to 
file A and fsync it, then write to another file B and fsync it too, is 
it guaranteed that if B is persisted, A is as well? Because if it isn't, 
you can end up with filesystem (or database) corruption anyway.


- Heikki





Re: [PERFORM] optimization join on random value

2015-05-03 Thread Heikki Linnakangas

On 05/04/2015 12:23 AM, Anton Bushmelev wrote:

Hello gurus of postgres, is it possible to tune a query with a join on a
random value?
I know that it is not a real-life example, but I need it for tests.

soe=# explain
soe-#  SELECT   ADDRESS_ID,
soe-#   CUSTOMER_ID,
soe-#   DATE_CREATED,
soe-#   HOUSE_NO_OR_NAME,
soe-#   STREET_NAME,
soe-#   TOWN,
soe-#   COUNTY,
soe-#   COUNTRY,
soe-#   POST_CODE,
soe-#   ZIP_CODE
soe-# FROM ADDRESSES
soe-# WHERE customer_id = trunc( random()*45000) ;
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on addresses  (cost=0.00..165714.00 rows=22500 width=84)
   Filter: ((customer_id)::double precision = trunc((random() * 45000::double precision)))
(2 rows)



There are two problems here that prohibit the index from being used:

1. random() is volatile, so it's recalculated for each row.
2. For the comparison, customer_id is cast to a float, and the index is 
on the bigint value.


To work around the first problem, put the random() call inside a 
subquery. And for the second problem, cast to bigint.


SELECT ... FROM addresses
WHERE customer_id = (SELECT random()*45000)::bigint
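
For illustration, a minimal sketch putting both fixes together (assuming
addresses.customer_id is a bigint; the index name is made up, and the
CREATE INDEX can be skipped if a suitable index already exists):

CREATE INDEX addresses_customer_id_idx ON addresses (customer_id);

EXPLAIN
SELECT address_id, customer_id
FROM addresses
WHERE customer_id = (SELECT trunc(random() * 45000)::bigint);

With the volatile random() call hidden inside the scalar subquery and the
comparison done in bigint, the plan should switch from the seq scan above
to an index scan on customer_id.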

- Heikki





Re: [PERFORM] Reverse Key Index

2015-02-26 Thread Heikki Linnakangas

On 02/26/2015 12:31 AM, Josh Berkus wrote:

On 02/14/2015 10:35 AM, Sven R. Kunze wrote:

Thanks for the immediate reply.

I understand the use case is quite limited.

On the other hand, I see potential when it comes to applications which
use PostgreSQL. There, programmers would have to change a lot of code to
tweak existing (and more importantly working) queries to hash/reverse an
id column first. Using ORMs would make this change even more painful and
maybe even impossible.

When reading
https://richardfoote.wordpress.com/2008/01/14/introduction-to-reverse-key-indexes-part-i/
carefully, it also seems to work with index scan partially in case of
equality comparisons.


Seems like a good use for SP-GiST.  Go for it!


A b-tree opclass that just compares from right-to-left would work just
as well, and perform better.
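
A rough illustration of the same idea without writing a custom opclass is
an expression index on the reversed textual form of the key (table and
index names here are made up; like a reverse key index, this only helps
equality lookups):

CREATE INDEX users_id_reversed_idx ON users (reverse(id::text));

-- Equality lookups can then use the expression index:
SELECT * FROM users WHERE reverse(id::text) = reverse('12345');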

- Heikki





Re: [PERFORM] Number of Columns and Update

2014-12-22 Thread Heikki Linnakangas

On 12/22/2014 10:53 PM, Robert DiFalco wrote:

This may fall into the category of over-optimization but I've become
curious.

I have a user table with about 14 columns that are all 1:1 data - so they
can't be normalized.

When I insert a row all columns need to be set. But when I update, I
sometimes only update 1-2 columns at a time. Does the number of columns
impact update speed?

For example:
  UPDATE users SET email = ? WHERE id = ?;


Yes, the number of columns in the table matters. The update is just as 
expensive regardless of how many of the columns you update.


When a row is updated, PostgreSQL creates a new version of the whole 
row. The new row version takes more space when the table has more 
columns, leading to more bloating of the table, which generally slows 
things down. In most applications the difference isn't big enough to 
matter, but it can be significant if you have very wide rows, and you 
update a lot.


PostgreSQL 9.4 made an improvement on this. In earlier versions, the new 
row version was also included completely in the WAL record, which added 
overhead. In 9.4, any columns at the beginning or end of the row that 
are not modified are left out of the WAL record, as long as the new row 
version is stored on the same page as the old one (which is common). For 
updating a single column, or a few columns that are next to each other, 
that's the same as saying that only the modified part of the row is 
WAL-logged.



I can easily break this up into logical tables like user_profile,
user_credential, user_contact_info, user_summary, etc with each table only
having 1-4 columns. But with the multiple tables I would often be joining
them to bring back a collection of columns.


That would help with the above-mentioned issues, but dealing with 
multiple tables certainly adds a lot of overhead too. Most likely you're 
better off just having the single table, after all.


- Heikki





Re: [PERFORM] Small performance regression in 9.2 has a big impact

2014-11-25 Thread Heikki Linnakangas

On 11/25/2014 10:36 PM, Scott Marlowe wrote:

OK so there's a simple set of tree functions we use at work. They're
quite fast in 8.4 and they've gotten about 40% slower in 9.2. They're
a simple mix of sql and plpgsql functions which are at
http://pastebin.com/SXTnNhd5 and which I've attached.

Here's a test query:

select tree_ancestor_keys('00010001');

According to explain analyze on both 8.4 and 9.2 they have the same
plan. However, on the same machine the query is about 40% slower on
9.2. Note we're not hitting the disks, or even buffers here. It's pure
in memory plpsql and sql that we're running.

explain analyze select tree_ancestor_keys('00010001')
from generate_series(1,1000);

On 8.4 runs in about 280 to 300 ms. (you can run it once and get the
same diff, it's just easier to see with the generate series forcing it
to run 1000 times to kind of even out the noise.)

On 9.2, same machine, clean fresh dbs etc, it runs in ~400 ms. And
that difference seems to be there on all plpgsql and sql functions.

In our application, these tree functions get called millions and
millions of times a day, and a 40% performance penalty is a pretty big
deal.

We're already using the trick of telling the query planner that this
function will return 1 row with alter function rows 1 etc. That helps
a lot but it doesn't fix this underlying performance issue.

Server versions are 8.4.22 (last I think) and 9.2.9.

If anyone has any suggestions I'd love to hear them.


I don't know why this regressed between those versions, but looking at 
the functions, there's some low-hanging fruit:


1. tree_ancestor_keys() could use UNION ALL instead of UNION. (I believe 
no duplicates are expected here, although I'm not 100% sure).


2. tree_ancestor_keys() calculates tree_level($1) every time it 
recurses. Would be cheaper to calculate once, and pass it as argument.


Put together:

CREATE FUNCTION tree_ancestor_keys(bit varying, integer, integer)
RETURNS SETOF bit varying
LANGUAGE sql IMMUTABLE STRICT
AS $_$
  select tree_ancestor_key($1, $2)
  union all
  select tree_ancestor_keys($1, $2 + 1, $3)
  where $2 < $3
$_$;

CREATE or replace FUNCTION tree_ancestor_keys(bit varying, integer) 
RETURNS SETOF bit varying

LANGUAGE sql IMMUTABLE STRICT
AS $_$
  select tree_ancestor_keys($1, $2 + 1, tree_level($1))
$_$;

These changes make your test query go about 2x faster on my laptop, with 
git master. I'm sure you could optimize the functions further, but those 
at least seem like fairly safe and simple changes.


- Heikki





Re: [PERFORM] auto vaccum is dying

2014-10-02 Thread Heikki Linnakangas

On 10/02/2014 07:43 AM, Rodrigo Barboza wrote:

Hello, I have a table that receives lots of updates and inserts.
Auto vaccum is always being cancelled on that table.
One day the database went on standby and I had to act manually to recover.

What should I do to avoid auto vaccum cancel?


Cancellation happens when you run a command that requires a stronger 
lock on the table, like ALTER TABLE or TRUNCATE. Plain UPDATEs or INSERTs 
will not cause cancellations. There must be something else going on that 
is causing the cancellations.
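
One way to find the culprit is to watch the server log for "canceling
autovacuum task" messages, and to check what else is running against the
table at that moment, for example with something like this (9.2+ column
names; the table name is just a placeholder):

SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE query ILIKE '%my_table%';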


- Heikki





Re: [PERFORM] Which update action quicker?

2014-09-24 Thread Heikki Linnakangas

On 09/23/2014 11:37 PM, Emi Lu wrote:

Hello list,

For a big table with more than 1,000,000 records, may I know which update is
quicker please?

(1) update t1
    set c1 = a.c1
    from a
    where pk and
          t1.c1 <> a.c1;
    ...
    update t1
    set c_N = a.c_N
    from a
    where pk and
          t1.c_N <> a.c_N;


(2) update t1
    set c1  = a.c1,
        c2  = a.c2,
        ...
        c_N = a.c_N
    from a
    where pk AND
          (t1.c1 <> a.c1 OR t1.c2 <> a.c2 ... OR t1.c_N <> a.c_N)


Probably (2). <> is not indexable, so each update will have to perform a 
sequential scan of the table. With (2), you only need to scan it once; 
with (1) you have to scan it N times. Also, method (1) will update the 
same row multiple times, if it needs to have more than one column updated.



Or other quicker way for update action?


If a large percentage of the table needs to be updated, it can be faster 
to create a new table, insert all the rows with the right values, drop 
the old table and rename the new one in its place. All in one transaction.
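
Roughly like this (a sketch only; the column list and join condition need
to be filled in, and any indexes, foreign keys and grants re-created on
the new table before the swap):

BEGIN;
CREATE TABLE t1_new (LIKE t1 INCLUDING DEFAULTS);
INSERT INTO t1_new
  SELECT ...             -- each row of t1 with the new values taken from a
  FROM t1 JOIN a ON ...; -- join on the primary key
DROP TABLE t1;
ALTER TABLE t1_new RENAME TO t1;
COMMIT;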


- Heikki




Re: [PERFORM] Aggregating tsqueries

2014-09-18 Thread Heikki Linnakangas

On 09/17/2014 07:56 AM, Alexander Hill wrote:

Hello,

I have a table of tree nodes with a tsquery column. To get a subtree's
tsquery, I need to OR all of its nodes' tsqueries together.

I defined a custom aggregate using tsquery_or:

 CREATE AGGREGATE tsquery_or_agg (tsquery)
 (
 sfunc = tsquery_or,
 stype = tsquery
 );

but I've found that

 tsquery_or_agg(query)

is about a hundred times slower than this:

 ('(' || string_agg(query::text, ')|(') || ')')::tsquery

That works perfectly so I'm happy to continue doing it, but I'm curious to
know why the difference is so great and if anything can be done about it?


string_agg's state transition function uses a buffer that's expanded as 
needed. At every step, the next string is appended to the buffer. Your 
custom aggregate is less efficient, because it constructs a new tsquery 
object at every step. In every step, a new tsquery object is allocated 
and the old result and the next source tsquery are copied to it. That's 
much more expensive.


If you're not shy of writing C code, you could write a more efficient 
version of tsquery_or_agg too, using a similar technique.


- Heikki




Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Heikki Linnakangas

On 06/29/2014 03:43 PM, Soni M wrote:

top and sar says 100% cpu usage of one core, no sign of I/O wait.


Hmm, I wonder what it's doing then... If you have "perf" installed on 
the system, you can do "perf top" to get a quick overview of where the 
CPU time is spent.


- Heikki





Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-30 Thread Heikki Linnakangas

On 06/30/2014 05:46 PM, Soni M wrote:

Here's what 'perf top' said on streaming replica :

Samples: 26K of event 'cpu-clock', Event count (approx.): 19781
  95.97%  postgres [.] 0x002210f3


Ok, so it's stuck doing something... Can you get a build with debug symbols 
installed, so that we could see the function name?

- Heikki





Re: [PERFORM] Postgres Replaying WAL slowly

2014-06-29 Thread Heikki Linnakangas

On 06/29/2014 11:14 AM, Soni M wrote:

Everything works fine until on Thursday we have high load on master, and
after that every streaming replica lag further behind the master. Even on
night and weekend where all server load is low. But the slony slave is OK
at all.


What does 'top' on the standby say? Is the startup process using 100% of 
(one) CPU replaying records, or is it waiting for I/O? How large is the 
database, does it fit in RAM? Any clues in the system or PostgreSQL logs?


- Heikki





Re: [PERFORM] Planner doesn't take indexes into account

2014-05-28 Thread Heikki Linnakangas

On 05/28/2014 12:59 PM, Grzegorz Olszewski wrote:

random_page_cost = 4.0
seq_page_cost = 1.0

There is about 500,000 rows and about 500 new rows each business day.

About 96% of rows meet the given conditions, that is, the count should be about 480,000.


When such a large percentage of the rows match, a sequential scan is 
indeed a better plan than an index scan. Sequential access is much 
faster than random access.


- Heikki




Re: [PERFORM] NFS, file system cache and shared_buffers

2014-05-27 Thread Heikki Linnakangas

On 05/27/2014 02:06 PM, Albe Laurenz wrote:

I just learned that NFS does not use a file system cache on the client side.

On the other hand, PostgreSQL relies on the file system cache for performance,
because beyond a certain amount of shared_buffers performance will suffer.

Together these things seem to indicate that you cannot get good performance
with a large database over NFS, since you cannot leverage memory speed.

Now I wonder if there are any remedies (CacheFS?) and what experiences
people have made with the performance of large databases over NFS.


I have no personal experience with NFS, but sounds like a 
higher-than-usual shared_buffers value would be good.


- Heikki




Re: [PERFORM] Revisiting disk layout on ZFS systems

2014-04-28 Thread Heikki Linnakangas

On 04/28/2014 06:47 PM, Karl Denninger wrote:

What I am curious about, however, is the xlog -- that appears to suffer
pretty badly from 128k record size, although it compresses even
more-materially; 1.94x (!)

The files in the xlog directory are large (16MB each) and thus first
blush would be that having a larger record size for that storage area
would help.  It appears that instead it hurts.


The WAL is fsync'd frequently. My guess is that that causes a lot of 
extra work to repeatedly recompress the same data, or something like that.


- Heikki




Re: [PERFORM] Revisiting disk layout on ZFS systems

2014-04-28 Thread Heikki Linnakangas

On 04/28/2014 09:07 PM, Karl Denninger wrote:

The WAL is fsync'd frequently. My guess is that that causes a lot of
extra work to repeatedly recompress the same data, or something like
that.


It shouldn't, as ZFS re-writes on change, and what's showing up is not
high I/O *count* but rather percentage-busy, which implies lots of head
movement (that is, lots of sub-allocation unit writes.)


That sounds consistent with frequent fsyncs.


Isn't WAL essentially sequential writes during normal operation?


Yes, it's totally sequential. But it's fsync'd at every commit, which 
means a lot of small writes.


- Heikki




Re: [PERFORM] tsearch2, large data and indexes

2014-04-25 Thread Heikki Linnakangas

On 04/24/2014 10:57 PM, Sergey Konoplev wrote:

On Thu, Apr 24, 2014 at 5:34 AM, Ivan Voras ivo...@freebsd.org wrote:

On 24 April 2014 13:34, Heikki Linnakangas hlinnakan...@vmware.com wrote:


As the docs say, the GIN index does not store the weights. As such, there is
no need to strip them. A recheck would be necessary if your query needs the
weights, precisely because the weights are not included in the index.

(In the OP's query, it's the ranking that was causing the detoasting.)


Thanks!

My problem is that I actually need the ranking. My queries can return
a large number of documents (tens of thousands) but I usually need
only the first couple of pages of most relevant results (e.g. 50-100
records). With PostgreSQL and tsearch2, this means that the tens of
thousands of documents found via the index are then detoasted and
ranked.


Heikki, what about the GIN improvements part 3: ordering in index
patch, was it committed?

http://www.postgresql.org/message-id/flat/capphfduwvqv5b0xz1dzuqaw29erdculzp2wotfjzdbs7bhp...@mail.gmail.com


Nope, wasn't committed.

- Heikki




Re: [PERFORM] tsearch2, large data and indexes

2014-04-24 Thread Heikki Linnakangas

On 04/24/2014 01:56 AM, Sergey Konoplev wrote:

On Wed, Apr 23, 2014 at 4:08 AM, Ivan Voras ivo...@freebsd.org wrote:

Ok, I found out what is happening, quoting from the documentation:

GIN indexes are not lossy for standard queries, but their performance
depends logarithmically on the number of unique words. (However, GIN
indexes store only the words (lexemes) of tsvector values, and not
their weight labels. Thus a table row recheck is needed when using a
query that involves weights.)

My query doesn't have weights but the tsvector in the table has them -
I take it this is what is meant by involves weights.

So... there's really no way for tsearch2 to produce results based on
the index alone, without recheck? This is... limiting.


My guess is that you could use strip() function [1] to get rid of
weights in your table or, that would probably be better, in your index
only by using expressions in it and in the query, eg.


As the docs say, the GIN index does not store the weights. As such, 
there is no need to strip them. A recheck would be necessary if your 
query needs the weights, precisely because the weights are not included 
in the index.


(In the OP's query, it's the ranking that was causing the detoasting.)

- Heikki




Re: [PERFORM] tsearch2, large data and indexes

2014-04-23 Thread Heikki Linnakangas

On 04/22/2014 10:57 AM, Ivan Voras wrote:

On 22 April 2014 08:40, Heikki Linnakangas hlinnakan...@vmware.com wrote:

On 04/20/2014 02:15 AM, Ivan Voras wrote:

More details: after thinking about it some more, it might have
something to do with tsearch2 and indexes: the large data in this case
is a tsvector, indexed with GIN, and the query plan involves a
re-check condition.

The query is of the form:
SELECT simple_fields FROM table WHERE fts @@ to_tsquery('...').

Does the re-check condition mean that the original tsvector data is
always read from the table in addition to the index?


Yes, if the re-check condition involves the fts column. I don't see why you
would have a re-check condition with a query like that, though. Are there
some other WHERE-conditions that you didn't show us?


Yes, I've read about tsearch2 and GIN indexes and there shouldn't be a
recheck condition - but there is.
This is the query:

SELECT documents.id, title, raw_data, q, ts_rank(fts_data, q, 4) AS
rank, html_filename
 FROM documents, to_tsquery('document') AS q
 WHERE fts_data @@ q
  ORDER BY rank DESC  LIMIT 25;


It's the ranking that's causing the detoasting. ts_rank(fts_data, q, 
4) has to fetch the contents of the fts_data column.


Sorry, I was confused earlier: the "Recheck Cond:" line is always there 
in the EXPLAIN output of bitmap index scans, even if the recheck 
condition is never executed at runtime. It's because the executor has to 
be prepared to run the recheck condition if the bitmap grows large 
enough to become lossy, so that it only stores the page numbers of 
matching tuples, not the individual tuples.


- Heikki




Re: [PERFORM] Workaround: Planner preference for tsquery filter vs. GIN index in fast text search

2014-04-22 Thread Heikki Linnakangas

On 04/20/2014 07:46 AM, Oleg Bartunov wrote:

btw, 9.4 should be wiser in case of rare+common terms, thanks to GIN
fast scan feature.


Indeed, although we didn't actually do anything to the planner to make 
it understand when fast scan helps. Doing something about cost 
estimation is still on the 9.4 Open Items list, but I don't have any 
ideas on what to do about it, and I haven't heard anything from 
Alexander about that either. That means that the cost estimation issue 
Laurence saw is going to be even worse in 9.4, because GIN is going to 
be faster than a seq scan in more cases than before and the planner 
doesn't know about it.


- Heikki




Re: [PERFORM] tsearch2, large data and indexes

2014-04-22 Thread Heikki Linnakangas

On 04/20/2014 02:15 AM, Ivan Voras wrote:

Hello,

If a table contains simple fields as well as large (hundreds of KiB)
text fields, will accessing only the simple fields cause the entire
record data, including the large fields, to be read and unpacked?
(e.g. SELECT int_field FROM table_with_large_text)


No.


More details: after thinking about it some more, it might have
something to do with tsearch2 and indexes: the large data in this case
is a tsvector, indexed with GIN, and the query plan involves a
re-check condition.

The query is of the form:
SELECT simple_fields FROM table WHERE fts @@ to_tsquery('...').

Does the re-check condition mean that the original tsvector data is
always read from the table in addition to the index?


Yes, if the re-check condition involves the fts column. I don't see why 
you would have a re-check condition with a query like that, though. Are 
there some other WHERE-conditions that you didn't show us?


The large fields are stored in the toast table. You can check if the 
toast table is accessed with a query like this:


select * from pg_stat_all_tables where relid = (select reltoastrelid 
from pg_class where relname='table');


Run that before and after your query, and see if the numbers change.
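
The interesting columns are the scan and tuple counters, e.g.:

select seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
from pg_stat_all_tables
where relid = (select reltoastrelid from pg_class where relname='table');

If those numbers grow while your query runs, the toast table (and hence
the large column) is being read.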

- Heikki




Re: [PERFORM] unneeded joins on view

2014-04-16 Thread Heikki Linnakangas

On 04/16/2014 06:13 PM, Linos wrote:

I thought that Postgresql would optimize out joins on columns I
don't ask for when I use the view but it doesn't, this query:


It doesn't, because it would be wrong. It still has to check that the 
tables have a matching row (or multiple matching rows).


If you use LEFT JOINs instead, and have a unique index on all the ID 
columns, then the planner can do what you expected and leave out the joins.


- Heikki




Re: [PERFORM] Getting query plan alternatives from query planner?

2014-04-14 Thread Heikki Linnakangas

On 04/14/2014 09:36 PM, Stefan Keller wrote:

Who's the elephant in the room who is reluctant to introduce explicit hints?


Please read some of the previous discussions on this. Like this, in this 
very same thread:


http://www.postgresql.org/message-id/15381.1395410...@sss.pgh.pa.us

I'd like to have explicit hints, *of the kind explained in that 
message*. Hints that tell the planner what the data distribution is 
like. Hints to override statistics and heuristics used by the planner.


- Heikki




Re: [PERFORM] Batch update query performance

2014-04-07 Thread Heikki Linnakangas

On 04/07/2014 03:06 PM, Albe Laurenz wrote:

Hans Drexler wrote:

Postgres needs close to 50
minutes to process the same query on the same data. Sometimes, Postgres
needs more than 2 hours.

The application performs an update query on every row
of the table. The exact SQL of this query is:

update t67cdi_nl_cmp_descr set is_grc_002='Y'


[...]


We tried removing all indexes. That reduces the runtime to ~3 minutes.
When we start to put indexes back, the run time of the query increases
again with each index added.


Do I read that right that the duration of the update is reduced from
50 or 120 minutes to 3 when you drop all the indexes?


If that's true, you might be able to drop and re-create the indexes as 
part of the same transaction, and come out ahead. DROP/CREATE INDEX is 
transactional in PostgreSQL, so you can do:


BEGIN;
DROP INDEX index1;
...
DROP INDEX index15;
UPDATE t67cdi_nl_cmp_descr SET is_grc_002='Y';
CREATE INDEX index1 ...;
...
CREATE INDEX index15 ...;
COMMIT;

This will take an AccessExclusiveLock on the table, though, so the table 
will be inaccessible to concurrent queries while it's running.


Actually, since you are effectively rewriting the table anyway, you 
could create a new table with same structure, insert all rows from the 
old table, with is_grc_002 set to 'Y', drop the old table, and rename 
the new table into its place.


Do all the rows really need to be updated? If some of the rows already 
have is_grc_002='Y', you can avoid rewriting those rows by adding a 
WHERE-clause: WHERE NOT is_grc_002='Y' OR is_grc_002 IS NULL.
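
Spelled out as a full statement, IS DISTINCT FROM covers both the
not-equal and the NULL case:

UPDATE t67cdi_nl_cmp_descr
SET is_grc_002 = 'Y'
WHERE is_grc_002 IS DISTINCT FROM 'Y';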


You could also play tricks with partitioning. Don't store the is_grc_002 
column in the table at all. Instead, create two tables, one for the rows 
that implicitly have is_grc_002='Y' and another for all the other rows. 
Then create a view on the union of the two tables, which adds the 
is_grc_002 column. Instead of doing a full-table update, you can just 
alter the view to display is_grc_002='Y' for both tables (and add a new 
table to hold new rows with is_grc_002 <> 'Y').
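
A rough sketch of that arrangement, with made-up table and column names
(the real table has many more columns):

CREATE TABLE descr_grc   (id integer PRIMARY KEY, payload text); -- rows implicitly 'Y'
CREATE TABLE descr_other (id integer PRIMARY KEY, payload text); -- everything else

CREATE VIEW t67cdi_nl_cmp_descr AS
  SELECT id, payload, 'Y'::text AS is_grc_002 FROM descr_grc
  UNION ALL
  SELECT id, payload, 'N'::text AS is_grc_002 FROM descr_other;

-- Setting is_grc_002='Y' for every row is then a cheap view redefinition,
-- not a full-table rewrite:
CREATE OR REPLACE VIEW t67cdi_nl_cmp_descr AS
  SELECT id, payload, 'Y'::text AS is_grc_002 FROM descr_grc
  UNION ALL
  SELECT id, payload, 'Y'::text AS is_grc_002 FROM descr_other;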



Hypothesis
we have tried many things to solve this problem ourselves, but to no
avail so far. Our hypothesis is that
the Postgres creates new records for all rows and then needs to update
all 15 indexes to make them point to the new rows. There does not seem
to be a way to avoid that.

Question:
- Is our hypothesis correct?
- Can the forum please advise us on possible ways to make the query
faster?


Your hypothesis may be correct.


Yeah, sounds about right. A full-table UPDATE like that is pretty much 
the worst-case scenario for PostgreSQL's MVCC system, unfortunately.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSI slows down over time

2014-04-06 Thread Heikki Linnakangas

On 04/06/2014 05:25 AM, Ryan Johnson wrote:

I've tried linux perf, but all it says is that lots of time is going to
LWLock (but callgraph tracing doesn't work in my not-bleeding-edge
kernel).


Make sure you compile with the -fno-omit-frame-pointer flag.
- Heikki




Re: [PERFORM] PGSQL, checkpoints, and file system syncs

2014-04-03 Thread Heikki Linnakangas

On 04/03/2014 08:39 PM, Reza Taheri wrote:

Hello PGSQL performance community,
You might remember that I pinged you in July 2012 to introduce the TPC-V 
benchmark. I am now back with more data, and a question about checkpoints. As 
far as the plans for the benchmark, we are hoping to release a benchmarking kit 
for multi-VM servers this year (and of course one can always simply configure 
it to run on one database)

I am now dealing with a situation of performance dips when checkpoints 
complete. To simplify the discussion, I have reproduced the problem on a single 
VM/single database.

Complete config info is in the attached files. Briefly, it is a 6-vCPU VM with 
91G of memory, and 70GB in PGSQL shared buffers. The host has 512GB of memory 
and 4 sockets of Westmere (E7-4870) processors with HT enabled.

The data tablespace is on an ext4 file system on a (virtual) disk which is 
striped on 16 SSD drives in RAID 0. This is obviously overkill for the load we 
are putting on this one VM, but in the usual benchmarking config, the 16 SSDs 
are shared by 24 VMs. Log is on an ext3 file system on 4 spinning drives in 
RAID 1.

We are running PGSQL version 9.2 on RHEL 6.4; here are some parameters of 
interest (postgresql.conf is in the attachment):
checkpoint_segments = 1200
checkpoint_timeout = 360s
checkpoint_completion_target = 0.8
wal_sync_method = open_datasync
wal_buffers = 16MB
wal_writer_delay = 10ms
effective_io_concurrency = 10
effective_cache_size = 1024MB

When running tests, I noticed that when a checkpoint completes, we have a big 
burst of writes to the data disk. The log disk has a very steady write rate 
that is not affected by checkpoints except for the known phenomenon of more 
bytes in each log write when a new checkpoint period starts. In a multi-VM 
config with all VMs sharing the same data disks, when these write bursts 
happen, all VMs take a hit.

So I set out to see what causes this write burst.  After playing around with 
PGSQL parameters and observing its behavior, it appears that the bursts aren't 
produced by the database engine; they are produced by the file system. I 
suspect PGSQL has to issue a sync(2)/fsync(2)/sync_file_range(2) system call at 
the completion of the checkpoint to ensure that all blocks are flushed to disk 
before creating a checkpoint marker. To test this, I ran a loop to call sync(8) 
once a second.

The graphs in file run280.mht have the throughput, data disk activity, and checkpoint 
start/completion timestamps for the baseline case. You can see that the checkpoint completion, the 
write burst, and the throughput dip all occur at the same time, so much so that it is hard to see 
the checkpoint completion line under the graph of writes. It looks like the file system does a mini 
flush every 30 seconds. The file run274.mht is the case with sync commands running in 
the background. You can see that everything is more smooth.

Is there something I can set in the PGSQL parameters or in the file system parameters to 
force a steady flow of writes to disk rather than waiting for a sync system call? 
Mounting with commit=1 did not make a difference.


Try setting the vm.dirty_bytes sysctl. Something like 256MB might be a 
good starting point.


This comes up fairly often, see e.g.: 
http://www.postgresql.org/message-id/flat/27c32fd4-0142-44fe-8488-9f366dc75...@mr-paradox.net


- Heikki




Re: [PERFORM] Suboptimal query plan when using expensive BCRYPT functions

2014-03-24 Thread Heikki Linnakangas

On 03/22/2014 02:59 AM, Erik van Zijst wrote:

Is there any way I can get postgres to perform the hash calculations
on the *result* of the other parts of the where clause, instead of the
other way around? Or else rewrite the query?


The planner doesn't know that the crypt function is expensive. That can 
be fixed with ALTER FUNCTION crypt(text, text) COST <some high value>. Even 
with that, I'm not sure if the planner is smart enough to optimize the 
query the way you'd want, but it's worth a try.
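
For example (the cost number is arbitrary; anything much larger than the
function's default cost should do):

ALTER FUNCTION crypt(text, text) COST 1000;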


- Heikki




Re: [PERFORM] not using my GIN index in JOIN expression

2014-02-27 Thread Heikki Linnakangas

On 02/27/2014 04:06 PM, Jean-Max Reymond wrote:

I am running the last version of PostgreSQL 9.3.3
I have two tables detm and corm and a lot of datas in the column
cormdata of corm table (1.4 GB).

I have a GIN index on cormdata:
CREATE INDEX ix_corm_fulltext_cormdata ON corm
USING gin (to_tsvector('french'::regconfig, cormdata))
WHERE cormishtml IS FALSE AND length(cormdata) > 2;

select distinct b.detmmailid from corm b where
(to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and
b.cormishtml is false and length(b.cormdata) > 2)
is very fast and uses the GIN index.

HashAggregate  (cost=2027.72..2031.00 rows=328 width=52)
  ->  Bitmap Heap Scan on corm b  (cost=24.25..2026.35 rows=548 width=52)
        Recheck Cond: ((to_tsvector('french'::regconfig, cormdata) @@ to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE) AND (length(cormdata) > 2))
        ->  Bitmap Index Scan on ix_corm_fulltext_cormdata  (cost=0.00..24.11 rows=548 width=0)
              Index Cond: (to_tsvector('french'::regconfig, cormdata) @@ to_tsquery('mauritanie'::text))


With a join on another table detm, the GIN index is not used:

explain select distinct a.detmmailid from detm a JOIN corm b on
a.detmmailid = b.detmmailid where ((to_tsvector('french',b.cormdata) @@
to_tsquery('mauritanie') and b.cormishtml is false and
length(b.cormdata) > 2) OR ( detmobjet ~* 'mauritanie' ))

HashAggregate  (cost=172418.27..172423.98 rows=571 width=52)
  ->  Hash Join  (cost=28514.92..172416.85 rows=571 width=52)
        Hash Cond: (b.detmmailid = a.detmmailid)
        Join Filter: (((to_tsvector('french'::regconfig, b.cormdata) @@ to_tsquery('mauritanie'::text)) AND (b.cormishtml IS FALSE) AND (length(b.cormdata) > 2)) OR (a.detmobjet ~* 'mauritanie'::text))
        ->  Seq Scan on corm b  (cost=0.00..44755.07 rows=449507 width=689)
        ->  Hash  (cost=19322.74..19322.74 rows=338574 width=94)
              ->  Seq Scan on detm a  (cost=0.00..19322.74 rows=338574 width=94)


If I remove "OR ( detmobjet ~* 'mauritanie' )" from the select, the GIN
index is used:

explain select distinct a.detmmailid from detm a JOIN corm b on
a.detmmailid = b.detmmailid where ((to_tsvector('french',b.cormdata) @@
to_tsquery('mauritanie') and b.cormishtml is false and
length(b.cormdata) > 2))

HashAggregate  (cost=4295.69..4301.17 rows=548 width=52)
  ->  Nested Loop  (cost=24.67..4294.32 rows=548 width=52)
        ->  Bitmap Heap Scan on corm b  (cost=24.25..2026.35 rows=548 width=52)
              Recheck Cond: ((to_tsvector('french'::regconfig, cormdata) @@ to_tsquery('mauritanie'::text)) AND (cormishtml IS FALSE) AND (length(cormdata) > 2))
              ->  Bitmap Index Scan on ix_corm_fulltext_cormdata  (cost=0.00..24.11 rows=548 width=0)
                    Index Cond: (to_tsvector('french'::regconfig, cormdata) @@ to_tsquery('mauritanie'::text))
        ->  Index Only Scan using pkey_detm on detm a  (cost=0.42..4.13 rows=1 width=52)
              Index Cond: (detmmailid = b.detmmailid)

How can i force the use of the GIN index ?
thanks for your tips,


The problem with the "OR detmobjet ~* 'mauritanie'" restriction is that 
the rows that match that condition cannot be found using the GIN index. 
I think you'd want the system to fetch all the rows that match the other 
condition using the GIN index, and do something else to find the other 
rows. The planner should be able to do that if you rewrite the query as 
a UNION:


select a.detmmailid from detm a JOIN corm b on
a.detmmailid = b.detmmailid
where (to_tsvector('french',b.cormdata) @@ to_tsquery('mauritanie') and 
b.cormishtml is false and length(b.cormdata) > 2)

union
select a.detmmailid from detm a  JOIN corm b on
a.detmmailid = b.detmmailid
where detmobjet ~* 'mauritanie'

Note that that will not return rows in 'detm' that have no matching rows 
in the 'corm' table, even if they match the detmobjet ~* 'mauritanie' 
condition. That's what your original query also did, but if that's not 
what you want, leave out the JOIN from the second part of the union.


- Heikki




Re: [PERFORM] DB size and TABLE sizes don't seem to add up

2014-02-18 Thread Heikki Linnakangas

On 02/18/2014 12:14 AM, David Wall wrote:

I am running PG 9.2.4 and I am trying to figure out why my database size
shows one value, but the sum of my total relation sizes is so much less.

Basically, I'm told my database is 188MB, but the sum of my total
relation sizes adds up to just 8.7MB, which is 1/20th of the reported
total.  Where is the 19/20th of my data then?  We do make significant
use of large objects, so I suspect it's in there.  Is there a relation
size query that would include the large object data associated with any
OIDs in those tables?


You can use select pg_total_relation_size('pg_largeobject') to get the 
total size of the large objects. Attributing large objects to the tables 
that refer them is more difficult. For a single table, something like this:


select sum(pg_column_size(lo.data))
from lotest_stash_values t, pg_largeobject lo
where lo.loid = t.loid;

Replace lotest_stash_values with the table's name and lo.loid with the 
name of the OID column.
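
To see the overall split at a glance, something like this should account
for most of the difference when large objects are heavily used:

select pg_size_pretty(pg_database_size(current_database())) as database_size,
       pg_size_pretty(pg_total_relation_size('pg_largeobject')) as large_objects_size;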


- Heikki




Re: [PERFORM] Strange performance boost with random()

2014-02-10 Thread Heikki Linnakangas

On 02/10/2014 09:52 PM, M Putz wrote:


Hello,

While analyzing performance, we encountered the following phenomenon,

SELECT sum(pow(.5*generate_series,.5))
FROM generate_series(1,100);

is much much (a hundred times) slower than

SELECT sum(pow(random()*generate_series,.5))
FROM generate_series(1,100);

and asymptotic difference is even more astounding.
This seems counter-intuitive, considering the cost of
an additional random() call instead of a constant factor.
What are the reasons for this strange performance boost?


Different data type. The first uses numeric, which is pretty slow for 
doing calculations. random() returns a double, which makes the pow and 
sum also use double, which is a lot faster.


To see the effect, try these variants:

SELECT sum(pow(.5::float8 * generate_series,.5))
FROM generate_series(1,100);

SELECT sum(pow(random()::numeric * generate_series,.5))
FROM generate_series(1,100);

- Heikki




Re: [PERFORM] Increasing query time after updates

2014-01-21 Thread Heikki Linnakangas

On 01/21/2014 08:26 AM, Katharina Koobs wrote:

Hi,

We have a PostgreSQL DB, version 8.4 on a Suse Linux system.
Every night a script runs with several updates and inserts. The query time
at day increases after
approximately 3 weeks from a few minutes to about an hour.


Does it get gradually slower every day, or suddenly jump from few 
minutes to one hour after three weeks? The former would suggest some 
kind of bloating or fragmentation, while the latter would suggest a 
change in a query plan (possibly still caused by bloating).


Does the database size change over time?


After export, drop and import the DB the query time is again at a few
minutes.

We have tested vacuum full, vacuum analyze and reindex and get no
improvement.

Has anyone an idea why the queries are getting slower and slower?


One theory is that the tables are initially more or less ordered by one 
column, but get gradually shuffled by the updates. Exporting and 
importing would load the data back in order. However, a blow to that 
theory is that a pg_dump + reload will load the tuples in roughly the 
same physical order, but perhaps you used something else for the 
export+import.


You could try running CLUSTER on any large tables. Since version 9.0, 
VACUUM FULL does more or less the same as CLUSTER, ie. rewrites the 
whole table, but in 8.4 it's different.
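
For example (table and index names are illustrative; note that CLUSTER
takes an exclusive lock and rewrites the table, so run it during a quiet
period):

CLUSTER big_table USING big_table_pkey;
ANALYZE big_table;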



Thank you so much for your help!


The DB configuration:

Virtual server, 7GB RAM, DB size = 16GB

shared_buffers = 1024MB
temp_buffers = 32MB
work_mem = 8MB
checkpoint_segments = 20
effective_cache_size = 512MB
max_locks_per_transaction = 256


With 7GB of RAM, you might want to raise effective_cache_size to 
something like 4GB. It doesn't allocate anything, but tells PostgreSQL 
how much memory it can expect the operating system to use as buffer 
cache, which can influence query plans. I doubt it makes any difference 
for the problem you're seeing, but just as general advice..


8.4 is quite old by now, and will no longer be supported by the 
community after July 2014. You'll have to upgrade pretty soon anyway, so 
you might as well upgrade now and see if it helps.


- Heikki




Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-25 Thread Heikki Linnakangas

On 25.11.2013 22:01, Lee Nguyen wrote:

Hi,

Having attended a few PGCons, I've always heard the remark from a few
presenters and attendees that Postgres shouldn't be run inside a VM. That
bare metal is the only way to go.

Here at work we were entertaining the idea of running our Postgres database
on our VM farm alongside our application vm's.  We are planning to run a
few Postgres synchronous replication nodes.

Why shouldn't we run Postgres in a VM?  What are the downsides? Does anyone
have any metrics or benchmarks with the latest Postgres?


I've also heard people say that they've seen PostgreSQL to perform worse 
in a VM. In the performance testing that we've done in VMware, though, 
we haven't seen any big impact. So I guess the answer is that it depends 
on the specific configuration of CPU, memory, disks and the software. 
Synchronous replication is likely going to be the biggest bottleneck by 
far, unless it's mostly read-only. I don't know if virtualization will 
have a measurable impact on network latency, which is what matters for 
synchronous replication.


So, I'd suggest that you try it yourself, and see how it performs. And 
please report back to the list, I'd also love to see some numbers!


- Heikki




Re: [PERFORM] Index on a range array

2013-08-15 Thread Heikki Linnakangas

On 13.08.2013 23:47, Daniel Cristian Cruz wrote:

Hello,

I'm trying to simplify a schema, where I had many ranges floating around.
My idea is to put them all in an array field and query like this:

SELECT
  event.*
FROM event
JOIN participant_details
  USING (participant_id)
WHERE
  tsrange(event.start, event.end) && ANY (participant_details.periods);

periods is tsrange[].

I've tryed and it worked, but without indexes. I've tried something, but
didn't found anything... Does someone know how to index this kind of field
(tsrange[])?

 From the docs I learn that there is some GIST magic, but I would need to
code in C. Is that true?


Yeah. It might be somewhat tricky to write an efficient GIST 
implementation for this anyway. What you'd really want to do is to index 
each value in the array separately, which is more like what GIN does. 
With the partial match infrastructure in GIN, it might be possible to 
write a GIN implementation that can speed up range overlap queries. 
However, that certainly requires C coding too.


A couple of alternatives come to mind:

You could create the index on just the min and max values of the 
periods, and in the query check for overlap with that. If there 
typically aren't big gaps between the periods of each participant, that 
might work well.


Or you could split the range of expected timestamps into discrete steps, 
for example at one-day granularity. Create a function to convert a range 
into an array of steps, e.g convert each range into an array of days 
that the range overlaps with. Create a GIN index on that array, and use 
it in the query. Something like this:


-- Returns an int representing the day the given timestamp falls into
create function epochday(timestamp) returns int4 as $$
  select extract (epoch from $1)::int4/(24*3600)
$$ language sql immutable;

-- Same for a range. Returns an array of ints representing all the
-- days that the given range overlaps with.
create function epochdays(tsrange) returns integer[]
as $$
  select array_agg(g)
  from generate_series(epochday(lower($1)), epochday(upper($1))) g
$$
language sql immutable;

-- Same for an array of ranges. Returns an array of ints representing
-- all the days that overlap with any of the given timestamp ranges

create function epochdays(ranges tsrange[]) returns integer[]
as $$
declare
  r tsrange;
  result integer[];
begin
  foreach r in array ranges loop
    result = result || (select array_agg(g)
      from generate_series(epochday(lower(r)), epochday(upper(r))) g);
  end loop;
  return result;
end;
$$ language plpgsql immutable;

-- Create the index on that:
create index period_days on participant_details using gin 
(epochdays(periods));


-- Query like this:
SELECT event.* FROM event
JOIN participant_details USING (participant_id)
-- This WHERE-clause is for correctness:
WHERE tsrange(event.start, event.end) && ANY (participant_details.periods)
-- and this is to make use of the index:
AND epochdays(tsrange(event.start, event.end)) && epochdays(participant_details.periods);


- Heikki




Re: [PERFORM] In progress INSERT wrecks plans on table

2013-06-16 Thread Heikki Linnakangas

On 03.05.2013 15:41, Simon Riggs wrote:

What appears to be happening is we're spending a lot of time in
TransactionIdIsInProgress() so we can set hints and then when we find
it is still in progress we then spend more time in XidIsInSnapshot()
while we check that it is still invisible to us. Even if the
transaction we see repeatedly ends, we will still pay the cost in
XidIsInSnapshot repeatedly as we execute.

Given that code path, I would expect it to suck worse on a live system
with many sessions, and even worse with many subtransactions.

(1) A proposed fix is attached, but its only a partial one and barely tested.

Deeper fixes might be

(2)  to sort the xid array if we call XidIsInSnapshot too many times
in a transaction. I don't think that is worth it, because a long
running snapshot may be examined many times, but is unlikely to see
multiple in-progress xids repeatedly. Whereas your case seems
reasonably common.


Yeah, sorting would be a waste of time most of the time.

Instead of adding a new cache field, how about just swapping the matched 
XID to the beginning of the array?


Did you have some simple performance test script for this?

- Heikki




Re: [PERFORM] In progress INSERT wrecks plans on table

2013-06-16 Thread Heikki Linnakangas

On 06.05.2013 04:51, Mark Kirkwood wrote:

On 05/05/13 00:49, Simon Riggs wrote:

On 3 May 2013 13:41, Simon Riggs si...@2ndquadrant.com wrote:


(3) to make the check on TransactionIdIsInProgress() into a heuristic,
since we don't *need* to check that, so if we keep checking the same
xid repeatedly we can reduce the number of checks or avoid xids that
seem to be long running. That's slightly more coding than my quick
hack here but seems worth it.

I think we need both (1) and (3) but the attached patch does just (1).

This is a similar optimisation to the one I introduced for
TransactionIdIsKnownCompleted(), except this applies to repeated
checking of as yet-incomplete xids, and to bulk concurrent
transactions.


ISTM we can improve performance of TransactionIdIsInProgress() by
caching the procno of our last xid.

Mark, could you retest with both these patches? Thanks.



Thanks Simon, will do and report back.


Did anyone ever try (3) ?

I'm not sure if this the same idea as (3) above, but ISTM that 
HeapTupleSatisfiesMVCC doesn't actually need to call 
TransactionIdIsInProgress(), because it checks XidInMVCCSnapshot(). The 
comment at the top of tqual.c says:



 * NOTE: must check TransactionIdIsInProgress (which looks in PGXACT array)
 * before TransactionIdDidCommit/TransactionIdDidAbort (which look in
 * pg_clog).  Otherwise we have a race condition: we might decide that a
 * just-committed transaction crashed, because none of the tests succeed.
 * xact.c is careful to record commit/abort in pg_clog before it unsets
 * MyPgXact->xid in PGXACT array.  That fixes that problem, but it also
 * means there is a window where TransactionIdIsInProgress and
 * TransactionIdDidCommit will both return true.  If we check only
 * TransactionIdDidCommit, we could consider a tuple committed when a
 * later GetSnapshotData call will still think the originating transaction
 * is in progress, which leads to application-level inconsistency.  The
 * upshot is that we gotta check TransactionIdIsInProgress first in all
 * code paths, except for a few cases where we are looking at
 * subtransactions of our own main transaction and so there can't be any
 * race condition.


If TransactionIdIsInProgress() returns true for a given XID, then surely 
it was also running when the snapshot was taken (or had not even began 
yet). In which case the XidInMVCCSnapshot() call will also return true. 
Am I missing something?


There's one little problem: we currently only set the hint bits when 
TransactionIdIsInProgress() returns false. If we do that earlier, then 
even though HeapTupleSatisfiesMVCC works correctly thanks to the 
XidInMVCCSnapshot call, other HeapTupleSatisfies* functions that don't 
call XIdInMVCCSnapshot might see the tuple as committed or aborted too 
early, if they see the hint bit as set while the transaction is still 
in-progress according to the proc array. Would have to check all the 
callers of those other HeapTupleSatisfies* functions to verify if that's OK.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Check Pointer

2013-05-30 Thread Heikki Linnakangas

On 30.05.2013 15:09, itishree sukla wrote:

In our server Check pointer process is consuming 8 GB of memory, what could
be the possible reason? Can any one please help.


Are you sure you're measuring the memory correctly? The RES field in top 
output, for example, includes shared memory, ie. the whole buffer cache. 
Shared memory isn't really consumed by the checkpointer process, but 
shared by all postgres processes.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Thinking About Correlated Columns (again)

2013-05-15 Thread Heikki Linnakangas

On 15.05.2013 18:31, Shaun Thomas wrote:

I've seen conversations on this since at least 2005. There were even
proposed patches every once in a while, but never any consensus. Anyone
care to comment?


Well, as you said, there has never been any consensus.

There are basically two pieces to the puzzle:

1. What metric do you use to represent correlation between columns?

2. How do use collect that statistic?

Based on the prior discussions, collecting the stats seems to be tricky. 
It's not clear for which combinations of columns it should be collected 
(all possible combinations? That explodes quickly...), or how it can be 
collected without scanning the whole table.


I think it would be pretty straightforward to use such a statistic, once 
we have it. So perhaps we should get started by allowing the DBA to set 
a correlation metric manually, and use that in the planner.
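
To make the problem such a statistic would address concrete, here is a made-up 
example (table, column, and value names are purely illustrative) of two perfectly 
correlated columns, where the planner's independence assumption produces a bad 
row estimate:

-- column b always equals column a, and each value appears 100 times
CREATE TABLE corr_demo AS
  SELECT i % 100 AS a, i % 100 AS b FROM generate_series(1, 10000) i;
ANALYZE corr_demo;

-- "a = 42 AND b = 42" really matches 100 rows, but the planner multiplies
-- the two selectivities as if they were independent and estimates roughly
-- 10000 * (1/100) * (1/100) = 1 row.
EXPLAIN SELECT * FROM corr_demo WHERE a = 42 AND b = 42;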


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Predicate information in EXPLAIN Command

2013-05-14 Thread Heikki Linnakangas

On 14.05.2013 12:23, Sameer Thakur wrote:

Hello,
I am trying to find predicate information for a given SQL query plan as
provided by Oracle using DBMS_XPLAN. I am looking at the EXPLAIN command
for getting this query plan information, with no luck so far.

Does the EXPLAIN command provide predicate information?


Sure. For example,

postgres=# explain select * from a where id = 123;
QUERY PLAN
---
 Seq Scan on a  (cost=0.00..40.00 rows=12 width=4)
   Filter: (id = 123)
(2 rows)

The predicate is right there on the Filter line. Likewise for a join:

postgres=# explain select * from a, b where a.id = b.id;
   QUERY PLAN
-
 Hash Join  (cost=64.00..134.00 rows=2400 width=8)
   Hash Cond: (a.id = b.id)
   ->  Seq Scan on a  (cost=0.00..34.00 rows=2400 width=4)
   ->  Hash  (cost=34.00..34.00 rows=2400 width=4)
         ->  Seq Scan on b  (cost=0.00..34.00 rows=2400 width=4)
(5 rows)

The join predicate is on the Hash Cond line.

- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Re: [PERFORM] Question about network bandwidth usage between PostgreSQL’s client and server

2013-04-25 Thread Heikki Linnakangas

On 25.04.2013 02:56, Kelphet Xiong wrote:

In all the experiments, the lineitem and partsupp tables reside in memory
because there is no io activities observed from iotop.
Since there is enough network bandwidth (1Gb/s or 128MB/s) between
client and server,
I would like to know what determines the data transferring rate or the
network bandwidth usage
between a client and a server when network bandwidth is enough.


Since there's enough network bandwidth available, the bottleneck is 
elsewhere. I don't know what it is in your example - maybe it's the I/O 
capacity, or CPU required to process the result in the server before 
it's sent over the network. It could also be in the client, on how fast 
it can process the results coming from the server.


I'd suggest running 'top' on the server while the query is executed, and 
keeping an eye on the CPU usage. If it's pegged at 100%, the bottleneck 
is the server's CPU.



For example, given that the size of each tuple of lineitem table is
88% of that of partsupp,
why the average network usage for sequential scan of lineitem table is only 50%
that of partsupp table? And why the average network usage of their
join is higher
than that of sequential scan of lineitem but lower than that of
sequential scan of partsupp table?


Here's a wild guess: the query on lineitem is bottlenecked by CPU usage 
in the server. A lot of CPU could be spent on converting the date fields 
from on-disk format to the text representation that's sent over the 
network; I've seen that conversion use up a lot of CPU time on some test 
workloads. Try leaving out the date columns from the query to test that 
theory.


If that's the bottleneck, you could try fetching the result in binary 
format, that should consume less CPU in the server. You didn't mention 
what client library you're using, but e.g with libpq, see the manual on 
PQexecParams on how to set the result format.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance with the new security release

2013-04-22 Thread Heikki Linnakangas

On 22.04.2013 19:48, Anne Rosset wrote:

Hi,
We are seeing some overall performance degradation in our application since we 
installed the security release. Other commits were also done at the same time 
in the application so we don't know yet if the degradation has any relationship 
with the security release.
While we are digging into this, I would like to know if it is possible that the release 
has some impact on performance. After reading this: "It was created as a side effect 
of a refactoring effort to make establishing new connections to a PostgreSQL server 
faster, and the associated code more maintainable.", I am thinking it is quite 
possible.


I doubt that particular commit, the one that fixed the security issue, 
could cause any meaningful slowdown. But it's not impossible that some 
other fix included in the release would cause a regression, although we 
try to be careful to avoid that. If you narrow the culprit down to the 
new PostgreSQL version, we're going to need more details to find the 
root cause.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index usage for tstzrange?

2013-03-21 Thread Heikki Linnakangas

On 21.03.2013 06:07, Vasilis Ventirozos wrote:

On Thu, Mar 21, 2013 at 5:58 AM, Tom Lane t...@sss.pgh.pa.us wrote:

What I find more disturbing is that this is what I get from the example
in HEAD:

regression=# explain SELECT * FROM a WHERE ts <@
tstzrange('2013-01-01','2013-01-01 00:10:00');
ERROR:  XX000: type 1184 is not a range type
LOCATION:  range_get_typcache, rangetypes.c:1451

Haven't traced through it to determine exactly what's happening, but
isn't this a legitimate usage?  And if it isn't, surely a more
user-facing error ought to be getting thrown somewhere upstream of here.


It is a legit usage, this is from a test i did myself (9.2.3)

test=# explain SELECT * FROM a WHERE ts <@
tstzrange('2013-01-01','2013-04-01 00:10:00');
  QUERY PLAN

  Seq Scan on a  (cost=0.00..23.75 rows=1 width=44)
    Filter: (ts <@ '[2013-01-01 00:00:00+02,2013-04-01
00:10:00+03)'::tstzrange)


Looks like the range type cost estimation patch broke this, back in 
August already. The case of var <@ constant, where constant is a range 
and var is an element, that's broken. The cost estimation function, 
rangesel(), incorrectly assumes that the 'var' is always a range type.


It's a bit worrying that no-one noticed until now. I'll add a test for 
that operator to the rangetypes regression test.


The immediate fix is attached, but this made me realize that rangesel() 
is still missing estimation for the element <@ range operator. It 
shouldn't be hard to implement, I'm pretty sure we have all the 
statistics we need for that.


- Heikki
diff --git a/src/backend/utils/adt/rangetypes_selfuncs.c b/src/backend/utils/adt/rangetypes_selfuncs.c
index 76dc913..c450c6a 100644
--- a/src/backend/utils/adt/rangetypes_selfuncs.c
+++ b/src/backend/utils/adt/rangetypes_selfuncs.c
@@ -154,8 +154,6 @@ rangesel(PG_FUNCTION_ARGS)
 		}
 	}
 
-	typcache = range_get_typcache(fcinfo, vardata.vartype);
-
 	/*
 	 * OK, there's a Var and a Const we're dealing with here.  We need the
 	 * Const to be of same range type as the column, else we can't do anything
@@ -169,6 +167,8 @@ rangesel(PG_FUNCTION_ARGS)
 	 */
 	if (operator == OID_RANGE_CONTAINS_ELEM_OP)
 	{
+		typcache = range_get_typcache(fcinfo, vardata.vartype);
+
 		if (((Const *) other)->consttype == typcache->rngelemtype->type_id)
 		{
 			RangeBound lower, upper;
@@ -185,6 +185,8 @@ rangesel(PG_FUNCTION_ARGS)
 	}
 	else
 	{
+		typcache = range_get_typcache(fcinfo, ((Const *) other)->consttype);
+
 		if (((Const *) other)->consttype == vardata.vartype)
 			constrange = DatumGetRangeType(((Const *) other)-constvalue);
 	}

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] BUG: endless lseek(.., SEEK_END) from select queries on x64 builds

2013-02-23 Thread Heikki Linnakangas

On 22.02.2013 20:10, Markus Schulz wrote:

On Friday, February 22, 2013, at 14:35:25, Heikki Linnakangas wrote:

You could check what the generic plan looks like by taking the query
used in the java program, with the parameter markers, and running
EXPLAIN on that.


how can i do this?
I've tried the following in my ejb-test-function to:

String query = ...
entitymanager.createNativeQuery(query)...;
entitymanager.createNativeQuery("EXPLAIN ANALYZE " + query)...;

but the second createNativeQuery call runs fast every time and will show the
same plan and the first hangs after the fourth call to this function.


You can take the query, replace the ? parameter markers with $1, $2, and 
so forth, and explain it with psql like this:


prepare foo (text) as select * from mytable where id = $1;
explain analyze execute foo ('foo');

On 9.2, though, this will explain the specific plan for those 
parameters, so it might not be any different from what you already 
EXPLAINed.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] BUG: endless lseek(.., SEEK_END) from select queries on x64 builds

2013-02-22 Thread Heikki Linnakangas

On 22.02.2013 10:25, Markus Schulz wrote:

I can run the query four times with good performance and after that postgresql
starts with the strange lseek() behavior.


By default, the JDBC driver re-plans the prepared statement for the 
first 4 invocations of the query. On the fifth invocation, it switches 
to using a generic plan, which will be reused on subsequent invocations. 
See http://jdbc.postgresql.org/documentation/head/server-prepare.html. 
The generic plan seems to perform much worse in this case. You can 
disable that mechanism and force re-planning the query every time by 
setting the prepareThreshold=0 parameter on the data source.


You could check what the generic plan looks like by taking the query 
used in the java program, with the parameter markers, and running 
EXPLAIN on that.


PostgreSQL version 9.2 might work better in this case. It has some 
smarts in the server to generate parameter-specific plans even when 
prepared statements are used, if the planner thinks a specific plan will 
be faster.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PG_XLOG 27028 files running out of space

2013-02-14 Thread Heikki Linnakangas

On 14.02.2013 12:49, Tory M Blue wrote:

My postgres db ran out of space. I have 27028 files in the pg_xlog
directory. I'm unclear what happened this has been running flawless for
years. I do have archiving turned on and run an archive command every 10
minutes.

I'm not sure how to go about cleaning this up, I got the DB back up, but
I've only got 6gb free on this drive and it's going to blow up, if I can't
relieve some of the stress from this directory over 220gb.

What are my options?


You'll need to delete some of the oldest xlog files to release disk 
space. But first you need to make sure you don't delete any files that 
are still needed, and what got you into this situation in the first place.


You say that you run an archive command every 10 minutes. What do you 
mean by that? archive_command specified in postgresql.conf is executed 
automatically by the system, so you don't need to and should not run 
that manually. After archive_command has run successfully, and the 
system doesn't need the WAL file for recovery anymore (ie. after the 
next checkpoint), the system will delete the archived file to release 
disk space. Clearly that hasn't been working in your system for some 
reason. If archive_command doesn't succeed, ie. it returns a non-zero 
return code, the system will keep retrying forever until it succeeds, 
without deleting the file. Have you checked the logs for any 
archive_command errors?


To get out of the immediate trouble, run pg_controldata, and make note 
of this line:


Latest checkpoint's REDO WAL file:00010001

Anything older than that file is not needed for recovery. You can delete 
those, if you have them safely archived.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Query Help

2013-02-05 Thread Heikki Linnakangas

On 05.02.2013 05:45, Will Platnick wrote:

We upgraded from PG 9.1 to 9.2. Since the upgrade, the # of active queries has 
risen significantly, especially during our peak time where lots of users are 
logging in.  According to New Relic, this query is now taking up the most 
amount of time during peak activity and my pg_stat_activity and slow log 
sampling agrees. We have 3 DB servers referenced here, production running 
9.2.2, semi-idle (idle except for replication when I ran the test) running 
9.2.2, and 9.1.3 completely idle with an old dump restored.


The only thing that stands out is that it always checks both indexes for 
matches. Since you only want a single row as a result, it seems like it 
would be better to first check one index, and only check the other one 
if there's no match. Rewriting the query with UNION should do that:


SELECT id, username, password, email, verified, timezone FROM users 
WHERE lower(username) = 'randomuser'

UNION ALL
SELECT id, username, password, email, verified, timezone FROM users 
WHERE lower(email) = 'randomuser'

LIMIT 1;

Also, if you can assume that email addresses always contain the 
@-character, you could take advantage of that and only do the 
lower(email) = 'randomuser' search if there is one.
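
One way to sketch that check directly in the SQL (same placeholder value as 
above; in practice the test could just as well be done in the application) is to 
gate the second branch on the input containing an @. With a parameter instead of 
a literal, the gating condition contains no column references, so it is evaluated 
once as a one-time filter rather than per row:

SELECT id, username, password, email, verified, timezone FROM users
WHERE lower(username) = 'randomuser'
UNION ALL
SELECT id, username, password, email, verified, timezone FROM users
WHERE 'randomuser' LIKE '%@%' AND lower(email) = 'randomuser'
LIMIT 1;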


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Analyze and default_statistics_target

2013-01-21 Thread Heikki Linnakangas

On 21.01.2013 17:29, AJ Weber wrote:

I was under the impression that the default_statistics_target was a
percentage of rows to analyze. Maybe this is not the case?


Nope.


I ran an analyze during a quiet point last night and for a few of my
large tables, I didn't get what I consider a reasonable sampling of
rows. When running with verbose enabled, it appeared that a maximum of
24 rows were being analyzed, including on tables exceeding 4-8mm
rows. My default_statistics_target = 80.

Shouldn't I be analyzing a larger percentage of these big tables?


Analyze only needs a fairly small random sample of the rows in the table 
to get a picture of what the data looks like. Compare with e.g opinion 
polls; you only need to sample a few thousand people to get a result 
with reasonable error bound.


That's for estimating the histogram. Estimating ndistinct is a different 
story, and it's well-known that the estimates of ndistinct are sometimes 
wildly wrong.



What is the unit-of-measure used for default_statistics_target?


It's the number of entries stored in the histogram and 
most-common-values list in pg_statistics.


See also http://www.postgresql.org/docs/devel/static/planner-stats.html:

The amount of information stored in pg_statistic by ANALYZE, in 
particular the maximum number of entries in the most_common_vals and 
histogram_bounds arrays for each column, can be set on a 
column-by-column basis using the ALTER TABLE SET STATISTICS command, or 
globally by setting the default_statistics_target configuration 
variable. The default limit is presently 100 entries.
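
For example, to raise the target for a single column of a large table and 
rebuild its statistics (table and column names here are only illustrative):

ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;
ANALYZE orders;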


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partition insert trigger using C language

2013-01-11 Thread Heikki Linnakangas

On 10.01.2013 21:48, Matheus de Oliveira wrote:

I have made a small modification to keep the plans, and it got from
33957.768ms to 43782.376ms.


If I'm reading results.txt correctly, the avg runtimes are:

C and SPI_execute_with_args: 58567.708 ms
C and SPI_(prepare/keepplan/execute_plan): 43782.376 ms
C and heap_insert: 33957.768 ms

So switching to prepared plans helped quite a lot, but it's still slower 
than direct heap_inserts.


One thing that caught my eye:


CREATE OR REPLACE FUNCTION partition_insert_trigger_spi()
RETURNS trigger
LANGUAGE C
VOLATILE STRICT
AS 'partition_insert_trigger_spi','partition_insert_trigger_spi'
SET DateStyle TO 'ISO';


Calling a function with SET options has a fair amount of overhead, to 
set/restore the GUC on every invocation. That should be avoided in a 
performance critical function like this.
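
A sketch of the alternative: define the function without the SET clause, and 
configure DateStyle once at the database (or role) level instead, so each 
trigger invocation avoids the per-call set/restore overhead. The database name 
below is made up.

CREATE OR REPLACE FUNCTION partition_insert_trigger_spi()
RETURNS trigger
LANGUAGE C
VOLATILE STRICT
AS 'partition_insert_trigger_spi', 'partition_insert_trigger_spi';

ALTER DATABASE mydb SET DateStyle TO 'ISO';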


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partition insert trigger using C language

2013-01-10 Thread Heikki Linnakangas

On 10.01.2013 20:45, Matheus de Oliveira wrote:

Inspired by Charles' thread and the work of Emmanuel [1], I have made some
experiments trying to create a trigger to make partitioning using C
language.

The first attempt was not good, I tried to use SPI [2] to create a query to
insert into the correct child table, but it took almost no improvement
compared with the PL/pgSQL code.


The right way to do this with SPI is to prepare each insert-statement on 
first invocation (SPI_prepare + SPI_keepplan), and reuse the saved plan after 
that (SPI_execute_plan).


If you construct and plan the query on every invocation, it's not 
surprising that it's no different from PL/pgSQL performance.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partition insert trigger using C language

2013-01-10 Thread Heikki Linnakangas

On 10.01.2013 21:11, Matheus de Oliveira wrote:

On Thu, Jan 10, 2013 at 4:54 PM, Heikki Linnakangas hlinnakan...@vmware.com
wrote:



The right way to do this with SPI is to prepare each insert-statement on
first invocation (SPI_prepare + SPI_keepplan), and reuse the saved plan after
that (SPI_execute_plan).

If you construct and plan the query on every invocation, it's not
surprising that it's no different from PL/pgSQL performance.


Yeah. I thought about that, but the problem was that I assumed the INSERTs
came with random date, so in the worst scenario I would have to keep the
plans of all of the child partitions. Am I wrong?

But thinking better, even with hundreds of partitions, it wouldn't use to
much memory/resource, would it?


Right, a few hundred saved plans would probably still be ok. And if that 
ever becomes a problem, you could keep the plans in a LRU list and only 
keep the last 100 plans or so.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] FW: performance issue with a 2.5gb joinded table

2013-01-03 Thread Heikki Linnakangas

On 03.01.2013 15:30, Daniel Westermann wrote:

What additionally makes me wonder is that the same table in Oracle is taking 
much less space than in PostgreSQL:

SQL> select sum(bytes) from dba_extents where segment_name = 'TEST1';
SUM(BYTES)
--
1610612736

select pg_relation_size('mgmtt_own.test1');
pg_relation_size
--
2502082560
(1 row)

(sysdba@[local]:) [bi_dwht]  \d+ mgmtt_own.test1
  Table mgmtt_own.test1
 Column| Type  | Modifiers | Storage | 
Description
--+---+---+-+-
slsales_batch| numeric(8,0)  |   | main|
slsales_checksum | numeric(8,0)  |   | main|
slsales_reg_id   | numeric(8,0)  |   | main|
slsales_prod_id  | numeric(8,0)  |   | main|
slsales_date_id  | numeric(8,0)  |   | main|
slsales_pos_id   | numeric(8,0)  |   | main|
slsales_amt_sales_gross  | numeric(16,6) |   | main|
slsales_amt_sales_discount   | numeric(16,6) |   | main|
slsales_units_sales_gross| numeric(8,0)  |   | main|
slsales_amt_returns  | numeric(16,6) |   | main|
slsales_amt_returns_discount | numeric(16,6) |   | main|
slsales_units_returns| numeric(8,0)  |   | main|
slsales_amt_est_winnings | numeric(16,6) |   | main|
Indexes:
 itest1 btree (slsales_date_id) CLUSTER, tablespace mgmtt_idx
 itest2 btree (slsales_prod_id), tablespace mgmtt_idx
Has OIDs: no
Tablespace: mgmtt_dat


One difference is that numerics are stored more tightly packed on 
Oracle. Which is particularly good for Oracle as they don't have other 
numeric data types than number. On PostgreSQL, you'll want to use int4 
for ID-fields, where possible. An int4 always takes up 4 bytes, while a 
numeric holding an integer value in the same range is typically 5-9 bytes.
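
A quick way to see the per-value difference (the exact numeric size depends on 
the number of digits in the value):

SELECT pg_column_size(12345678::int4)    AS int4_bytes,
       pg_column_size(12345678::numeric) AS numeric_bytes;
-- int4_bytes is always 4; numeric_bytes is larger for the same value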


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres configuration for 8 CPUs, 6 GB RAM

2012-11-27 Thread Heikki Linnakangas

On 27.11.2012 09:47, Syed Asif Tanveer wrote:

I am using PostgreSQL 9.1.5 for data warehousing and OLAP purposes. Data size
is around 100 GB and I have tuned my PostgreSQL accordingly still I am
facing performance issues. The query performance is too low despite tables
being properly indexed and are vacuumed and analyzed at regular basis. CPU
usage never exceeded 15% even at peak usage times. Kindly guide me through
if there are any mistakes in setting configuration parameters. Below are my
system specs and please find attached my postgresql configuration parameters
for current system.


The configuration looks OK to me at a quick glance. I'd suggest looking 
at the access plans of the queries that are too slow (ie. EXPLAIN 
ANALYZE). How low is too low, and how fast do the queries need to be? 
What kind of an I/O system does the server have? See also 
https://wiki.postgresql.org/wiki/Slow_Query_Questions


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to keep queries low latency as concurrency increases

2012-11-25 Thread Heikki Linnakangas

On 25.11.2012 18:30, Catalin Iacob wrote:

So it seems we're just doing too many connections and too many
queries. Each page view from a user translates to multiple requests to
the application server and each of those translates to a connection
and at least a few queries (which are done in middleware and therefore
happen for each and every query). One pgbouncer can handle lots of
concurrent idle connections and lots of queries/second but our 9000
queries/second to seem push it too much. The longer term solution for
us would probably be to do less connections (by doing less Django
requests for a page) and less queries, before our deadline we were
just searching for a short term solution to handle an expected traffic
spike.


The typical solution to that is caching, see 
https://docs.djangoproject.com/en/1.4/topics/cache/.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hints (was Poor performance using CTE)

2012-11-22 Thread Heikki Linnakangas

On 22.11.2012 02:53, Jeff Janes wrote:

That gives the planner the information it needs to choose the right plan on
its own. That kind of hints would be much less implementation specific and
much more likely to still be useful, or at least not outright
counter-productive, in a future version with a smarter planner.


When I run into unexpectedly poor performance, I have an intuitive
enough feel for my own data that I know what plan it ought to be
using.  Figuring out why it is not using it is very hard.  For one
thing, EXPLAIN tells you about the winning plan, but there is no
visibility into what ought to be the winning plan but isn't, so no way
to see why it isn't. So you first have to use our existing non-hint
hints (enable_*, doing weird things with cost_*, CTE stuff) to trick
it into using the plan I want it to use, before I can figure out why
it isn't using it, before I could figure out what hints of the style
you are suggesting to supply to get it to use it.


I'm sure that happens too, but my gut feeling is that more often the 
EXPLAIN ANALYZE output reveals a bad estimate somewhere in the plan, and 
the planner chooses a bad plan based on the bad estimate. If you hint 
the planner by giving a better estimate for where the estimator got it 
wrong, the planner will choose the desired plan.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] fast read of binary data

2012-11-22 Thread Heikki Linnakangas

On 22.11.2012 09:54, Eildert Groeneveld wrote:

OK, I did have a look at the libpq library, and you are right, there is a
way to obtain binary data from the backend through the  PQexecParams

 res = PQexecParams(conn,
"DECLARE myportal CURSOR FOR select genotype_bits from v_genotype_data",
0,   /* zero param */
NULL,/* let the backend deduce param type */
paramValues,
NULL,/* don't need param lengths since text*/
NULL,/* default to all text params */
1);  /* ask for binary results */

genotype_bits is defined as bit varying in the backend. When writing the
results:
 for (i = 0; i < PQntuples(res); i++)
 {
 for (j = 0; j < nFields; j++)
   fwrite(PQgetvalue(res, i, j),10,1,f);
 }

it is clear that the results are NOT in binary format:
eg(eno,snp): od -b junk |head
000 061 060 061 060 061 060 061 060 061 060 061 060 061 060 061 060


You need to ask for binary results when you execute the FETCH 
statements. Asking for binary results on the DECLARE CURSOR statement 
has no effect, as DECLARE CURSOR doesn't return any results; it's the 
FETCH that follows that returns the result set.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Heikki Linnakangas

On 21.11.2012 01:53, Tom Lane wrote:

I think the more interesting question is what cases wouldn't be covered
by such a rule.  Typically you need to use OFFSET 0 in situations where
the planner has guessed wrong about costs or rowcounts, and I think
people are likely using WITH for that as well.  Should we be telling
people that they ought to insert OFFSET 0 in WITH queries if they want
to be sure there's an optimization fence?


Yes, I strongly feel that we should. Writing a query using WITH often 
makes it more readable. It would be a shame if people have to refrain 
from using it, because the planner treats it as an optimization fence.
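
To make that suggestion concrete (table and column names below are made up): 
under that rule a plain WITH would no longer be a fence, and a query that 
relies on the fence would spell it out explicitly inside the CTE:

-- plain WITH: the planner would be free to optimize across the CTE boundary
WITH t AS (SELECT * FROM big_table WHERE flag)
SELECT * FROM t WHERE id = 42;

-- explicit fence: OFFSET 0 inside the CTE keeps it as a separate step
WITH t AS (SELECT * FROM big_table WHERE flag OFFSET 0)
SELECT * FROM t WHERE id = 42;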


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Heikki Linnakangas

On 21.11.2012 17:42, Gavin Flower wrote:

On 22/11/12 04:32, Andres Freund wrote:

On 2012-11-21 10:21:16 -0500, Andrew Dunstan wrote:

I wasn't talking about removing it. My point was that if the
optimization
fence around CTEs is removed a lot of people will need to rework apps
where
they have used them for that purpose. And I continue to think that
spelling
it OFFSET 0 is horribly obscure.

+1


FWIW, I'm happy with OFFSET 0. Granted, it's pretty obscure, but 
that's what we've historically recommended, and it's pretty ugly to have 
to specify a fence like that in the first place. Whenever you have to 
resort to it, you ought have a comment in the query explaining why you 
need to force the planner like that, anyway.



WITH foo AS (SELECT ...) (barrier=on|off)?

9.3 introduces the syntax, defaulting to on
9.4 switches the default to off.


WITH foo AS (SELECT ...) (fence=on|off)?

WITH foo AS (SELECT ...) (optimisation_fence=on|off)?


If we are to invent a new syntax for this, can we please come up with 
something that's more widely applicable than just the WITH syntax. 
Something that you could use to replace OFFSET 0 in a subquery, too.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hints (was Poor performance using CTE)

2012-11-21 Thread Heikki Linnakangas

On 21.11.2012 15:42, Kevin Grittner wrote:

Better, IMV, would be to identify what sorts of hints people actually
find useful, and use that as the basis for TODO items for optimizer
improvement as well as inventing clear ways to specify the desired
coercion. I liked the suggestion that a CTE which didn't need to be
materialized because of side-effects or multiple references have a
keyword. Personally, I think that AS MATERIALIZED x (SELECT ...)
would be preferable to AS x (SELECT ... OFFSET 0) as the syntax to
specify that.


Rather than telling the planner what to do or not to do, I'd much rather 
have hints that give the planner more information about the tables and 
quals involved in the query. A typical source of bad plans is when the 
planner gets its cost estimates wrong. So rather than telling the 
planner to use a nested loop join for a INNER JOIN b ON a.id = b.id, 
the user could tell the planner that there are only 10 rows that match 
the a.id = b.id qual. That gives the planner the information it needs 
to choose the right plan on its own. That kind of hints would be much 
less implementation specific and much more likely to still be useful, or 
at least not outright counter-productive, in a future version with a 
smarter planner.


You could also attach that kind of hints to tables and columns, which 
would be more portable and nicer than decorating all queries.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] postgres 8.4, COPY, and high concurrency

2012-11-13 Thread Heikki Linnakangas

On 13.11.2012 21:13, Jon Nelson wrote:

I was working on a data warehousing project where a fair number of files
could be COPY'd more or less directly into tables. I have a somewhat nice
machine to work with, and I ran on 75% of the cores I have (75% of 32 is
24).

Performance was pretty bad. With 24 processes going, each backend (in COPY)
spent 98% of its time in semop (as identified by strace).  I tried larger
and smaller shared buffers, all sorts of other tweaks, until I tried
reducing the number of concurrent processes from 24 to 4.

Disk I/O went up (on average) at least 10X and strace reports that the top
system calls are write (61%), recvfrom (25%), and lseek (14%) - pretty
reasonable IMO.

Given that each COPY is into its own, newly-made table with no indices or
foreign keys, etc, I would have expected the interaction among the backends
to be minimal, but that doesn't appear to be the case.  What is the likely
cause of the semops?


I'd guess it's lock contention on WALInsertLock. That means, the system 
is experiencing lock contention on generating WAL records for the 
insertions. If that theory is correct, you ought to get a big gain if 
you have wal_level=minimal, and you create or truncate the table in the 
same transaction with the COPY. That allows the system to skip 
WAL-logging the COPY.
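
A minimal sketch of that pattern (assuming wal_level = minimal and no WAL 
archiving; table and file names are made up):

BEGIN;
CREATE TABLE staging_copy (LIKE target_table);   -- or TRUNCATE an existing table
COPY staging_copy FROM '/tmp/data.csv' CSV;
COMMIT;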


Or you could upgrade to 9.2. The WAL-logging of bulk COPY was optimized 
in 9.2, it should help precisely the scenario you're facing.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] dbt2 performance regresses from 9.1.6 to 9.2.1

2012-11-07 Thread Heikki Linnakangas

On 05.11.2012 16:32, Claudio Freire wrote:

On Sun, Nov 4, 2012 at 7:23 PM, Dong Ye y...@vmware.com wrote:

You are using prepared statements, this makes me think that this
regression might be due to support for parameter specific plans for
prepared statements. [1] Can you run the test on both versions without
prepared statements and see if the regressions remains.


Without prepare statement, we got 48837.33 avg notpm with 9.1.6 and 43264.54 
avg notpm with 9.2.1.
notps over time shows the slowdown of 9.2.1 is evident during the entire course 
of the run.
Their profiles are posted on http://pgsql.privatepaste.com/b770f72967 (9.1.6) 
and http://pgsql.privatepaste.com/6fa8b7f174 (9.2.1).


You know... it does look as if 9.2.1 is generating a lot more pressure
into the memory allocator (AllocSetAlloc notably higher).


Did you check the access plans of the queries? 9.2 planner might choose 
a slightly worse plan. Or perhaps index-only scans are hurting 
performance with the DBT-2 queries.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Replaying 48 WAL files takes 80 minutes

2012-10-30 Thread Heikki Linnakangas

On 30.10.2012 10:50, Albe Laurenz wrote:

Why does WAL replay read much more than it writes?
I thought that pretty much every block read during WAL
replay would also get dirtied and hence written out.


Not necessarily. If a block is modified and written out of the buffer 
cache before next checkpoint, the latest version of the block is already 
on disk. On replay, the redo routine reads the block, sees from the page's 
LSN that the change was already applied, and does nothing.



I wonder why the performance is good in the first few seconds.
Why should exactly the pages that I need in the beginning
happen to be in cache?


This is probably because of full_page_writes=on. When replay has a full 
page image of a block, it doesn't need to read the old contents from 
disk. It can just blindly write the image to disk. Writing a block to 
disk also puts that block in the OS cache, so this also efficiently 
warms the cache from the WAL. Hence in the beginning of replay, you just 
write a lot of full page images to the OS cache, which is fast, and you 
only start reading from disk after you've filled up the OS cache. If 
this theory is true, you should see a pattern in the I/O stats, where in 
the first seconds there is no I/O, but the CPU is 100% busy while it 
reads from WAL and writes out the pages to the OS cache. After the OS 
cache fills up with the dirty pages (up to dirty_ratio, on Linux), you 
will start to see a lot of writes. As the replay progresses, you will 
see more and more reads, as you start to get cache misses.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] deadlock_timeout affect on performance

2012-10-02 Thread Heikki Linnakangas

On 01.10.2012 19:49, pg noob wrote:

Hi all,

I have a question about the deadlock_timeout in regards to performance.
Right now we have this timeout set at its default of 1s.
My understanding of it is that this means that every 1 second the server
will check for deadlocks.


Not quite. It means that when a backend gets blocked, waiting on a lock, 
it will check for deadlocks after waiting for 1 second. When no backend 
is waiting for a lock, there are no deadlock checks regardless of 
deadlock_timeout.



What I am wondering is how much of a performance improvement we would
expect to get if this was raised to 30 seconds?
Is it negligible or could it be a substantial performance improvement on a
busy system?
We very rarely have deadlocks and waiting 30 seconds to discover one
doesn't seem too bad.


It's almost certainly negligible. If you regularly have deadlocks, it 
might even be better for performance to make the timeout shorter than 1 s, 
so that deadlocks are detected earlier, and backends will spend less 
time deadlocked, and more time doing real work. Although I doubt it will 
make any meaningful difference either way.


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] exponential performance decrease in ISD transaction

2012-08-31 Thread Heikki Linnakangas

On 31.08.2012 15:27, John Nash wrote:

Program 1: dbtransfromfile: this program creates a simple table
consisting of a single int column. After the creation, the program
inserts 1000 tuples in the table, which are never deleted, after that
the program reads a transaction pattern from a given file and executes
it a number of times determined when the program is launched.

The transaction we are launching is (INSERT/SELECT/DELETE) the following:

insert into T_TEST values (1);select * from T_TEST where
c1=1000;delete from T_TEST where c1=1;commit;


Sounds like the table keeps growing when rows are inserted and 
subsequently deleted. PostgreSQL doesn't immediately remove deleted 
tuples from the underlying file, but simply marks them as deleted. The 
rows are not physically removed until autovacuum kicks in and cleans it 
up, or the table is vacuumed manually.


I'd suggest creating an index on t_test(c1), if there isn't one already. 
It's not helpful when the table is small, but when the table is bloated 
with all the dead tuples from the deletions, it should help to keep the 
access fast despite the bloat.
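
A sketch of that suggestion, plus optionally making autovacuum clean this 
particular table more aggressively (the index name and threshold values below 
are arbitrary):

CREATE INDEX i_t_test_c1 ON t_test (c1);

ALTER TABLE t_test SET (autovacuum_vacuum_scale_factor = 0.0,
                        autovacuum_vacuum_threshold = 1000);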


- Heikki


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgress is taking lot of CPU on our embedded hardware.

2012-01-27 Thread Heikki Linnakangas

On 27.01.2012 15:34, Jayashankar K B wrote:

Hi,

We are having an embedded system with a freescale m68k architecture based 
micro-controller, 256MB RAM running a customized version of Slackware 12 linux.
It's a relatively modest Hardware.


Fascinating!


We have installed postgres 9.1 as our database engine. While testing, we found 
that the Postgres operations take more than 70% of CPU and the average also 
stays above 40%.
This is suffocating the various other processes running on the system. Couple 
of them are very critical ones.
The testing involves inserting bulk number of records (approx. 1 records 
having between 10 and 20 columns).
Please let us know how we can reduce CPU usage for the postgres.


The first step would be to figure out where all the time is spent. Are 
there unnecessary indexes you could remove? Are you using INSERT 
statements or COPY? Sending the data in binary format instead of text 
might shave some cycles.
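
If the data is currently sent as one INSERT per row, batching rows into 
multi-row INSERTs (or COPY) inside a single transaction usually cuts the CPU 
cost per row considerably. A rough sketch, with a made-up table:

BEGIN;
INSERT INTO readings (sensor_id, reading) VALUES
    (1, 10.5),
    (2, 11.2),
    (3,  9.8);
COMMIT;

-- COPY is cheaper still, when the data can be streamed in that form:
-- COPY readings (sensor_id, reading) FROM STDIN;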


If you can run something like oprofile on the system, that would be 
helpful to pinpoint the expensive part.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] regarding CLUSTER and HUGE work_mem / maintenance_work_mem

2012-01-27 Thread Heikki Linnakangas

On 27.01.2012 19:43, Jon Nelson wrote:

Let's say I have a 7GB table with 3-4 indices for a total of 10-12GB.
Furthermore, let's say I have a machine with sufficient memory for me
to set the work_mem  and maintenance_work_mem to 20GB (just for this
session).
When I issue a CLUSTER using one of the indices, I see PostgreSQL (by
way of strace) performing an index scan which amounts to large
quantities of random I/O.
In my case, that means it takes a very, very long time. PostgreSQL is
largely at defaults, except for a 2GB shared_buffers and a few
unrelated changes. The system itself has 32GB of physical RAM and has
plenty free.
Why didn't PostgreSQL just read the table into memory (and the
interesting index) as a sequential scan, sort, and then write it out?
It seems like there would be more than enough memory for that. The
sequential I/O rate on this machine is 50-100x the random I/O rate.

I'm using 8.4.10 (with the 'inet' de-toasting patch) on Scientific Linux 6.1.


The suppport for doing a seqscan+sort in CLUSTER was introduced in 
version 9.1. Before that, CLUSTER always did an indexscan. See release 
notes: http://www.postgresql.org/docs/9.1/static/release-9-1.html#AEN107416
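
On 8.4, one possible workaround is to do the sort manually and swap the table, 
which replaces the random index-scan I/O with one sequential scan plus a sort. 
This is only a sketch (names are illustrative): indexes, constraints, and 
privileges have to be recreated on the new table, and the table must not be 
written to concurrently.

BEGIN;
CREATE TABLE mytable_sorted AS
    SELECT * FROM mytable ORDER BY cluster_col;
DROP TABLE mytable;
ALTER TABLE mytable_sorted RENAME TO mytable;
COMMIT;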


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgress is taking lot of CPU on our embedded hardware.

2012-01-27 Thread Heikki Linnakangas

On 27.01.2012 20:30, Jayashankar K B wrote:

Hi Heikki Linnakangas: We are using a series of INSERT statements to insert the 
records into the database.
Sending data in binary is not an option as the module that writes into DB has 
been finalized.
We do not have control over that.


That certainly limits your options.


Please let me know how we can proceed. On the net I couldn't get hold of any 
good example where Postgres has been used on a limited hardware system.


I don't think there's anything particular in postgres that would make it 
a poor choice on a small system, as far as CPU usage is concerned 
anyway. But inserting rows in a database is certainly slower than, say, 
writing them into a flat file.


At what rate are you doing the INSERTs? And how fast would they need to 
be? Remember that it's normal that while the INSERTs are running, 
postgres will use all the CPU it can to process them as fast as 
possible. So the question is, at what rate do they need to be processed 
to meet your target. Lowering the process priority with 'nice' might 
help too, to give the other important processes priority over postgres.


The easiest way to track down where the time is spent would be to run a 
profiler, if that's possible on your platform.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Cursor fetch performance issue

2012-01-25 Thread Heikki Linnakangas

On 24.01.2012 23:34, Tony Capobianco wrote:

Here's the explain:

pg=# explain select getMemberAdminPrevious_sp(247815829, 
1,'test.em...@hotmail.com', 'Email', 'Test');
 QUERY PLAN
--
  Result  (cost=0.00..0.26 rows=1 width=0)
(1 row)

Time: 1.167 ms


That's not very helpful. We'd need to see the plan of the query within 
the function, not the plan on invoking the function. The auto_explain 
contrib module with auto_explain_log_nested_statements=on might be 
useful to get that.
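
A sketch of how that could be captured for a single session (the parameter 
values and e-mail address below are placeholders):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_nested_statements = on;
SET auto_explain.log_analyze = on;

SELECT getMemberAdminPrevious_sp(247815829, 1, 'someone@example.com', 'Email', 'Test');
-- the plan of the SELECT inside the function then shows up in the server log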



There was discussion of 'LIKE' v. '=' and wildcard characters are not
being entered into the $1 parameter.

This is not generating a sql string.  I feel it's something to do with
the fetch of the refcursor.  The cursor is a larger part of a function:

CREATE OR REPLACE FUNCTION PUBLIC.GETMEMBERADMINPREVIOUS_SP2 (
   p_memberid   IN numeric,
   p_websiteid  IN numeric,
   p_emailaddress   IN varchar,
   p_firstname  IN varchar,
   p_lastname   IN varchar)
RETURNS refcursor AS $$
DECLARE
   ref          refcursor;
   l_sysdateid  numeric;
BEGIN
   l_sysdateid := sysdateid();
   if (p_memberid != 0) then
 if (p_emailaddress IS NOT NULL) then
   OPEN ref FOR
 SELECT m.memberid, m.websiteid, m.emailaddress,
m.firstname, m.lastname, m.regcomplete, m.emailok
 FROM   members m
 WHERE  m.emailaddress LIKE p_emailaddress
 AND    m.changedate_id < l_sysdateid ORDER BY m.emailaddress,
m.websiteid;
 end if;
   end if;
   Return ref;
EXCEPTION
WHEN NO_DATA_FOUND THEN
Return null;
END;
$$ LANGUAGE 'plpgsql';


The theory that the query takes a long time because "LIKE 
p_emailaddress" is not optimizable by the planner seems the most likely 
to me.


If you don't actually use any wildcards in the email, try replacing LIKE 
with =. If you do, then you can try the OPEN ref FOR EXECUTE syntax. 
That way the query is re-planned every time, and the planner can take 
advantage of the parameter value. That enables it to use an index on the 
email address column, when there isn't in fact any wildcards in the 
value, and also estimate the selectivities better which can lead to a 
better plan. Like this:


CREATE OR REPLACE FUNCTION public.getmemberadminprevious_sp2(p_memberid 
numeric, p_websiteid numeric, p_emailaddress character varying, 
p_firstname character varying, p_lastname character varying)

 RETURNS refcursor
 LANGUAGE plpgsql
AS $function$
DECLARE
  ref          refcursor;
  l_sysdateid  numeric;
BEGIN
  l_sysdateid := sysdateid();
  if (p_memberid != 0) then
if (p_emailaddress IS NOT NULL) then
  OPEN ref FOR EXECUTE $query$
SELECT m.memberid, m.websiteid, m.emailaddress,
   m.firstname, m.lastname, m.regcomplete, m.emailok
FROM   members m
WHERE  m.emailaddress LIKE $1
AND    m.changedate_id < $2 ORDER BY m.emailaddress,
m.websiteid;
  $query$ USING p_emailaddress, l_sysdateid;
end if;
  end if;
  Return ref;
EXCEPTION
WHEN NO_DATA_FOUND THEN
   Return null;
END;
$function$

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] autovacuum, exclude table

2011-12-12 Thread Heikki Linnakangas

On 12.12.2011 16:25, Anibal David Acosta wrote:

I have a couple of tables with about 400 million records, increasing by about
5 million per day.

I think that disabling autovac over those tables, and enabling daily manual
vacuum (in some idle hour) will be better.

I am right?


Possibly. If the system is otherwise idle, it sounds sensible to do 
routine maintenance at that time.



Is it possible to exclude some tables from autovacuum?


Sure, see 
http://www.postgresql.org/docs/9.1/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS


ALTER TABLE foo SET (autovacuum_enabled=false, toast.autovacuum_enabled 
= false);


It might be better, though, to let autovacuum enabled, and just do the 
additional manual VACUUM in the idle period. If the daily manual VACUUM 
is enough to keep the bloat within the autovacuum thresholds, autovacuum 
will never kick in. If it's not enough, then you probably want 
autovacuum to run.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSL encryption makes bytea transfer slow

2011-11-04 Thread Heikki Linnakangas

On 04.11.2011 10:43, Albe Laurenz wrote:

Marti Raudsepp wrote:

Disabling OpenSSL compression in the source (which
is possible since OpenSSL 1.0.0) does not give me any performance
improvement.


If it doesn't give you any performance improvement then you haven't
disabled compression. Modern CPUs can easily saturate 1 GbitE with
AES256-encrypted connections. Compression is usually the bottleneck,
at 20-30 MB/s.


Hmm, my knowledge of OpenSSL is so little that it is well possible that
I did it wrong. I have attached the small patch I used; can you see
where I went wrong?


That only works with OpenSSL 1.0.0 - did you upgrade? I thought you were 
using 0.9.7a earlier.


FWIW, it would be better to test #ifdef SSL_OP_NO_COMPRESSION 
directly, rather than the version number.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSL encryption makes bytea transfer slow

2011-10-29 Thread Heikki Linnakangas

On 28.10.2011 14:02, Albe Laurenz wrote:

We selected a 30MB bytea with psql connected with
-h localhost and found that it makes a huge
difference whether we have SSL encryption on or off.

Without SSL the SELECT finished in about a second,
with SSL it took over 23 seconds (measured with
\timing in psql).
During that time, the CPU is 100% busy.
All data are cached in memory.

Is this difference as expected?


I tried to reproduce that, but only saw about 4x difference in the 
timing, not 23x.


$ PGSSLMODE=disable ~/pgsql.master/bin/psql -h localhost postgres
psql (9.2devel)
Type "help" for help.

postgres=# \o foo
postgres=# \timing
Timing is on.
postgres=# SELECT repeat(xx,65536)::bytea FROM (SELECT 
string_agg(lpad(to_hex(x),2, '0' ),'') AS xx FROM generate_series(0,255) 
x) AS xx;

Time: 460,782 ms

$ PGSSLMODE=require ~/pgsql.master/bin/psql -h localhost postgres
psql (9.2devel)
SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
Type "help" for help.

postgres=# \o foo
postgres=# \timing
Timing is on.
postgres=# SELECT repeat(xx,65536)::bytea FROM (SELECT 
string_agg(lpad(to_hex(x),2, '0' ),'') AS xx FROM generate_series(0,255) 
x) AS xx;

Time: 1874,276 ms


oprofile suggests that all that overhead is coming from compression. 
Apparently SSL does compression automatically. Oprofile report of the 
above test case with SSL enabled:


samples  %image name   symbol name
28177 74.4753  libz.so.1.2.3.4  /usr/lib/libz.so.1.2.3.4
1814  4.7946  postgres byteain
1459  3.8563  libc-2.13.so __memcpy_ssse3_back
1437  3.7982  libcrypto.so.0.9.8   /usr/lib/libcrypto.so.0.9.8
896   2.3682  postgres hex_encode
304   0.8035  vmlinux-3.0.0-1-amd64clear_page_c
271   0.7163  libc-2.13.so __strlen_sse42
222   0.5868  libssl.so.0.9.8  /usr/lib/libssl.so.0.9.8

And without:

samples  %image name   symbol name
1601 27.4144  postgres byteain
865  14.8116  postgres hex_encode
835  14.2979  libc-2.13.so __memcpy_ssse3_back
290   4.9658  vmlinux-3.0.0-1-amd64clear_page_c
280   4.7945  libc-2.13.so __strlen_sse42
184   3.1507  vmlinux-3.0.0-1-amd64page_fault
174   2.9795  vmlinux-3.0.0-1-amd64put_mems_allowed


Maybe your data is very expensive to compress for some reason?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] : PG9.0 - Checkpoint tuning and pg_stat_bgwriter

2011-10-04 Thread Heikki Linnakangas

On 04.10.2011 13:50, Venkat Balaji wrote:

I have got a situation wherein I see the production system is loaded with
checkpoints and at least 1000+ buffers are being written for every
checkpoint.


1000 buffers isn't very much, that's only 8 MB, so that's not alarming 
itself.



I am thinking of increasing the checkpoint_segments.

Below are our current settings -

checkpoint_segments = 8
checkpoint_timeout = 5 mins
checkpoint_completion_target = 0.5
bgwriter_delay = 200ms
bgwriter_lru_maxpages = 100
bgwriter_lru_multiplier = 2

Looking forward for suggestions.


Yep, increase checkpoint_segments. And you probably want to raise 
checkpoint_timeout too.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hash index use presently(?) discouraged since 2005: revive or bury it?

2011-09-14 Thread Heikki Linnakangas

On 14.09.2011 09:39, Stefan Keller wrote:

Should I open a ticket?


What ticket? With whom?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query performance issue

2011-08-31 Thread Heikki Linnakangas

On 31.08.2011 12:00, Jayadevan M wrote:

Hello all,
I have a query which takes about 20 minutes to execute and retrieves
2000-odd records. The explain for the query is pasted here
http://explain.depesz.com/s/52f
The same query, with similar data structures/indexes and data comes back
in 50 seconds in Oracle. We just ported the product to PostgreSQL and are
testing it. Any input on what to look for?

Possible relevant parameters are
shared_buffers = 4GB
temp_buffers = 8MB
work_mem = 96MB
maintenance_work_mem = 1GB
effective_cache_size = 8GB
default_statistics_target = 50

It is a machine with 16 GB RAM.


Please run EXPLAIN ANALYZE on the query and post that, it's hard to say 
what's wrong from just the query plan, without knowing where the time is 
actually spent. And the schema of the tables involved, and any indexes 
on them. (see also http://wiki.postgresql.org/wiki/SlowQueryQuestions)


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Infinite Cache

2011-07-08 Thread Heikki Linnakangas

On 05.07.2011 16:35, Shaun Thomas wrote:

I'd say it's probably safe enough these days. But it's also one of those
exclusive selling points they're using right now to garner EDB
customers. So I doubt it'll be released any time *soon*, though may make
it eventually.


I doubt the community would want it even if it was open sourced. As an 
open source project, what would probably make more sense is a similar 
caching mechanism built into the kernel, somewhere between the 
filesystem cache and user-space. That way any you could use it with any 
application that benefits from the kind of large cache that Infinite 
Cache provides.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Order of tables

2011-04-28 Thread Heikki Linnakangas

On 28.04.2011 12:20, Rishabh Kumar Jain wrote:

How must the tables be ordered in the list of tables in the FROM statement?


There is no difference in performance, if that's what you mean. (If not, 
then the pgsql-novice or pgsql-sql mailing list would be more appropriate)


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] C on Client versus C on Server

2011-04-03 Thread Heikki Linnakangas

On 02.04.2011 21:52, Eliot Gable wrote:

1) Each select statement from the User Client C Module would be a separate
transaction which would drastically increase transaction overhead for the
whole set of requests.


You could wrap the statements in BEGIN-COMMIT in the client code.
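
A minimal sketch, with made-up table and column names:

BEGIN;
SELECT balance FROM accounts WHERE user_id = 42;
SELECT rate FROM tariffs WHERE plan_id = 7;
COMMIT;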


2) Writing the billing data at the end would mean that I not only have to
pull all the data down to the User Client, I must also push the data back up
to the server for writing the billing records.


Yeah, that sounds right.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance Test for PostgreSQL9

2011-02-28 Thread Heikki Linnakangas

On 28.02.2011 11:10, Selva manickaraja wrote:

OK, somehow I got these modules installed. Finally I successfully built and
installed PostgreSQL! I must thank you guys so much for helping.

Now coming to the real issue of the matter. According to the documentation
the gmake installcheck can be run in various directories. However, it seems
to be local only. Can these tests be run locally but stress test a
database on a remote machine? This way I don't need to keep building
postgresql from source on every new db server. I will wait for your reply.


Try

PGHOST=servername make installcheck

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] inheritance: planning time vs children number vs column number

2011-02-28 Thread Heikki Linnakangas

On 28.02.2011 11:38, Marc Cousin wrote:

I've been facing a very large (more than 15 seconds) planning time in a
partitioned configuration. The number of partitions wasn't completely crazy,
around 500, not in the thousands. The problem was that there were nearly 1000
columns in the parent table (very special use case, there is a reason for this
application to have this many columns). The check constraint was extremely
simple (for each child, 1 column = 1 constant, always the same column).

As I was surprised by this very large planning time, I have been trying to
study the variation of planning time against several parameters:
- number of columns
- number of children tables
- constraint exclusion's value (partition or off)

What (I think) I measured is that the planning time seems to be O(n^2) in the
number of columns, and O(n^2) in the number of child tables.

Constraint exclusion had a limited impact on planning time (it added between
20% and 100% planning time when there were many columns).


Testing here with a table with 1000 columns and 100 partitions, about 
80% of the planning time is looking up the statistics on attribute 
width, to calculate average tuple width. I don't see O(n^2) behavior, 
though; it seems linear.



I'd like to know if this is a known behavior ? And if I could mitigate it
somehow ?


I'm out of ideas on how to make it faster, I'm afraid.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] application of KNN code to US zipcode searches?

2011-02-17 Thread Heikki Linnakangas

On 17.02.2011 17:20, Mark Stosberg wrote:

I thought the benefit of KNN was that you could retrieve the rows in
distance order, so that a query for the closest 20 locations (for
example) would be very fast.  I wouldn't have expected it to be
helpful when you're selecting all the rows regardless of distance.


Kevin,

Thanks for the feedback. You are right that my reduced test case
wasn't a good approximation. I added a limit, to simulate finding the
100 zipcodes closest to 90210.

Below I compare 4 approaches to the same query:

1. Cube search
2. Earth Distance Search
3. Simple point distance (no index)
4. Simple point distance (KNN)

Now KNN benchmarks to be almost 100x faster! That's very promising.
Then there's only the issue that simple point distance is not expected
to be a good enough approximation of earth-distances. Perhaps that can
be solved by pre-computing coordinates based on the lat/long pairs
much like the map projections used to present a curved surface on a flat
map? Given that it's OK to be a few miles off, it seems we have some
leeway here.

Recommendations?


The existing opclasses only support distance-to-a-point, but I believe 
the KNN gist code is flexible enough that it could be used for distance 
to the edge of a shape as well. Someone just needs to write the 
operators and support functions.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Field wise checking the performance.

2011-02-14 Thread Heikki Linnakangas

On 14.02.2011 14:06, dba wrote:

I have two identical tables, but the widths of the fields are different. I need
to know whether changing from varchar(100) to varchar(30) will increase
performance, or if it's just the memory access.


It will make no difference. The max length is just a constraint on what 
values can be stored, it doesn't affect how the strings are stored. In 
both cases, the strings are stored in a variable-length format.
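
One way to see that for yourself: pg_column_size() reports the size of a 
stored value, and it is the same regardless of the declared maximum length:

SELECT pg_column_size('abc'::varchar(30)),
       pg_column_size('abc'::varchar(100));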


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-03 Thread Heikki Linnakangas

On 03.11.2010 17:52, Nick Matheson wrote:

We have an application that needs to do bulk reads of ENTIRE
Postgres tables very quickly (i.e. select * from table). We have
observed that such sequential scans run two orders of magnitude slower
than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is
due to the storage overhead we have observed in Postgres. In the
example below, it takes 1 GB to store 350 MB of nominal data. However
that suggests we would expect to get 35 MB/s bulk read rates.

Observations using iostat and top during these bulk reads suggest
that the queries are CPU bound, not I/O bound. In fact, repeating the
queries yields similar response times. Presumably if it were an I/O
issue the response times would be much shorter the second time through
with the benefit of caching.

We have tried these simple queries using psql, JDBC, pl/java stored
procedures, and libpq. In all cases the client code ran on the same
box as the server. We have experimented with Postgres 8.1, 8.3 and 9.0.


Try COPY, ie. COPY bulk_performance.counts TO STDOUT BINARY.
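
For example, from the shell (database and output file names are just 
placeholders):

psql -d mydb -c "COPY bulk_performance.counts TO STDOUT BINARY" > counts.bin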

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Useless sort by

2010-09-14 Thread Heikki Linnakangas

On 13/09/10 19:48, Tom Lane wrote:

Gaetano Mendolamend...@gmail.com  writes:

Of course I'm not suggesting to take away the sort by and give the user
an unsorted result; I'm asking why the optimizer, in cases like:



select unique(a) from v_table_with_order_by;



doesn't take away the order by inside the view and put it back by rewriting
the query like this:



select unique(a) from v_table_without_order_by
order by a;


That changes the order in which the rows are fed to unique(a).  The
principal real-world use for a non-top-level ORDER BY is exactly to
determine the order in which rows are fed to a function, so we will
have a revolt on our hands if we break that.


You could check for volatile functions. I think this could be done 
safely. However, it doesn't seem worthwhile, it would be a fair amount 
of code, and it's not usually a good idea to put an ORDER BY in a view 
or subquery anyway unless you also have volatile functions in there, or 
you want to coerce the optimizer to choose a certain plan.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] now() gives same time within the session

2010-07-18 Thread Heikki Linnakangas

On 12/07/10 14:15, A. Kretschmer wrote:

Use timeofday() instead, now() returns the transaction starting time.


timeofday() is a legacy function kept only for backwards-compatibility. 
It returns a string, which is quite awkward. Use clock_timestamp() instead.
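
For example, within a single transaction:

BEGIN;
SELECT now(), clock_timestamp();
-- now() stays fixed at the transaction start time,
-- while clock_timestamp() keeps advancing
SELECT now(), clock_timestamp();
COMMIT;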


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Analysis Function

2010-06-13 Thread Heikki Linnakangas

On 11/06/10 23:38, David Jarvis wrote:

I added an explicit cast in the SQL:

 dateserial(extract(YEAR FROM
m.taken)::int,'||p_month1||','||p_day1||') d1,
 dateserial(extract(YEAR FROM
m.taken)::int,'||p_month2||','||p_day2||') d2

The function now takes three integer parameters; there was no performance
loss.


We had a little chat about this with Magnus. It's pretty surprising that 
there's no built-in function to do this, we should consider adding one.


We could have a function like:

construct_timestamp(year int4, month int4, date int4, hour int4, minute 
int4, second int4, milliseconds int4, timezone text)


Now that we have named parameter notation, callers can use it to 
conveniently fill in only the fields needed:


SELECT construct_timestamp(year := 1999, month := 10, date := 22);

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Analysis Function

2010-06-11 Thread Heikki Linnakangas

On 11/06/10 11:25, David Jarvis wrote:

Datum dateserial (PG_FUNCTION_ARGS) {
   int32 p_year = PG_GETARG_INT32(0);
   int32 p_month = PG_GETARG_INT32(1);
   int32 p_day = PG_GETARG_INT32(2);

   DateADT d = date2j (p_year, p_month, p_day) - POSTGRES_EPOCH_JDATE;
   PG_RETURN_DATEADT(d);
}

Compiles without errors or warnings. The function is integrated as follows:

CREATE OR REPLACE FUNCTION dateserial(integer, integer, integer)
   RETURNS text AS
'ymd.so', 'dateserial'
   LANGUAGE 'c' IMMUTABLE STRICT
   COST 1;

However, when I try to use it, the database segfaults:

select dateserial( 2007, 1, 3 )

Any ideas why?


The C function returns a DateADT, which is a typedef for int32, but the 
CREATE FUNCTION statement claims that it returns 'text'.
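
Presumably the declaration should say RETURNS date instead, i.e. something 
like:

CREATE OR REPLACE FUNCTION dateserial(integer, integer, integer)
  RETURNS date AS
'ymd.so', 'dateserial'
  LANGUAGE 'c' IMMUTABLE STRICT
  COST 1;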


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Autovaccum settings while Bulk Loading data

2010-06-10 Thread Heikki Linnakangas

On 10/06/10 11:47, Ambarish Bhattacharya wrote:

It would be helpful if you could suggest appropriate autovacuum settings 
for handling this large data load, as my current autovacuum settings are 
hanging the entire process.


What do you mean by hanging the entire process?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Autovaccum settings while Bulk Loading data

2010-06-10 Thread Heikki Linnakangas

Please keep the mailing list CC'd, so that others can help.

On 10/06/10 15:30, Ambarish Bhattacharya wrote:

On 10/06/10 11:47, Ambarish Bhattacharya wrote:

It would be helpful if you could suggest appropriate autovacuum settings 
for handling this large data load, as my current autovacuum settings are 
hanging the entire process.


What do you mean by hanging the entire process?


Hanging the entire process means that the autovacuum and auto-analyze start, and 
after that there is no activity I can see in the postgres log related to the 
bulk loading. When I check the postgres processes from the task manager, I can 
see a few postgres processes still running, and they have to be killed 
from there; a normal shutdown is not happening in this case.


You'll have to provide a lot more details if you want people to help 
you. How do you bulk load the data? What kind of log messages do you 
normally get in the PostgreSQL log related to bulk loading?


Autovacuum or autoanalyze should not interfere with loading data, even 
if it runs simultaneously.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Need to increase performance of a query

2010-06-10 Thread Heikki Linnakangas

On 10/06/10 23:08, Anne Rosset wrote:

Heikki Linnakangas wrote:

On 10/06/10 22:47, Craig James wrote:

Postgres normally doesn't index NULL values even if the column is
indexed, so it has to do a table scan when your query includes an IS
NULL condition.


That was addressed in version 8.3. 8.3 and upwards can use an index
for IS NULL.

I believe the NULLs were stored in the index in earlier releases too,
they just couldn't be searched for.


I am using postgres 8.3.6. So why doesn't it use my index?


Well, apparently the planner doesn't think it would be any cheaper.

I wonder if this helps:

CREATE INDEX item_rank_project_id ON item_rank(project_id, rank, pf_id);

And make sure you drop any of the indexes that are not being used, to 
make sure the planner doesn't choose them instead.


(You should upgrade to 8.3.11, BTW. There have been a bunch of bug fixes 
in between; I don't know if any are related to this, but there are 
other important fixes there.)


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] FSM - per database or per installation?

2009-11-19 Thread Heikki Linnakangas
Craig James wrote:
 Are the FSM parameters for each database, or the entire Postgres
 system?  In other words, if I have 100 databases, do I need to increase
 max_fsm_pages and max_fsm_relations by a factor of 100, or keep them the
 same as if I just have one database?
 
 I suspect they're per-database, i.e. as I add databases, I don't have to
 increase the FSM parameters, but the documentation isn't 100% clear on
 this point.

It's per cluster, ie *not* per-database.

The parameter is gone in 8.4, BTW.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD + RAID

2009-11-15 Thread Heikki Linnakangas
Craig James wrote:
 I've wondered whether this would work for a read-mostly application: Buy
 a big RAM machine, like 64GB, with a crappy little single disk.  Build
 the database, then make a really big RAM disk, big enough to hold the DB
 and the WAL.  Then build a duplicate DB on another machine with a decent
 disk (maybe a 4-disk RAID10), and turn on WAL logging.
 
 The system would be blazingly fast, and you'd just have to be sure
 before you shut it off to shut down Postgres and copy the RAM files back
 to the regular disk.  And if you didn't, you could always recover from
 the backup.  Since it's a read-mostly system, the WAL logging bandwidth
 wouldn't be too high, so even a modest machine would be able to keep up.

Should work, but I don't see any advantage over attaching the RAID array
directly to the 1st machine with the RAM and turning synchronous_commit=off.
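
For reference, that is simply this in postgresql.conf (it can also be set 
per session with SET):

synchronous_commit = off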

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD + RAID

2009-11-14 Thread Heikki Linnakangas
Merlin Moncure wrote:
 2009/11/13 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 Laszlo Nagy wrote:
* I need at least 32GB disk space. So DRAM based SSD is not a real
  option. I would have to buy 8x4GB memory, costs a fortune. And
  then it would still not have redundancy.
 At 32GB database size, I'd seriously consider just buying a server with
 a regular hard drive or a small RAID array for redundancy, and stuffing
 16 or 32 GB of RAM into it to ensure everything is cached. That's tried
 and tested technology.
 
 lots of ram doesn't help you if:
 *) your database gets written to a lot and you have high performance
 requirements

When all the (hot) data is cached, all writes are sequential writes to
the WAL, with the occasional flushing of the data pages at checkpoint.
The sequential write bandwidth of SSDs and HDDs is roughly the same.

I presume the fsync latency is a lot higher with HDDs, so if you're
running a lot of small write transactions, and don't want to risk losing
any recently committed transactions by setting synchronous_commit=off,
the usual solution is to get a RAID controller with a battery-backed up
cache. With a BBU cache, the fsync latency should be in the same
ballpark as with SSDs.

 *) your data is important

Huh? The data is safely on the hard disk in case of a crash. The RAM is
just for caching.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD + RAID

2009-11-14 Thread Heikki Linnakangas
Merlin Moncure wrote:
 On Sat, Nov 14, 2009 at 6:17 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 lots of ram doesn't help you if:
 *) your database gets written to a lot and you have high performance
 requirements
 When all the (hot) data is cached, all writes are sequential writes to
 the WAL, with the occasional flushing of the data pages at checkpoint.
 The sequential write bandwidth of SSDs and HDDs is roughly the same.

 I presume the fsync latency is a lot higher with HDDs, so if you're
 running a lot of small write transactions, and don't want to risk losing
 any recently committed transactions by setting synchronous_commit=off,
 the usual solution is to get a RAID controller with a battery-backed up
 cache. With a BBU cache, the fsync latency should be in the same
 ballpark as with SSDs.
 
 BBU raid controllers might only give better burst performance.  If you
 are writing data randomly all over the volume, the cache will overflow
 and performance will degrade.

We're discussing a scenario where all the data fits in RAM. That's what
the large amount of RAM is for. The only thing that's being written to
disk is the WAL, which is sequential, and the occasional flush of data
pages from the buffer cache at checkpoints, which doesn't happen often
and will be spread over a period of time.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD + RAID

2009-11-13 Thread Heikki Linnakangas
Laszlo Nagy wrote:
* I need at least 32GB disk space. So DRAM based SSD is not a real
  option. I would have to buy 8x4GB memory, costs a fortune. And
  then it would still not have redundancy.

At 32GB database size, I'd seriously consider just buying a server with
a regular hard drive or a small RAID array for redundancy, and stuffing
16 or 32 GB of RAM into it to ensure everything is cached. That's tried
and tested technology.

I don't know how you came to the 32 GB figure, but keep in mind that
administration is a lot easier if you have plenty of extra disk space
for things like backups, dumps+restore, temporary files, upgrades etc.
So if you think you'd need 32 GB of disk space, I'm guessing that 16 GB
of RAM would be enough to hold all the hot data in cache. And if you
choose a server with enough DIMM slots, you can expand easily if needed.

Just my 2 cents, I'm not really an expert on hardware..

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] maintain_cluster_order_v5.patch

2009-10-22 Thread Heikki Linnakangas
ph...@apra.asso.fr wrote:
 Hi Jeff,
 If you can help (either benchmark work or C coding), try reviving the
 features by testing them and merging them with the current tree.
 OK, that's the rule of the game in such a community.
 I am not a good C writer, but I will see what I could do.

The FSM rewrite in 8.4 opened up more options for implementing this. The
patch used to check the index for the block the nearest key is stored
in, read that page in, and insert there if there's enough free space on
it. with the new FSM, you can check how much space there is on that
particular page before fetching it. And if it's full, the new FSM data
structure can be searched for a page with enough free space as close as
possible to the old page, although there's no interface to do that yet.

A completely different line of attack would be to write a daemon that
concurrently moves tuples in order to keep the table clustered. It would
interfere with UPDATEs and DELETEs, and ctids of the tuples would
change, but for many use cases it would be just fine. We discussed a
utility like that as a replacement for VACUUM FULL on hackers a while
ago; see the thread "Feedback on getting rid of VACUUM FULL". A similar
approach would work here, the logic for deciding which tuples to move
and where would just be different.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] vacuumdb command

2009-10-11 Thread Heikki Linnakangas
soorjith p wrote:
 I used the vacuumdb command, but in its output I can't see VACUUM.
 
 The last part of output is
 
 DETAIL:  0 dead row versions cannot be removed yet.
 There were 0 unused item pointers.
 1 pages contain useful free space.
 0 pages are entirely empty.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.
 INFO:  free space map contains 768 pages in 392 relations
 DETAIL:  A total of 6720 page slots are in use (including overhead).
 6720 page slots are required to track all free space.
 Current limits are:  153600 page slots, 1000 relations, using 965 kB.
 
 
 I think that if the process is complete, the last part of the output should be VACUUM.
 Does that mean the process is not complete?

No. It is complete.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] updating a row in a table with only one row

2009-10-02 Thread Heikki Linnakangas
Robert Haas wrote:
 On Fri, Oct 2, 2009 at 9:54 AM, Merlin Moncure mmonc...@gmail.com wrote:
 On Fri, Oct 2, 2009 at 4:18 AM, Michal Vitecek f...@mageo.cz wrote:
  Hello everyone,

  I'm using PostgreSQL 8.3.8 running on a server with 2 Xeon CPUs, 4GB
  RAM, 4+2 disks in RAID 5 and CentOS 5.3. There's only one database
  which, dumped with pg_dump, takes ~0.5GB.

  There are ~100 tables in the database and one of them (tableOne) always
  contains only a single row. There's one index on it. However, performing
  an update on the single row (which occurs every 60 secs) takes a
  considerably long time -- around 200ms. The system is not loaded in any
  way.

  The table definition is:

  CREATE TABLE tableOne (
value1  BIGINT NOT NULL,
value2  INTEGER NOT NULL,
value3  INTEGER NOT NULL,
value4  INTEGER NOT NULL,
  value5  INTEGER NOT NULL
  );
  CREATE INDEX tableOne_index1 ON tableOne (value5);

  And the SQL query to update the _only_ row in the above table is:
  ('value5' can't be used to identify the row as I don't know it at the
  time)

  UPDATE tableOne SET value1 = newValue1, value2 = newValue2, value5 = 
 newValue5;

  And this is what EXPLAIN says on the above SQL query:

  DB= EXPLAIN UPDATE tableOne SET value1 = newValue1, value2 = newValue2, 
 value5 = newValue5;
  LOG:  duration: 235.948 ms  statement: EXPLAIN UPDATE tableOne SET value1 
 = newValue1, value2 = newValue2, value5 = newValue5;
                        QUERY PLAN
  ---------------------------------------------------------
   Seq Scan on jackpot  (cost=0.00..1.01 rows=1 width=14)
  (1 row)

  What takes PostgreSQL so long? I guess I could add a fake 'id' column,
  create an index on it to identify the single row, but still -- the time
  seems quite ridiculous to me.
 It is ridiculous.  Your problem is almost definitely dead rows.  I
 can't recall (and I can't find the info anywhere) if the 'hot' feature
 requires an index to be active -- I think it does.  If so, creating a
 dummy field and indexing it should resolve the problem.  Can you
 confirm the dead row issue by doing vacuum verbose and creating the
 index?  Please respond with your results, I'm curious.  Also, is
 autovacuum on?  Have you measured iowait?
 
 Since he's updating all the fields in the table, an index will
 certainly ensure that HOT does not apply, no?

An extra index shouldn't hurt if you don't update the indexed dummy
column. But the existing tableOne_index1 will cause HOT to not apply, if
value5 is updated. I'd suggest dropping it (and not creating any other
indexes either), it won't do any good on a table with only one row anyway.

If the table is indeed bloated, VACUUM FULL should shrink it back. I
wonder how it got to be that way, though. Autovacuum should keep a table
like that in check.
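
That is, roughly (only worthwhile if the table really has become bloated):

DROP INDEX tableOne_index1;
VACUUM FULL tableOne;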

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Persistent Plan Cache

2009-09-13 Thread Heikki Linnakangas
Tom Lane wrote:
 Joshua Rubin jru...@esoft.com writes:
 We have a very large, partitioned, table that we often need to query
 from new connections, but frequently with similar queries. We have
 constraint exclusion on to take advantage of the partitioning. This also
 makes query planning more expensive. As a result, the CPU is fully
 loaded, all the time, preparing queries, many of which have been
 prepared, identically, by other connections.
 
 If you're depending on constraint exclusion, it's hard to see how plan
 caching could help you at all.  The generated plan needs to vary
 depending on the actual WHERE-clause parameters.

That's what the OP really should've complained about. If we addressed
that, so that a generic plan was created that determines which child
tables can be excluded at run time, there would be no need for the
persistent plan cache.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Persistent Plan Cache

2009-09-13 Thread Heikki Linnakangas
Joshua Rubin wrote:
 We hardcode the parts of the where clause so that the prepared plan
 will not vary among the possible partitions of the table. The only
 values that are bound are ones that would not affect the planner's choice of table.

Then you would benefit from using prepared statements in the client,
and/or connection pooling to avoid having to re-prepare because of
reconnecting.
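
A minimal sketch with made-up names; the partition-selecting value stays 
hardcoded, and only values that don't affect partition choice are bound:

PREPARE lookup(int) AS
  SELECT * FROM big_partitioned_table
  WHERE partition_key = 20090913 AND item_id = $1;

EXECUTE lookup(42);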

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] partitioning max() sql not using index

2009-09-09 Thread Heikki Linnakangas
Kevin Kempter wrote:
Hi all, I have a large table (2 billion rows) that's partitioned by date based 
on an epoch int value.  We're running a select max(id) where id is the PK. I 
have a PK index on each of the partitions, and no indexes at all on the base 
 table.
 
 If I hit a partition table directly I get an index scan as expected:

The planner isn't smart enough to create the plan you're expecting.
There was discussion and even a patch posted recently about that:

http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php

It seems the thread petered out, but the concept seems sane.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

