Re: [PERFORM] Blocking every 20 sec while mass copying.

2014-07-23 Thread Benjamin Dugast
Finally we solved our problem by using a kind of trick

We have 2 kinds of tables: online tables for reads, and temp tables to mass
insert our data.

We work on the temp tables (5 different tables) and insert all the data
without any indexes, which goes really fast compared to the previous method.
Then we create the indexes on these tables simultaneously,
then we drop the online tables (also 5 tables) and rename the temp tables to
online (takes less than 1 sec).
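
For reference, the swap procedure looks roughly like this in SQL (a sketch only; the table, column, and file names are made up, and the real setup uses five tables):

-- Load into a brand-new table with no indexes (this is the fast part).
CREATE TABLE items_new (LIKE items INCLUDING DEFAULTS);
COPY items_new FROM '/path/to/data.csv' WITH (FORMAT csv);

-- Build the indexes only after the data is in (one per session to run them in parallel).
CREATE INDEX items_new_key_idx ON items_new (key);

-- Swap the new table in atomically.
BEGIN;
DROP TABLE items;
ALTER TABLE items_new RENAME TO items;
ALTER INDEX items_new_key_idx RENAME TO items_key_idx;
COMMIT;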

This is the fastest way we found to insert our data.
On our config it goes pretty fast; we cut our execution time roughly in half, and
the database no longer needs much ongoing maintenance.

Thanks for all the answers you gave us.



2014-07-21 10:02 GMT+02:00 Albe Laurenz :

> Please keep the list on CC: in your responses.
>
> Benjamin Dugast wrote:
> > 2014-07-18 13:11 GMT+02:00 Albe Laurenz :
> >> This sounds a lot like checkpoint I/O spikes.
> >>
> >> Check with the database server log if the freezes coincide with
> checkpoints.
> >>
> >> You can increase checkpoint_segments when you load data to have them
> occur less often.
> >>
> >> If you are on Linux and you have a lot of memory, you might hit spikes
> because too
> >> much dirty data are cached; check /proc/sys/vm/dirty_ratio and
> /proc/sys/vm/dirty_background_ratio.
>
> > checkpoint_segments is already set to 64
> >
> > dirty_ratio was set to its default of 10; I put it down to 5
> > dirty_background_ratio was set to 5 and I changed it to 2
> >
> > There are fewer freezes, but the insert is much slower than before.
>
> That seems to indicate that my suspicion was right.
>
> I would say that your I/O system is saturated.
> Have you checked with "iostat -mNx 1"?
>
> If you really cannot drop the indexes during loading, there's probably not
> much more
> you can do to speed up the load.
> You can try to increase checkpoint_segments beyond 64 and see if that buys
> you anything.
>
> Tuning the file system write cache will not reduce the amount of I/O
> necessary, but it
> should reduce the spikes (which is what I thought was your problem).
>
> Yours,
> Laurenz Albe
>
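
For reference, the kernel write-back settings and the I/O check mentioned in this exchange can be inspected and adjusted roughly like this (a sketch; run as root, and the values are only illustrative):

# current thresholds, as a percentage of RAM
cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio

# lower them so dirty data is flushed earlier and in smaller bursts
sysctl -w vm.dirty_ratio=5
sysctl -w vm.dirty_background_ratio=2

# watch per-device utilization while the load runs
iostat -mNx 1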


[PERFORM] Very slow planning performance on partition table

2014-07-23 Thread Rural Hunter

Hi,

I have a table partitioned into about 60 child tables.  Now I find that
the planning time of a simple query on the partition key is very slow.
# explain analyze select count(*) as cnt from article where pid=88 and 
hash_code='2ca3ff8b17b163f0212c2ba01b80a064';

QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=16.55..16.56 rows=1 width=0) (actual time=0.259..0.259 rows=1 loops=1)
   ->  Append  (cost=0.00..16.55 rows=2 width=0) (actual time=0.248..0.250 rows=1 loops=1)
         ->  Seq Scan on article  (cost=0.00..0.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
               Filter: ((pid = 88) AND (hash_code = '2ca3ff8b17b163f0212c2ba01b80a064'::bpchar))
         ->  Index Scan using article_88_hash_idx on article_88 article  (cost=0.00..16.55 rows=1 width=0) (actual time=0.246..0.248 rows=1 loops=1)
               Index Cond: (hash_code = '2ca3ff8b17b163f0212c2ba01b80a064'::bpchar)
               Filter: (pid = 88)
 Total runtime: 3.816 ms
(8 rows)

Time: 30999.986 ms

You can see from the timing output that the actual run time of the 'explain
analyze' is 30 seconds, while the select SQL itself takes only 3 ms. My
partition key is article.pid and the constraint is as simple as this:
CONSTRAINT article_88_pid_check CHECK (pid = 88). What's wrong, and how
can I improve the planning performance?





Re: [PERFORM] Very slow planning performance on partition table

2014-07-23 Thread Douglas J Hunley
On Wed, Jul 23, 2014 at 6:21 AM, Rural Hunter  wrote:

> What's wrong and how can I improve the planning performance?


What is constraint exclusion set to?


-- 
Douglas J Hunley (doug.hun...@gmail.com)
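
For reference, the setting in question can be checked and changed like this (a sketch; 'partition' has been the default since 8.4 and is normally what you want for partitioned tables):

SHOW constraint_exclusion;
SET constraint_exclusion = partition;   -- or set it in postgresql.conf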


Re: [PERFORM] Building multiple indexes on one table.

2014-07-23 Thread Marc Mamin
>From: pgsql-performance-ow...@postgresql.org 
>[pgsql-performance-ow...@postgresql.org] on behalf of Claudio 
>Freire [klaussfre...@gmail.com]
>Sent: Friday, 18 July 2014 01:21
>To: Chris Ruprecht
>Cc: pgsql-performance@postgresql.org
>Subject: Re: [PERFORM] Building multiple indexes on one table.
>
>On Thu, Jul 17, 2014 at 7:47 PM, Chris Ruprecht  wrote:
>> Is there any way that I can build multiple indexes on one table without 
>> having to scan the table multiple times? For small tables, that's probably 
>> not an issue, but if I have a 500 GB table that I need to create 6 indexes 
>> on, I don't want to read that table 6 times.
>> Nothing I could find in the manual other than reindex, but that's not 
>> helping, since it only rebuilds indexes that are already there and I don't 
>> know if that reads the table once or multiple times. If I could create 
>> indexes inactive and then run reindex, which then reads the table once, I 
>> would have a solution. But that doesn't seem to exist either.
>
>Just build them with separate but concurrent connections, and the
>scans will be synchronized so it will be only one.
>
>Btw, reindex rebuilds one index at a time, so what I do is issue
>separate reindex for each index in parallel, to avoid the repeated
>scans as well.
>
>Just make sure you've got the I/O and CPU capacity for it (you'll be
>writing many indexes at once, so there is a lot of I/O).

Index creation on large tables is mostly CPU bound, as long as no swap occurs.
I/O may be an issue when all your indexes are similar, e.g. all on single int4
columns;
in other cases the writes will not all take place concurrently.
To reduce I/O due to swap, you can consider increasing maintenance_work_mem on
the connections/sessions
that build the indexes.

regards,

Marc Mamin
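
As a rough illustration of the parallel approach discussed above, each CREATE INDEX can be run in its own session with a larger maintenance_work_mem (a sketch; the database, table, and column names and the memory value are made up):

for col in a b c d; do
  psql -d mydb -c "SET maintenance_work_mem = '1GB'; CREATE INDEX bigtable_${col}_idx ON bigtable (${col});" &
done
wait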




Re: [PERFORM] Building multiple indexes on one table.

2014-07-23 Thread Claudio Freire
On Wed, Jul 23, 2014 at 4:40 PM, Marc Mamin  wrote:
>>On Thu, Jul 17, 2014 at 7:47 PM, Chris Ruprecht  wrote:
>>> Is there any way that I can build multiple indexes on one table without 
>>> having to scan the table multiple times? For small tables, that's probably 
>>> not an issue, but if I have a 500 GB table that I need to create 6 indexes 
>>> on, I don't want to read that table 6 times.
>>> Nothing I could find in the manual other than reindex, but that's not 
>>> helping, since it only rebuilds indexes that are already there and I don't 
>>> know if that reads the table once or multiple times. If I could create 
>>> indexes inactive and then run reindex, which then reads the table once, I 
>>> would have a solution. But that doesn't seem to exist either.
>>
>>Just build them with separate but concurrent connections, and the
>>scans will be synchronized so it will be only one.
>>
>>Btw, reindex rebuilds one index at a time, so what I do is issue
>>separate reindex for each index in parallel, to avoid the repeated
>>scans as well.
>>
>>Just make sure you've got the I/O and CPU capacity for it (you'll be
>>writing many indexes at once, so there is a lot of I/O).
>
> Index creation on large tables is mostly CPU bound, as long as no swap occurs.
> I/O may be an issue when all your indexes are similar, e.g. all on single
> int4 columns;
> in other cases the writes will not all take place concurrently.
> To reduce I/O due to swap, you can consider increasing maintenance_work_mem
> on the connections/sessions
> that build the indexes.

Usually there will always be swap, unless you've got toy indexes.

But swap I/O is all sequential I/O; with a good readahead setting
there should be no problem.

It's the final writing step that can be a bottleneck if you have a
lame I/O system and try to push 5 or 6 indexes at once.




Re: [PERFORM] Building multiple indexes on one table.

2014-07-23 Thread Felipe Santos
Your question: Is there any way that I can build multiple indexes on one
table without having to scan the table multiple times?

My answer: I don't think so. Since each index has a different indexing
rule, it will analyze the same table in a different way. I recently built indexes
on a 100 GB table and it didn't take too much time (Amazon EC2
with 8 CPU cores / 70 GB RAM). I don't remember how long it took, but
that's a good sign, right  ;-)  ? Painful jobs are always remembered... (OK,
the hardware helped a lot).

So my advice is: get yourself a good maintenance window and just build the
indexes; remember that they will help a lot of people querying this table.


2014-07-23 16:49 GMT-03:00 Claudio Freire :

> On Wed, Jul 23, 2014 at 4:40 PM, Marc Mamin  wrote:
> >>On Thu, Jul 17, 2014 at 7:47 PM, Chris Ruprecht 
> wrote:
> >>> Is there any way that I can build multiple indexes on one table
> without having to scan the table multiple times? For small tables, that's
> probably not an issue, but if I have a 500 GB table that I need to create 6
> indexes on, I don't want to read that table 6 times.
> >>> Nothing I could find in the manual other than reindex, but that's not
> helping, since it only rebuilds indexes that are already there and I don't
> know if that reads the table once or multiple times. If I could create
> indexes inactive and then run reindex, which then reads the table once, I
> would have a solution. But that doesn't seem to exist either.
> >>
> >>Just build them with separate but concurrent connections, and the
> >>scans will be synchronized so it will be only one.
> >>
> >>Btw, reindex rebuilds one index at a time, so what I do is issue
> >>separate reindex for each index in parallel, to avoid the repeated
> >>scans as well.
> >>
> >>Just make sure you've got the I/O and CPU capacity for it (you'll be
> >>writing many indexes at once, so there is a lot of I/O).
> >
> > Index creation on large tables is mostly CPU bound, as long as no swap
> > occurs.
> > I/O may be an issue when all your indexes are similar, e.g. all on
> > single int4 columns;
> > in other cases the writes will not all take place concurrently.
> > To reduce I/O due to swap, you can consider increasing
> > maintenance_work_mem on the connections/sessions
> > that build the indexes.
>
> Usually there will always be swap, unless you've got toy indexes.
>
> But swap I/O is all sequential I/O, with a good readahead setting
> there should be no problem.
>
> It's the final writing step that can be a bottleneck if you have a
> lame I/O system and try to push 5 or 6 indexes at once.
>
>
>


[PERFORM] High rate of transaction failure with the Serializable Isolation Level

2014-07-23 Thread Reza Taheri
Hello PGSQL performance community,
[By way of introduction, we are a TPC subcommittee that is developing a 
benchmark with cloud-like characteristics for virtualized databases. The 
end-to-end benchmarking kit will be publicly available, and will run on PGSQL]

I am running into very high failure rates when I run with the Serializable 
Isolation Level. I have simplified our configuration to a single database with 
a constant workload, a TPC-E workload if you will, to focus on this 
problem. We are running with PGSQL 9.2.4, ODBC 2.2.14 (as well as 2.3.3pre, 
which didn't help), RHEL 6.4, and a 6-way VM with 96GB of memory on a 4-socket 
Westmere server.

With our 9 transactions running with a mix of SQL_TXN_READ_COMMITTED and 
SQL_TXN_REPEATABLE_READ, we get less than 1% deadlocks, all of which occur 
because each row in one table, BROKER, may be read or written by multiple 
transactions at the same time. So, there are legitimate conflicts, which we 
deal with using an exponential backoff algorithm that sleeps for 
10ms/30ms/90ms/etc.

When we raise the Trade-Result transaction to SQL_TXN_SERIALIZABLE, we face a 
storm of conflicts. Out of 37,342 Trade-Result transactions, 15,707 hit an 
error, and have to be rolled back and retried one or more times. The total 
failure count (due to many transactions failing more than once) is 31,388.

What is unusual is that the majority of the failures occur in a statement that 
should not have any isolation conflicts. About 17K of failures are from the 
statement below:
2014-07-23 11:27:15 PDT 26085 ERROR:  could not serialize access due to 
read/write dependencies among transactions
2014-07-23 11:27:15 PDT 26085 DETAIL:  Reason code: Canceled on identification 
as a pivot, during write.
2014-07-23 11:27:15 PDT 26085 HINT:  The transaction might succeed if retried.
2014-07-23 11:27:15 PDT 26085 CONTEXT:  SQL statement "update   TRADE
set T_COMM = comm_amount,
T_DTS = trade_dts,
T_ST_ID = st_completed_id,
T_TRADE_PRICE = trade_price
where   T_ID = trade_id"
PL/pgSQL function traderesultframe5(ident_t,value_t,character,timestamp 
without time zone,trade_t,s_price_t) line 15 at SQL statement

This doesn't make sense since at any given time, only one transaction might 
possibly be accessing the row that is being updated. There should be no 
conflicts if we have row-level locking/isolation.

The second most common conflict happens 7.6K times in the statement below:
2014-07-23 11:27:23 PDT 26039 ERROR:  could not serialize access due to 
read/write dependencies among transactions
2014-07-23 11:27:23 PDT 26039 DETAIL:  Reason code: Canceled on identification 
as a pivot, during conflict in checking.
2014-07-23 11:27:23 PDT 26039 HINT:  The transaction might succeed if retried.
2014-07-23 11:27:23 PDT 26039 CONTEXT:  SQL statement "insert
into    SETTLEMENT (SE_T_ID,
SE_CASH_TYPE,
SE_CASH_DUE_DATE,
SE_AMT)
values (trade_id,
cash_type,
due_date,
se_amount
)"
PL/pgSQL function traderesultframe6(ident_t,timestamp without time 
zone,character varying,value_t,timestamp without time 
zone,trade_t,smallint,s_qty_t,character) line 23 at SQL statement

I don't understand why an insert would hit a serialization conflict.

We also have 4.5K conflicts when we try to commit:
2014-07-23 11:27:23 PDT 26037 ERROR:  could not serialize access due to 
read/write dependencies among transactions
2014-07-23 11:27:23 PDT 26037 DETAIL:  Reason code: Canceled on identification 
as a pivot, during commit attempt.
2014-07-23 11:27:23 PDT 26037 HINT:  The transaction might succeed if retried.
2014-07-23 11:27:23 PDT 26037 STATEMENT:  COMMIT


Does PGSQL raise locks to page level when we run with SQL_TXN_SERIALIZABLE? Are 
there any knobs I can play with to alleviate this?  FWIW, the same transactions 
on MS SQL Server see almost no conflicts.

Thanks,
Reza


Re: [PERFORM] Very slow planning performance on partition table

2014-07-23 Thread Rural Hunter

  
  
It's the default value (partition):
# grep exclusion postgresql.conf
#constraint_exclusion = partition       # on, off, or partition

btw, I'm on postgresql 9.2.4

On 2014/7/24 1:35, Douglas J Hunley wrote:
> On Wed, Jul 23, 2014 at 6:21 AM, Rural Hunter wrote:
>> What's wrong and how can I improve the planning performance?
>
> What is constraint exclusion set to?
>
> --
> Douglas J Hunley (doug.hun...@gmail.com)


Re: [PERFORM] High rate of transaction failure with the Serializable Isolation Level

2014-07-23 Thread Craig Ringer
On 07/24/2014 09:18 AM, Reza Taheri wrote:
> What is unusual is that the majority of the failures occur in a
> statement that should not have any isolation conflicts. About 17K of
> failures are from the statement below:

It's not just that statement that is relevant.

At SERIALIZABLE isolation the entire transaction's actions must be
considered, as must the conflicting transaction.

> This doesn’t make sense since at any given time, only one transaction
> might possibly be accessing the row that is being updated. There should
> be no conflicts if we have row-level locking/isolation.

Is that statement run standalone, or as part of a larger transaction?

> The second most common conflict happens 7.6K times in the statement below:
...
> I don’t understand why an insert would hit a serialization conflict

If the INSERTing transaction previously queried for a key that was
created by a concurrent transaction, this can occur, because there is no
serial execution order of the two transactions that could produce the
same result.

This doesn't produce exactly the same error, but demonstrates one such case:


regress=> CREATE TABLE demo (id integer primary key, value integer);
CREATE TABLE
regress=> INSERT INTO demo(id, value) VALUES (1, 42);
INSERT 0 1

then

regress=> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN
regress=> SELECT id FROM demo WHERE id = 2;
 id
----
(0 rows)


session1=> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN
session2=> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN

session1=> SELECT id FROM demo WHERE id = 2;
 id
----
(0 rows)

session2=> SELECT id FROM demo WHERE id = 3;
 id
----
(0 rows)


session1=> INSERT INTO demo VALUES (3, 43);
INSERT 0 1

session2=> INSERT INTO demo VALUES (2, 43);
INSERT 0 1

session2=> COMMIT;
COMMIT

session1=> COMMIT;
ERROR:  could not serialize access due to read/write dependencies among
transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during
commit attempt.
HINT:  The transaction might succeed if retried.

> Does PGSQL raise locks to page level when we run with
> SQL_TXN_SERIALIZABLE?

From the documentation
(http://www.postgresql.org/docs/current/static/transaction-iso.html):

> Predicate locks in PostgreSQL, like in most other database systems, are based 
> on data actually accessed by a transaction. These will show up in the 
> pg_locks system view with a mode of SIReadLock. The particular locks acquired 
> during execution of a query will depend on the plan used by the query, and 
> multiple finer-grained locks (e.g., tuple locks) may be combined into fewer 
> coarser-grained locks (e.g., page locks) during the course of the transaction 
> to prevent exhaustion of the memory used to track the locks.

... so yes, it may raise locks to page level. That doesn't mean that's
necessarily what's happening here.
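
One way to check whether lock promotion is happening is to look at pg_locks while the workload runs; the locktype column shows whether the SIRead locks are tuple-, page-, or relation-level (a sketch):

SELECT locktype, relation::regclass, page, tuple, count(*)
FROM pg_locks
WHERE mode = 'SIReadLock'
GROUP BY 1, 2, 3, 4
ORDER BY count(*) DESC;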

> Are there any knobs I can play with to alleviate
> this? 

A lower FILLFACTOR can spread data out at the cost of wasted space.
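
For example (a sketch; the index name is an assumption, the value is only illustrative, and existing rows are only repacked by a rewrite such as VACUUM FULL or CLUSTER):

ALTER TABLE trade SET (fillfactor = 70);
ALTER INDEX trade_pkey SET (fillfactor = 70);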

> FWIW, the same transactions on MS SQL Server see almost no conflicts.

Many DBMSs don't detect all serialization anomalies. PostgreSQL doesn't
detect all possible anomalies but it detects many that other systems may
not.

To see what's going on and why MS SQL Server (version?) doesn't
complain, it'd be best to boil each case down to a minimal reproducible
test case that can be analyzed in isolation.

PostgreSQL's isolationtester tool, in src/test/isolation, can be handy
for automating this kind of conflict, and provides some useful examples
of cases that are detected.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [PERFORM] High rate of transaction failure with the Serializable Isolation Level

2014-07-23 Thread Craig Ringer
On 07/24/2014 09:18 AM, Reza Taheri wrote:
> Does PGSQL raise locks to page level when we run with
> SQL_TXN_SERIALIZABLE? Are there any knobs I can play with to alleviate
> this?  FWIW, the same transactions on MS SQL Server see almost no conflicts.
> 

Also, in the documentation
(http://www.postgresql.org/docs/current/static/transaction-iso.html):

> When the system is forced to combine multiple page-level predicate locks into 
> a single relation-level predicate lock because the predicate lock table is 
> short of memory, an increase in the rate of serialization failures may occur. 
> You can avoid this by increasing max_pred_locks_per_transaction.

... so I suggest experimenting with higher
max_pred_locks_per_transaction values.

http://www.postgresql.org/docs/9.1/static/runtime-config-locks.html#GUC-MAX-PRED-LOCKS-PER-TRANSACTION

... though that should only really affect object level locks (tables,
etc) according to the docs. I'd need to dig further to determine how to
reduce or eliminate lock combining of row-level to page-level and
page-level to object-level locks.
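
As a starting point, the setting lives in postgresql.conf and only takes effect at server start, since it sizes shared memory (the value below is just an illustration):

# postgresql.conf
max_pred_locks_per_transaction = 256   # default is 64; requires a restart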

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

