Re: [PERFORM] Performance loss upgrading from 9.3 to 9.6

2017-11-06 Thread Justin Pryzby
On 11/6/17, 9:21 AM, "Justin Pryzby"  wrote:
> see if statistics improve:
> SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, 
> tablename, attname, n_distinct, array_length(most_common_vals,1) n_mcv
> FROM pg_stats WHERE tablename~'customers_customer' AND attname='id' GROUP 
> BY 1,2,3,4,5 ORDER BY 1

On Mon, Nov 06, 2017 at 09:12:01PM +, Adam Torres wrote:
> I changed the statistics on av.customer_id as suggested and the number
> returned by pg_stats went from 202,333 to 904,097.

Do you mean n_distinct ?  It'd be useful to see that query on pg_stats.  Also I
don't think we've seen \d output for the tables (or at least the joined
columns), or the full query.

> There are 11.2 million distinct customer_ids on the 14.8 million vehicle
> records.

If there are so many distinct ids, updating stats won't help the rowcount
estimate (and could even hurt) - it can only store a limited number of most-common values.

Are there as many distinct values for cc.id ?
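
For what it's worth, a quick (if brute-force) way to check, using the aliases
from the query (substitute the real table names):

    -- rough check of how many distinct values sit on each side of the join
    SELECT count(*) AS n_rows, count(DISTINCT customer_id) AS n_distinct FROM av;
    SELECT count(*) AS n_rows, count(DISTINCT id) AS n_distinct FROM cc;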

I would try to reproduce the rowcount problem with a minimal query:
explain analyze SELECT FROM av JOIN cc ON av.customer_id=cc.id; -- WHERE cc.id<99;
Maybe the rows estimate is okay for some values and not for others, so maybe
you need to try various WHERE clauses (joining additional tables if need be... but
without reimplementing the whole query).

I just noticed there are two conditions on dealer_id, one from table av and one
from table cc_1.  It seems likely those are correlated/non-independent
conditions, but postgres probably doesn't know that (unless you used PG96 FK
logic, or PG10 multi-variable stats).

As a test, you could try dropping one of those conditions, or maybe a hacky
change like ROW(av.dealer_id, cc_1.dealer_id)=ROW('EC79', 'EC79'),
which postgres estimates as no more selective than a single equality test.  BTW
this is all from src/backend/utils/adt/selfuncs.c.
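
For reference, a minimal sketch of the PG10 multi-variable statistics syntax.
Extended statistics only cover columns of a single table, and the column
pairing below is hypothetical, so this is illustration rather than a fix for
the cross-table case above:

    -- PG10+: teach the planner that two columns of one table are correlated
    CREATE STATISTICS av_dealer_customer_stats (dependencies)
        ON dealer_id, customer_id FROM av;
    ANALYZE av;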

Justin




Re: [PERFORM] Performance loss upgrading from 9.3 to 9.6

2017-11-06 Thread Adam Brusselback
> It has now been decided to try upgrading to 9.4 as that is the minimum to 
> support Django 1.11 (which we are trying to upgrade a backend service to).  
> The hope is whatever feature we have not configured properly in 9.6 is not 
> there in 9.4.
It's entirely possible that whatever is causing your performance issue is
caused by the migration, rather than anything inherently different in
9.6.  The best test for that is setting up another 9.3 server,
restoring a backup, and testing there.  If that is very different from
what you are getting on 9.6, then it's something which changed in
Postgres; if not, it's just bad stats.

I do think it's probably better to fix your query than to settle for
upgrading to 9.4 rather than 9.6.  You have a crazy amount
of your query time spent in a single node.  That plan is not good.  If
that's the only query giving you trouble, work on optimizing it.

Just my $0.02

-Adam




Re: [PERFORM] Performance loss upgrading from 9.3 to 9.6

2017-11-06 Thread Adam Torres
Justin,
Thanks for the reply.

I changed the statistics on av.customer_id as suggested and the number returned 
by pg_stats went from 202,333 to 904,097.  There are 11.2 million distinct 
customer_ids on the 14.8 million vehicle records.  Rerunning the query showed 
no significant change in time (624 seconds vs. 639 seconds) - plan is at 
https://explain.depesz.com/s/e2fo.

I went through the query looking for fields used in joins and conditions and 
applied the same steps to 7 other fields over 4 of the tables.  Most n_distinct 
values did not change much but two did change from 1.# million to -1.

On 11/6/17, 9:21 AM, "Justin Pryzby"  wrote:

On Mon, Nov 06, 2017 at 01:18:00PM +, Adam Torres wrote:
> Good morning all,
> 
> We have a problem with performance after upgrading from 9.3 to 9.6 where 
certain queries take 9 times longer to run.  On our initial attempt to upgrade, 
we noticed the system as a whole was taking longer to run through normal daily 
processes.  The query with the largest run time was picked to act as a 
measuring stick.

> https://explain.depesz.com/s/z71u
> Planning time: 8.218 ms
> Execution time: 639319.525 ms
> 
> Same query as run on 9.3
> https://explain.depesz.com/s/gjN3
> Total runtime: 272897.150 ms

Actually it looks to me like both query plans are poor..

..because of this:
| Hash Join (cost=85,086.25..170,080.80 rows=40 width=115) (actual 
time=32.673..84.427 rows=13,390 loops=1)
|Hash Cond: (av.customer_id = cc_1.id)

If there are a large number of distinct customer_ids (maybe with nearly 
equal
frequencies), it might help to
ALTER TABLE av ALTER customer_id SET STATISTICS 400
..same for cc_1.id.  And re-analyze those tables (are they large??).

see if statistics improve:
SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, 
tablename, attname, n_distinct, array_length(most_common_vals,1) n_mcv
FROM pg_stats WHERE tablename~'customers_customer' AND attname='id' GROUP 
BY 1,2,3,4,5 ORDER BY 1

Goal is to get at least an accurate value for n_distinct (but preferably 
also
storing the most frequent IDs).  I wouldn't bother re-running the query 
unless
you find that increasing stats target causes the plan to change.

Justin





Re: [PERFORM] Performance loss upgrading from 9.3 to 9.6

2017-11-06 Thread Justin Pryzby
On Mon, Nov 06, 2017 at 01:18:00PM +, Adam Torres wrote:
> Good morning all,
> 
> We have a problem with performance after upgrading from 9.3 to 9.6 where 
> certain queries take 9 times longer to run.  On our initial attempt to 
> upgrade, we noticed the system as a whole was taking longer to run through 
> normal daily processes.  The query with the largest run time was picked to 
> act as a measuring stick.

> https://explain.depesz.com/s/z71u
> Planning time: 8.218 ms
> Execution time: 639319.525 ms
> 
> Same query as run on 9.3
> https://explain.depesz.com/s/gjN3
> Total runtime: 272897.150 ms

Actually it looks to me like both query plans are poor..

..because of this:
| Hash Join (cost=85,086.25..170,080.80 rows=40 width=115) (actual 
time=32.673..84.427 rows=13,390 loops=1)
|Hash Cond: (av.customer_id = cc_1.id)

If there are a large number of distinct customer_ids (maybe with nearly equal
frequencies), it might help to
ALTER TABLE av ALTER customer_id SET STATISTICS 400
..same for cc_1.id.  And re-analyze those tables (are they large??).

see if statistics improve:
SELECT (SELECT sum(x) FROM unnest(most_common_freqs) x) frac_MCV, tablename, 
attname, n_distinct, array_length(most_common_vals,1) n_mcv
FROM pg_stats WHERE tablename~'customers_customer' AND attname='id' GROUP BY 
1,2,3,4,5 ORDER BY 1

Goal is to get at least an accurate value for n_distinct (but preferably also
storing the most frequent IDs).  I wouldn't bother re-running the query unless
you find that increasing stats target causes the plan to change.

Justin




[PERFORM] Performance loss upgrading from 9.3 to 9.6

2017-11-06 Thread Adam Torres
Good morning all,

We have a problem with performance after upgrading from 9.3 to 9.6 where 
certain queries take 9 times longer to run.  On our initial attempt to upgrade, 
we noticed the system as a whole was taking longer to run through normal daily 
processes.  The query with the largest run time was picked to act as a 
measuring stick.

We are using the staging server to test the upgrade.  It has two 1.3TB 
partitions, each of which holds a copy of the near 1TB database.  The active 
staging 9.3 database is on one of the partitions while a copy was made onto the 
other.  A second instance of 9.3 was set up to verify the copy was successful 
and some performance tests were done, then it was upgraded to 9.6 via pg_upgrade.  The 
same performance tests were done, and this is where the 9 times slower number 
comes from.

OS Ubuntu 14.04.4
PG9.3 is 9.3.19-1.pgdg14.04+1 from http://apt.postgresql.org/pub/repos/apt/
PG9.6 is 9.6.5-1.pgdg14.04+2 from same.
The server has 24 cores and 64GB RAM.  Data drives are spinning platter in 
raid10 - not ssd.

upgrade steps:
* Ran rsync (excluding the xlog folder) from the active 9.3 partition to the 
unused partition while 9.3 was still running.
* Once initial rsync completed, shut down 9.3 and reran the rsync command 
without the exclude.
* Once second rsync completed, restarted 9.3 and left it alone.
* Copied the active 9.3 configuration files into a new /etc/postgresql/9.3/ 
folder so we could test before/after numbers.  Changed the config to point to 
the appropriate data/log/etc folders/files/port.
* Started second 9.3 instance.
* Altered the few foreign servers to account for the second instance's port.
* Ran some arbitrary queries to check performance.
* Installed 9.6 via apt
* Created a 9.6 instance with its data directory on the same partition as the 
newly cloned 9.3 instance.
* Ran pg_upgrade with --link option (includes running --check and 
compiling/installing postgis)
* Copied the config from 9.3 and made minimal changes.  Renamed log files, 
changed folders, and removed checkpoint_segments - that's about it.
* Started the 9.6 instance and was able to do arbitrary queries.
* Ran the upgrade-suggested vacuumdb command on all databases to generate 
statistics (a rough sketch of these commands is shown below).
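
The sketch referenced above; the paths, bindirs, and port are placeholders 
rather than the values actually used:

    # initial sync while 9.3 is still running, then a final sync with it stopped
    rsync -a --exclude 'pg_xlog' /srv/pg93_active/ /srv/pg93_copy/
    rsync -a /srv/pg93_active/ /srv/pg93_copy/

    # check, then upgrade the copied cluster in place using hard links
    /usr/lib/postgresql/9.6/bin/pg_upgrade \
        -b /usr/lib/postgresql/9.3/bin -B /usr/lib/postgresql/9.6/bin \
        -d /srv/pg93_copy -D /srv/pg96_data --check
    /usr/lib/postgresql/9.6/bin/pg_upgrade \
        -b /usr/lib/postgresql/9.3/bin -B /usr/lib/postgresql/9.6/bin \
        -d /srv/pg93_copy -D /srv/pg96_data --link

    # regenerate planner statistics, as pg_upgrade suggests
    vacuumdb --all --analyze-in-stages -p 5433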

At that point, the database should be ready to use.  Running the same set of 
arbitrary queries that were run on 9.3 yielded much worse performance, though.
The data is broken out by dealers containing customers owning vehicles.  The 
arbitrary queries pull data from 8 tables (really 6 large [millions of records] 
and 2 smaller [hundreds] tables) to populate a new table via "select [fields] 
into [new table]".  Twenty different dealers were used for testing meaning 
twenty of these queries.  The system which runs these queries has 12 workers 
meaning up to 12 of these queries can be running concurrently.  While the 
workers were offline, all twenty were queued up and then the workers activated. 
For each test, the order of the dealers was the same.  That order was a mix of 
small and large dealers - not exactly alternating high, low, high; more like a few large, 
then a few small.  The run time for 9.3 was 21m9s and 9.6 was 3h18m25s.

Each test was done while the other database was idle - htop showed little to no 
activity before each test started.
perf reports (converted to flamegraphs via 
https://github.com/brendangregg/FlameGraph) for the 9.6 test show that about 1/3 
of the processor usage looks similar to the graph for 9.3.  The other 2/3 is 
still within the postgres process but starts with '[unknown]' and has 
'connect', 'socket', and 'close' as the next calls in the chain.  I have not 
been able to figure out what postgres is doing to make these calls.
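
For anyone reproducing this, the flame graphs were generated roughly like this 
(standard FlameGraph usage; sampling rate, duration, and file names are arbitrary):

    # sample all CPUs with call stacks for 60 seconds, then render a flame graph
    perf record -F 99 -a -g -- sleep 60
    perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > postgres-9.6.svg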

Changing the configuration based on pgtune (command line version 0.9.3-2) did 
not make much difference.  The online pgtune at http://pgtune.leopard.in.ua/ had 
just a couple of differences in settings that I have yet to test.

The main question is what the connect/socket/close calls in the perf output are and 
how to make them go away, as they appear to be what is using up the added time.  
I'm hoping there is just a setting I've missed.

Query plan for a small dealer on 9.6 run without anything else running on the 
server
https://explain.depesz.com/s/z71u
Planning time: 8.218 ms
Execution time: 639319.525 ms

Same query as run on 9.3
https://explain.depesz.com/s/gjN3
Total runtime: 272897.150 ms


--
Thanks,
Adam Torres




Re: [PERFORM] performance drop after upgrade (9.6 > 10)

2017-10-26 Thread Justin Pryzby
On Tue, Oct 24, 2017 at 04:15:59PM +0200, johannes graën wrote:
> Hi Pavel, *,
> 
> you were right with ANALYZing the DB first. However, even after doing
> so, I frequently see Seq Scans where an index was used before. This
> usually cooccurs with parallelization and looked different before
> upgrading to 10. I can provide an example for 10 [1], but I cannot
> generate a query plan for 9.6 anymore.
> 
> Any ideas what makes the new version more seqscanny?

Is it because max_parallel_workers_per_gather now defaults to 2 ?
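
A quick way to test that theory (session-level only, just to compare plans):

    SET max_parallel_workers_per_gather = 0;  -- back to the 9.6 default
    -- then re-run EXPLAIN (ANALYZE, BUFFERS) on the slow query and compare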

BTW, I would tentatively expect a change in default to be documented in the
release notes but I can't see that it is.
77cd477c4ba885cfa1ba67beaa82e06f2e182b85

Justin




Re: [PERFORM] performance drop after upgrade (9.6 > 10)

2017-10-24 Thread Alvaro Herrera
johannes graën wrote:
> Hi Pavel, *,
> 
> you were right with ANALYZing the DB first. However, even after doing
> so, I frequently see Seq Scans where an index was used before. This
> usually cooccurs with parallelization and looked different before
> upgrading to 10. I can provide an example for 10 [1], but I cannot
> generate a query plan for 9.6 anymore.

https://wiki.postgresql.org/wiki/Slow_Query_Questions

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [PERFORM] performance drop after upgrade (9.6 > 10)

2017-10-24 Thread Johannes Graën
On 2017-10-24 17:18, Justin Pryzby wrote:
> You could (re)install PG96 alongside PG10 and run a copy of the DB (even from
> your homedir, or on a different server) and pg_dump |pg_restore the relevant
> tables (just be sure to specify the alternate host/port/user/etc as needed for
> the restore invocation).

I considered that but it is far too expensive just for getting the old
query plan. The database is more than 1 TB big and replaying it from a
dump to another server took us several days, primarily due to the heavy
use of materialized views that are calculated over all rows of some
large tables. As long as there is no safe pg_downgrade --link I'd rather
keep trying to improve query performance on the current version.




Re: [PERFORM] performance drop after upgrade (9.6 > 10)

2017-10-24 Thread Justin Pryzby
On Tue, Oct 24, 2017 at 04:15:59PM +0200, johannes graën wrote:
> upgrading to 10. I can provide an example for 10 [1], but I cannot
> generate a query plan for 9.6 anymore.

You could (re)install PG96 alongside PG10 and run a copy of the DB (even from
your homedir, or on a different server) and pg_dump |pg_restore the relevant
tables (just be sure to specify the alternate host/port/user/etc as needed for
the restore invocation).

Justin




Re: [PERFORM] performance drop after upgrade (9.6 > 10)

2017-10-24 Thread johannes graën
Hi Pavel, *,

you were right with ANALYZing the DB first. However, even after doing
so, I frequently see Seq Scans where an index was used before. This
usually cooccurs with parallelization and looked different before
upgrading to 10. I can provide an example for 10 [1], but I cannot
generate a query plan for 9.6 anymore.

Any ideas what makes the new version more seqscanny?


[1] https://explain.depesz.com/s/gXD3




Re: [PERFORM] performance drop after upgrade (9.6 > 10)

2017-10-11 Thread johannes graën
On Wed, Oct 11, 2017 at 1:11 PM, Pavel Stehule  wrote:
> have you fresh statistics? After upgrade is necessary to run ANALYZE command

Yes, that was missing indeed. I did ANALYZE but apparently on all
databases but this one. I could have guessed that
1,098,956,679,131,935,754,413,282,631,696,252,928 is not a reasonable
cost value.
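
For the record, one way to cover every database in the cluster in one go 
(a sketch, assuming the standard client tools; not what was actually run):

    vacuumdb --all --analyze-only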

Thanks, Pavel.

Best
  Johannes




Re: [PERFORM] performance drop after upgrade (9.6 > 10)

2017-10-11 Thread Pavel Stehule
2017-10-11 13:06 GMT+02:00 johannes graën :

> Hi,
>
> I wrote a query that joins several tables usually returning less than
> 1000 rows, groups them and generates a JSON object of the result. In
> 9.6 is was a question of milliseconds for that query to return the
> requested data. Now, after upgrading to 10, the query never returns -
> at least it hasn't returned in the last hour.
>
> To see what happens, I requested the query plan [1]. It looks complex
> and shows a lot of parallelization. I don't have the query plan from
> 9.6, but I remember it being considerably simpler.
>
> Can anyone have a guess what altered the performance here so
> dramatically? Is there a way to disable new parallelization features
> just for this query to see if it makes any difference?
>
>

Do you have fresh statistics? After an upgrade it is necessary to run the ANALYZE command

Regards

Pavel


Best
>   Johannes
>
>
> [1] https://explain.depesz.com/s/xsPP
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


[PERFORM] performance drop after upgrade (9.6 > 10)

2017-10-11 Thread johannes graën
Hi,

I wrote a query that joins several tables usually returning less than
1000 rows, groups them and generates a JSON object of the result. In
9.6 it was a question of milliseconds for that query to return the
requested data. Now, after upgrading to 10, the query never returns -
at least it hasn't returned in the last hour.

To see what happens, I requested the query plan [1]. It looks complex
and shows a lot of parallelization. I don't have the query plan from
9.6, but I remember it being considerably simpler.

Does anyone have a guess as to what altered the performance here so
dramatically? Is there a way to disable new parallelization features
just for this query to see if it makes any difference?

Best
  Johannes


[1] https://explain.depesz.com/s/xsPP




Re: [PERFORM] performance problem on big tables

2017-08-28 Thread Mariel Cherkassky
I have the newest version :
select oracle_diag();
 oracle_diag
-
 oracle_fdw 1.5.0, PostgreSQL 9.6.3, Oracle client 11.2.0.4.0,
ORACLE_HOME=/PostgreSQL/9.6/tools/instantclient_11_2/
(1 row)


Is there a prefetch also for local tables ? I mean, if I run with a cursor
over the results of a select query, maybe setting the prefetch for a local table
might also improve performance ?
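
For what it's worth, local tables have no prefetch option as such; the nearest
equivalent when reading through a cursor is the fetch batch size. A minimal
sketch, with a placeholder table name:

    -- in psql, fetch results in batches of 10000 through a cursor
    \set FETCH_COUNT 10000
    -- or explicitly:
    BEGIN;
    DECLARE c CURSOR FOR SELECT * FROM some_local_table;  -- placeholder name
    FETCH 10000 FROM c;
    CLOSE c;
    COMMIT;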

2017-08-28 8:51 GMT+03:00 Claudio Freire :

> On Sun, Aug 27, 2017 at 1:34 PM, Mariel Cherkassky
>  wrote:
> > Hi, yes indeed I'm using laurenz`s oracle_fdw extension. I tried to run
> it
> > but I'm getting error
> >
> > dbch=# ALTER FOREIGN TABLE tc_sub_rate_ver_prod OPTIONS ( SET prefetch
> 10240
> > );
> > ERROR:  syntax error at or near "10240"
> > LINE 1: ...N TABLE tc_sub_rate_ver_prod OPTIONS ( SET prefetch 10240 );
> >
> >
> > dbch=#  alter foreign table tc_sub_rate_ver_prod OPTIONS (SET prefetch
> > '10240');
> > ERROR:  option "prefetch" not found
>
> Oh, sorry, I hadn't seen this until I hit send.
>
> Unless the documentation is inaccurate or you're using a really old
> version (from the changelog that option is from 2016), that should
> work.
>
> I don't have enough experience with oracle_fdw to help there, most of
> my dealings have been with postgres_fdw.
>


Re: [PERFORM] performance problem on big tables

2017-08-27 Thread Claudio Freire
On Sun, Aug 27, 2017 at 1:34 PM, Mariel Cherkassky
 wrote:
> Hi, yes indeed I'm using laurenz`s oracle_fdw extension. I tried to run it
> but I'm getting error
>
> dbch=# ALTER FOREIGN TABLE tc_sub_rate_ver_prod OPTIONS ( SET prefetch 10240
> );
> ERROR:  syntax error at or near "10240"
> LINE 1: ...N TABLE tc_sub_rate_ver_prod OPTIONS ( SET prefetch 10240 );
>
>
> dbch=#  alter foreign table tc_sub_rate_ver_prod OPTIONS (SET prefetch
> '10240');
> ERROR:  option "prefetch" not found

Oh, sorry, I hadn't seen this until I hit send.

Unless the documentation is inaccurate or you're using a really old
version (from the changelog that option is from 2016), that should
work.

I don't have enough experience with oracle_fdw to help there, most of
my dealings have been with postgres_fdw.




Re: [PERFORM] performance problem on big tables

2017-08-27 Thread Claudio Freire
On Sun, Aug 27, 2017 at 1:34 PM, Mariel Cherkassky
 wrote:
> Hi, yes indeed I'm using laurenz`s oracle_fdw extension. I tried to run it
> but I'm getting error
>
> dbch=# ALTER FOREIGN TABLE tc_sub_rate_ver_prod OPTIONS ( SET prefetch 10240
> );
> ERROR:  syntax error at or near "10240"
> LINE 1: ...N TABLE tc_sub_rate_ver_prod OPTIONS ( SET prefetch 10240 );

Yeah, might need to put the 10240 in quotes.
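
For what it's worth, the OPTIONS clause also distinguishes ADD from SET: SET only
works for an option that is already defined on the table, which would explain the
'option "prefetch" not found' error reported elsewhere in the thread for the quoted
form. A guess at the working first-time invocation:

    ALTER FOREIGN TABLE tc_sub_rate_ver_prod OPTIONS (ADD prefetch '10240');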




Re: [PERFORM] performance problem on big tables

2017-08-27 Thread Mariel Cherkassky
Hi, yes indeed I'm using Laurenz's oracle_fdw extension. I tried to run it
but I'm getting an error

dbch=# ALTER FOREIGN TABLE tc_sub_rate_ver_prod OPTIONS ( SET prefetch
10240 );
ERROR:  syntax error at or near "10240"
LINE 1: ...N TABLE tc_sub_rate_ver_prod OPTIONS ( SET prefetch 10240 );


dbch=#  alter foreign table tc_sub_rate_ver_prod OPTIONS (SET prefetch
'10240');
ERROR:  option "prefetch" not found




2017-08-24 19:14 GMT+03:00 Claudio Freire :

> On Thu, Aug 24, 2017 at 4:51 AM, Mariel Cherkassky
>  wrote:
> > Hi Claudio, how can I do that ? Can you explain me what is this option ?
> >
> > 2017-08-24 2:15 GMT+03:00 Claudio Freire :
> >>
> >> On Mon, Aug 21, 2017 at 5:00 AM, Mariel Cherkassky
> >>  wrote:
> >> > To summarize, I still have performance problems. My current situation
> :
> >> >
> >> > I'm trying to copy the data of many tables in the oracle database into
> >> > my
> >> > postgresql tables. I'm doing so by running insert into
> >> > local_postgresql_temp
> >> > select * from remote_oracle_table. The performance of this operation
> are
> >> > very slow and I tried to check the reason for that and mybe choose a
> >> > different alternative.
> >> >
> >> > 1)First method - Insert into local_postgresql_table select * from
> >> > remote_oracle_table this generated total disk write of 7 M/s and
> actual
> >> > disk
> >> > write of 4 M/s(iotop). For 32G table it took me 2 hours and 30
> minutes.
> >> >
> >> > 2)second method - copy (select * from oracle_remote_table) to
> /tmp/dump
> >> > generates total disk write of 4 M/s and actuval disk write of 100 K/s.
> >> > The
> >> > copy utility suppose to be very fast but it seems very slow.
> >>
> >> Have you tried increasing the prefetch option in the remote table?
> >>
> >> If you left it in its default, latency could be hurting your ability
> >> to saturate the network.
> >
> >
>
> Please don't top-post.
>
> I'm assuming you're using this: http://laurenz.github.io/oracle_fdw/
>
> If you check the docs, you'll see this:
> https://github.com/laurenz/oracle_fdw#foreign-table-options
>
> So I'm guessing you could:
>
> ALTER FOREIGN TABLE remote_table OPTIONS ( SET prefetch 10240 );
>


Re: [PERFORM] performance problem on big tables

2017-08-24 Thread Claudio Freire
On Thu, Aug 24, 2017 at 4:51 AM, Mariel Cherkassky
 wrote:
> Hi Claudio, how can I do that ? Can you explain me what is this option ?
>
> 2017-08-24 2:15 GMT+03:00 Claudio Freire :
>>
>> On Mon, Aug 21, 2017 at 5:00 AM, Mariel Cherkassky
>>  wrote:
>> > To summarize, I still have performance problems. My current situation :
>> >
>> > I'm trying to copy the data of many tables in the oracle database into
>> > my
>> > postgresql tables. I'm doing so by running insert into
>> > local_postgresql_temp
>> > select * from remote_oracle_table. The performance of this operation are
>> > very slow and I tried to check the reason for that and mybe choose a
>> > different alternative.
>> >
>> > 1)First method - Insert into local_postgresql_table select * from
>> > remote_oracle_table this generated total disk write of 7 M/s and actual
>> > disk
>> > write of 4 M/s(iotop). For 32G table it took me 2 hours and 30 minutes.
>> >
>> > 2)second method - copy (select * from oracle_remote_table) to /tmp/dump
>> > generates total disk write of 4 M/s and actuval disk write of 100 K/s.
>> > The
>> > copy utility suppose to be very fast but it seems very slow.
>>
>> Have you tried increasing the prefetch option in the remote table?
>>
>> If you left it in its default, latency could be hurting your ability
>> to saturate the network.
>
>

Please don't top-post.

I'm assuming you're using this: http://laurenz.github.io/oracle_fdw/

If you check the docs, you'll see this:
https://github.com/laurenz/oracle_fdw#foreign-table-options

So I'm guessing you could:

ALTER FOREIGN TABLE remote_table OPTIONS ( SET prefetch 10240 );




Re: [PERFORM] performance problem on big tables

2017-08-24 Thread Mariel Cherkassky
Hi Claudio, how can I do that ? Can you explain to me what this option is ?

2017-08-24 2:15 GMT+03:00 Claudio Freire :

> On Mon, Aug 21, 2017 at 5:00 AM, Mariel Cherkassky
>  wrote:
> > To summarize, I still have performance problems. My current situation :
> >
> > I'm trying to copy the data of many tables in the oracle database into my
> > postgresql tables. I'm doing so by running insert into
> local_postgresql_temp
> > select * from remote_oracle_table. The performance of this operation are
> > very slow and I tried to check the reason for that and mybe choose a
> > different alternative.
> >
> > 1)First method - Insert into local_postgresql_table select * from
> > remote_oracle_table this generated total disk write of 7 M/s and actual
> disk
> > write of 4 M/s(iotop). For 32G table it took me 2 hours and 30 minutes.
> >
> > 2)second method - copy (select * from oracle_remote_table) to /tmp/dump
> > generates total disk write of 4 M/s and actuval disk write of 100 K/s.
> The
> > copy utility suppose to be very fast but it seems very slow.
>
> Have you tried increasing the prefetch option in the remote table?
>
> If you left it in its default, latency could be hurting your ability
> to saturate the network.
>


Re: [PERFORM] performance problem on big tables

2017-08-23 Thread Claudio Freire
On Mon, Aug 21, 2017 at 5:00 AM, Mariel Cherkassky
 wrote:
> To summarize, I still have performance problems. My current situation :
>
> I'm trying to copy the data of many tables in the oracle database into my
> postgresql tables. I'm doing so by running insert into local_postgresql_temp
> select * from remote_oracle_table. The performance of this operation are
> very slow and I tried to check the reason for that and mybe choose a
> different alternative.
>
> 1)First method - Insert into local_postgresql_table select * from
> remote_oracle_table this generated total disk write of 7 M/s and actual disk
> write of 4 M/s(iotop). For 32G table it took me 2 hours and 30 minutes.
>
> 2)second method - copy (select * from oracle_remote_table) to /tmp/dump
> generates total disk write of 4 M/s and actuval disk write of 100 K/s. The
> copy utility suppose to be very fast but it seems very slow.

Have you tried increasing the prefetch option in the remote table?

If you left it in its default, latency could be hurting your ability
to saturate the network.




Re: [PERFORM] Performance Issue -- "Materialize"

2017-08-21 Thread Jeff Janes
On Sat, Aug 19, 2017 at 10:37 AM, anand086  wrote:

Your email is very hard to read, the formatting and line wrapping is
heavily mangled.  You might want to attach the plans as file attachments
instead of, or in addition to, putting them in the body.



>  ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_1  (cost=0.56..2.03 rows=1 width=8)
>        Index Cond: ((import_num = '4520460'::numeric) AND (login IS NOT NULL))
>
>
It looks like the statistics for your table are desperately out of date, as
a later query showed there are 762599 rows (unless login is null for all of
them) but the above is estimating there is only one.   When was the table
last analyzed?
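
A quick way to check, using the table name from the plans in this thread:

    -- when were statistics last refreshed for this table?
    SELECT relname, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname = 'test_tab';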

Cheers,

Jeff

On Sat, Aug 19, 2017 at 10:37 AM, anand086  wrote:

> I am a Postgres Newbie and trying to learn :) We have a scenario wherein
> one of the SQLs shows a different execution plan for different input values
> of import_num. As an example, with import_num = '4520440' the execution
> plan shows a Nested Loop and takes ~12 secs. With import_num = '4520460' the
> execution plan showed "Materialize" and never completed. After I set
> enable_material to off, the execution plan changed to a Hash Semi Join
> and completes in less than 3 secs.
>
> SELECT count(*) FROM test_tab WHERE
> login IN (SELECT DISTINCT login FROM test_tab WHERE import_num = '4520440'
> AND login IS NOT NULL EXCEPT SELECT DISTINCT login FROM test_tab WHERE
> import_num = '0' AND login IS NOT NULL) AND import_num = '4520440';
>
> +--------+
> | count  |
> +--------+
> | 746982 |
> +--------+
> (1 row)
> Time: 12054.274 ms
>
> QUERY PLAN
> Aggregate  (cost=351405.08..351405.09 rows=1 width=8)
>   ->  Nested Loop  (cost=349846.23..350366.17 rows=415562 width=0)
>         ->  HashAggregate  (cost=349845.67..349847.67 rows=200 width=96)
>               Group Key: ("ANY_subquery".login)::text
>               ->  Subquery Scan on "ANY_subquery"  (cost=340828.23..348557.47 rows=515282 width=96)
>                     ->  SetOp Except  (cost=340828.23..343404.65 rows=515282 width=100)
>                           ->  Sort  (cost=340828.23..342116.44 rows=515283 width=100)
>                                 Sort Key: "*SELECT* 1".login
>                                 ->  Append  (cost=0.56..275836.74 rows=515283 width=100)
>                                       ->  Subquery Scan on "*SELECT* 1"  (cost=0.56..275834.70 rows=515282 width=12)
>                                             ->  Unique  (cost=0.56..270681.88 rows=515282 width=8)
>                                                   ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_1  (cost=0.56..268604.07 rows=831125 width=8)
>                                                         Index Cond: ((import_num = '4520440'::numeric) AND (login IS NOT NULL))
>                                       ->  Subquery Scan on "*SELECT* 2"  (cost=0.56..2.04 rows=1 width=12)
>                                             ->  Unique  (cost=0.56..2.03 rows=1 width=8)
>                                                   ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_2  (cost=0.56..2.03 rows=1 width=8)
>                                                         Index Cond: ((import_num = '0'::numeric) AND (login IS NOT NULL))
>         ->  Index Only Scan using

Re: [PERFORM] Performance Issue -- "Materialize"

2017-08-21 Thread Carlos Augusto Machado
I think your query is a bit confusing and has many subqueries, so I tried
to simplify it.

If you can't have more than one import_num = '0' row for the same login, try this

SELECT count(*)
FROM test_tab tab1
LEFT JOIN test_tab tab2
  ON tab1.login = tab2.login AND tab2.import_num = '0'
WHERE
  tab2.login IS NULL AND
  tab1.import_num = '4520440'

otherwise try this

SELECT count(*)
FROM test_tab tab1
LEFT JOIN (
  SELECT DISTINCT login FROM test_tab WHERE import_num = '0'
) tab2
  ON tab1.login = tab2.login
WHERE
  tab2.login IS NULL AND
  tab1.import_num = '4520440'


On Mon, 21 Aug 2017 at 15:47, Carlos Augusto Machado <
caugus...@gmail.com> wrote:

>
> Do you have an index on login column ?
>
> If not, try creating an index and taking off those DISTINCTs.
>
> On Mon, 21 Aug 2017 at 15:33, Justin Pryzby wrote:
>
>> On Sat, Aug 19, 2017 at 10:37:56AM -0700, anand086 wrote:
>> > QUERY PLAN
>> > Aggregate  (cost=351405.08..351405.09 rows=1 width=8)
>>
>> Would you send explain ANALYZE and not just explain ?
>>
>> Justin
>>
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org
>> )
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>>
>


Re: [PERFORM] Performance Issue -- "Materialize"

2017-08-21 Thread Carlos Augusto Machado
Do you have an index on login column ?

If not, try creating an index and taking off those DISTINCTs.
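
A minimal sketch of the suggested index; the index name is made up, and the
plans in the thread suggest an index on (import_num, login) may already exist:

    CREATE INDEX CONCURRENTLY idx_test_tab_login ON test_tab (login);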

On Mon, 21 Aug 2017 at 15:33, Justin Pryzby wrote:

> On Sat, Aug 19, 2017 at 10:37:56AM -0700, anand086 wrote:
> > QUERY PLAN
> > Aggregate  (cost=351405.08..351405.09 rows=1 width=8)
>
> Would you send explain ANALYZE and not just explain ?
>
> Justin
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Performance Issue -- "Materialize"

2017-08-21 Thread Justin Pryzby
On Sat, Aug 19, 2017 at 10:37:56AM -0700, anand086 wrote:
> QUERY PLAN
> Aggregate  (cost=351405.08..351405.09 rows=1 width=8) 
>

Would you send explain ANALYZE and not just explain ?

Justin




Re: [PERFORM] Performance Issue -- "Materialize"

2017-08-21 Thread anand086
Any thoughts on this? 



--
View this message in context: 
http://www.postgresql-archive.org/Performance-Issue-Materialize-tp5979128p5979481.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.




Re: [PERFORM] performance problem on big tables

2017-08-21 Thread Michael DNA
If your procedures to get the data are part of a query predicate, then you can 
still use ora2pg

Sent from my iPhone

> On Aug 21, 2017, at 10:35 AM, Igor Neyman <iney...@perceptron.com> wrote:
> 
>  
> From: pgsql-performance-ow...@postgresql.org 
> [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Mariel Cherkassky
> Sent: Monday, August 21, 2017 10:20 AM
> To: MichaelDBA <michael...@sqlexec.com>
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] performance problem on big tables
>  
> I had a system that consist from many objects(procedures,functions..) on an 
> oracle database. We decided to integrate that system to postgresql. That 
> system coppied alot of big tables from a different read only oracle database 
> and preformed on it alot of queries to produce reports. The part of getting 
> the data is part of some procedures, I cant change it so freely. I'm 
> searching a way to improve the perfomance of the database because I'm sure 
> that I didnt conifgure something well. Moreover, When I run complicted 
> queries (joint between 4 big tables and filtering) it takes alot of time and 
> I see that the server is cacheing all my ram memory.
>  
>  
> Probably your joins are done on Postgres side.
>  
> m.b. instead of Postgres pulling data from Oracle, you should try pushing 
> data from Oracle to Postgres using Oracle’s Heterogeneous Services and 
> Postgres ODBC driver. In this case you do your joins and filtering on Oracles 
> side and just push the result set to Postgres.
> That’s how I did migration from Oracle to Postgres.
>  
> Regards,
> Igor Neyman


Re: [PERFORM] performance problem on big tables

2017-08-21 Thread Mariel Cherkassky
I already finished migrating the system from oracle to postgresql. Right
now, I'm trying to improve its performance - I'm bringing data from another
read-only database that is updated every minute. I can't push data from the
oracle side to the postgresql side because the oracle database is read only.

2017-08-21 17:35 GMT+03:00 Igor Neyman <iney...@perceptron.com>:

>
>
> *From:* pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
> ow...@postgresql.org] *On Behalf Of *Mariel Cherkassky
> *Sent:* Monday, August 21, 2017 10:20 AM
> *To:* MichaelDBA <michael...@sqlexec.com>
> *Cc:* pgsql-performance@postgresql.org
> *Subject:* Re: [PERFORM] performance problem on big tables
>
>
>
> I had a system that consist from many objects(procedures,functions..) on
> an oracle database. We decided to integrate that system to postgresql. That
> system coppied alot of big tables from a different read only oracle
> database and preformed on it alot of queries to produce reports. The part
> of getting the data is part of some procedures, I cant change it so freely.
> I'm searching a way to improve the perfomance of the database because I'm
> sure that I didnt conifgure something well. Moreover, When I run complicted
> queries (joint between 4 big tables and filtering) it takes alot of time
> and I see that the server is cacheing all my ram memory.
>
>
>
>
>
> Probably your joins are done on Postgres side.
>
>
>
> m.b. instead of Postgres pulling data from Oracle, you should try pushing
> data from Oracle to Postgres using Oracle’s Heterogeneous Services and
> Postgres ODBC driver. In this case you do your joins and filtering on
> Oracles side and just push the result set to Postgres.
>
> That’s how I did migration from Oracle to Postgres.
>
>
>
> Regards,
>
> Igor Neyman
>


Re: [PERFORM] performance problem on big tables

2017-08-21 Thread Igor Neyman

From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Mariel Cherkassky
Sent: Monday, August 21, 2017 10:20 AM
To: MichaelDBA <michael...@sqlexec.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] performance problem on big tables

I had a system that consist from many objects(procedures,functions..) on an 
oracle database. We decided to integrate that system to postgresql. That system 
coppied alot of big tables from a different read only oracle database and 
preformed on it alot of queries to produce reports. The part of getting the 
data is part of some procedures, I cant change it so freely. I'm searching a 
way to improve the perfomance of the database because I'm sure that I didnt 
conifgure something well. Moreover, When I run complicted queries (joint 
between 4 big tables and filtering) it takes alot of time and I see that the 
server is cacheing all my ram memory.


Probably your joins are done on Postgres side.

m.b. instead of Postgres pulling data from Oracle, you should try pushing data 
from Oracle to Postgres using Oracle’s Heterogeneous Services and Postgres ODBC 
driver. In this case you do your joins and filtering on Oracles side and just 
push the result set to Postgres.
That’s how I did migration from Oracle to Postgres.

Regards,
Igor Neyman


Re: [PERFORM] performance problem on big tables

2017-08-21 Thread Mariel Cherkassky
I had a system that consisted of many objects (procedures, functions, ...) on an
oracle database. We decided to migrate that system to postgresql. That
system copied a lot of big tables from a different read-only oracle
database and performed a lot of queries on them to produce reports. The part
that gets the data is part of some procedures, so I can't change it freely.
I'm searching for a way to improve the performance of the database because I'm
sure that I didn't configure something well. Moreover, when I run complicated
queries (joins between 4 big tables plus filtering) it takes a lot of time
and I see that the server is caching all my RAM.

2017-08-21 16:55 GMT+03:00 MichaelDBA :

> Maybe I missed it in this continuous thread activity, but have you tried
> 'ora2pg'?  You can export from Oracle and import to Postgres in parallel
> jobs.  The import commands use the efficient COPY command by default
> (unless you override it in the ora2pg configuration file).  You can do the
> export and subsequent import in memory, but I would suggest the actual file
> export and import so you can take advantage of the parallel feature.
>
> Regards,
> Michael Vitale
>
> Mariel Cherkassky 
> Monday, August 21, 2017 8:22 AM
> I`m searching for a way to improve the current performance, I'm not
> interesting in using a different tool or writing something new because I'm
> trying to migrate a system on oracle database to a postgresql database.
>
>
> Daniel Blanch Bataller 
> Monday, August 21, 2017 4:37 AM
>
> On 21 Aug 2017, at 10:00, Mariel Cherkassky <
> mariel.cherkas...@gmail.com> wrote:
>
> To summarize, I still have performance problems. My current situation :
>
> I'm trying to copy the data of many tables in the oracle database into my
> postgresql tables. I'm doing so by running insert into
> local_postgresql_temp select * from remote_oracle_table. The performance
> of this operation are very slow and I tried to check the reason for that
> and mybe choose a different alternative.
>
> 1)First method - Insert into local_postgresql_table select * from
> remote_oracle_table this generated total disk write of 7 M/s and actual
> disk write of 4 M/s(iotop). For 32G table it took me 2 hours and 30 minutes.
>
> 2)second method - copy (select * from oracle_remote_table) to /tmp/dump 
> generates
> total disk write of 4 M/s and actuval disk write of 100 K/s. The copy
> utility suppose to be very fast but it seems very slow.
>
>
>
> Are you using a FDW to access oracle server and then dump it using copy?
> This is going to be slow, FDW isn't fast.
>
>
> -When I run copy from the local dump, the reading is very fast 300 M/s.
>
>
> You reported it was slow before. What has changed? How much does it take
> to load the 32G table then?
>
>
> -I created a 32G file on the oracle server and used scp to copy it and it
> took me a few minutes.
>
> -The wals directory is located on a different file system. The parameters
> I assigned :
>
> min_parallel_relation_size = 200MB
> max_parallel_workers_per_gather = 5
> max_worker_processes = 8
> effective_cache_size = 12GB
> work_mem = 128MB
> maintenance_work_mem = 4GB
> shared_buffers = 2000MB
> RAM : 16G
> CPU CORES : 8
>
> HOW can I increase the writes ? How can I get the data faster from the
> oracle database to my postgresql database?
>
>
>
> Extract the table to a file in the oracle server in a format that the COPY
> utility can read, then copy it to postgres server and load it. You can even
> pipe commands and do it in a single step.
>
> This is what I meant when I said that COPY is much faster than any thing
> else. To make it even faster, if I/O is not your bottleneck, you can chop
> the table in chunks and load it in parallel as I told you before, I have
> done this many times when migrating data from oracle to postgres. ora2pg
> uses this method to migrate data from oracle to postgres too.
>
>
> 2017-08-20 14:00 GMT+03:00 Mariel Cherkassky 
> :
>
>> I realized something weird. When I`m preforming the copy utility of
>> postgresql in order to create dump from a local table in my postgresql db
>> it takes for 32G table 20 minutes. When I try to use copy for a foregin
>> table (on oracle database) It takes more than 2 hours.. During the copy
>> operation from the foreign table I dont see alot of write operations, with
>> iotop i see that its writes 3 M/s. What else I can check ?
>>
>> 2017-08-20 9:39 GMT+03:00 Mariel Cherkassky 
>> :
>>
>>> This server is dedicated to be a postgresql production database,
>>> therefore postgresql is the only thing the runs on the server. The fs that
>>> I`m using is xfs. I`ll add two different disks - one for the wals and one
>>> for the temp tablespace. Regarding the disk, what size should they be
>>> considering that the database size is about 250G. Does 16G of ram
>>> considered little ? I installed 

Re: [PERFORM] performance problem on big tables

2017-08-21 Thread MichaelDBA
Maybe I missed it in this continuous thread activity, but have you tried 
'ora2pg'?  You can export from Oracle and import to Postgres in 
parallel jobs.  The import commands use the efficient COPY command by 
default (unless you override it in the ora2pg configuration file).  You 
can do the export and subsequent import in memory, but I would suggest 
the actual file export and import so you can take advantage of the 
parallel feature.
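
A rough sketch of the kind of invocation meant here; the flag values are 
arbitrary, so check the ora2pg documentation for the exact options in your 
version:

    # export table data in COPY format, reading from Oracle and loading into
    # Postgres with several parallel jobs
    ora2pg -c ora2pg.conf -t COPY -J 4 -j 4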


Regards,
Michael Vitale


Mariel Cherkassky 
Monday, August 21, 2017 8:22 AM
I`m searching for a way to improve the current performance, I'm not 
interesting in using a different tool or writing something new because 
I'm trying to migrate a system on oracle database to a postgresql 
database.



Daniel Blanch Bataller 
Monday, August 21, 2017 4:37 AM

On 21 Aug 2017, at 10:00, Mariel Cherkassky wrote:


To summarize, I still have performance problems. My current situation :

I'm trying to copy the data of many tables in the oracle database 
into my postgresql tables. I'm doing so by running insert into 
local_postgresql_temp select * from remote_oracle_table. The 
performance of this operation are very slow and I tried to check the 
reason for that and mybe choose a different alternative.


1)First method - Insert into local_postgresql_table select * from 
remote_oracle_table this generated total disk write of 7 M/s and 
actual disk write of 4 M/s(iotop). For 32G table it took me 2 hours 
and 30 minutes.


2)second method - copy (select * from oracle_remote_table) to 
/tmp/dump generates total disk write of 4 M/s and actuval disk write 
of 100 K/s. The copy utility suppose to be very fast but it seems 
very slow.






Are you using a FDW to access oracle server and then dump it using 
copy? This is going to be slow, FDW isn't fast.




-When I run copy from the local dump, the reading is very fast 300 M/s.




You reported it was slow before. What has changed? How much does it 
take to load the 32G table then?



-I created a 32G file on the oracle server and used scp to copy it 
and it took me a few minutes.



-The wals directory is located on a different file system. The 
parameters I assigned :


min_parallel_relation_size = 200MB
max_parallel_workers_per_gather = 5
max_worker_processes = 8
effective_cache_size = 12GB
work_mem = 128MB
maintenance_work_mem = 4GB
shared_buffers = 2000MB
RAM: 16G
CPU CORES: 8

HOW can I increase the writes ? How can I get the data faster from 
the oracle database to my postgresql database?






Extract the table to a file in the oracle server in a format that the 
COPY utility can read, then copy it to postgres server and load it. 
You can even pipe commands and do it in a single step.


This is what I meant when I said that COPY is much faster than any 
thing else. To make it even faster, if I/O is not your bottleneck, you 
can chop the table in chunks and load it in parallel as I told you 
before, I have done this many times when migrating data from oracle to 
postgres. ora2pg uses this method to migrate data from oracle to 
postgres too.




2017-08-20 14:00 GMT+03:00 Mariel Cherkassky 
>:


I realized something weird. When I`m preforming the copy utility
of postgresql in order to create dump from a local table in my
postgresql db it takes for 32G table 20 minutes. When I try to
use copy for a foregin table (on oracle database) It takes more
than 2 hours.. During the copy operation from the foreign table I
dont see alot of write operations, with iotop i see that its
writes 3 M/s. What else I can check ?

2017-08-20 9:39 GMT+03:00 Mariel Cherkassky
>:

This server is dedicated to be a postgresql production
database, therefore postgresql is the only thing the runs on
the server. The fs that I`m using is xfs. I`ll add two
different disks - one for the wals and one for the temp
tablespace. Regarding the disk, what size should they be
considering that the database size is about 250G. Does 16G of
ram considered little ? I installed iotop and I see that
postgresql writer is writing most of the time and above all.

I mentioned that I perform alot of insert into table select *
from table. Before that I remove indexes,constraints and
truncate the table. Should I run vacuum before or after the
operation ?

2017-08-17 19:37 GMT+03:00 Claudio Freire
>:

On Thu, Aug 17, 2017 at 6:00 AM, Mariel Cherkassky
> wrote:
> I checked with the storage team 

Re: [PERFORM] performance problem on big tables

2017-08-21 Thread Mariel Cherkassky
I'm searching for a way to improve the current performance. I'm not
interested in using a different tool or writing something new, because I'm
trying to migrate a system from an oracle database to a postgresql database.

2017-08-21 14:53 GMT+03:00 Daniel Blanch Bataller <
daniel.blanch.batal...@gmail.com>:

>
> On 21 Aug 2017, at 13:27, Mariel Cherkassky <
> mariel.cherkas...@gmail.com> wrote:
>
> All this operation runs as part of a big transaction that I run.
>
> How can I create a dump in the oracle server and copy it to the postgresql
> server from a postgresql transaction ?
>
>
> I guess you could create a user defined function in any of the available
> languages (perl, python, java, …). Functions run inside transactions
> too…this is not simple, though.
>
> Chopping the table is optional when I use copy, but when I use copy to
> remote oracle table it takes longer to create the dump.
>
>
> It may take longer depending on how the oracle machine, table and database
> are configured. In my experience oracle is not very fast dumping whole
> tables, not to mention tables with BLOB data, which can be as slow as
> hundreds of records per second  (which is probably not your case).
>
> If this transaction is to synchronize data between transactional servers
> and data analysis servers you may consider using some type of replication
> where only changes are sent. EnterpriseDB has tools to do such things, I’m
> not aware of any other tool that can do this between oracle and postgres.
>
> Regards,
>
> Daniel.
>
>
> 2017-08-21 11:37 GMT+03:00 Daniel Blanch Bataller <
> daniel.blanch.batal...@gmail.com>:
>
>>
>> On 21 Aug 2017, at 10:00, Mariel Cherkassky <
>> mariel.cherkas...@gmail.com> wrote:
>>
>> To summarize, I still have performance problems. My current situation :
>>
>> I'm trying to copy the data of many tables in the oracle database into my
>> postgresql tables. I'm doing so by running insert into
>> local_postgresql_temp select * from remote_oracle_table. The performance
>> of this operation are very slow and I tried to check the reason for that
>> and mybe choose a different alternative.
>>
>> 1)First method - Insert into local_postgresql_table select * from
>> remote_oracle_table this generated total disk write of 7 M/s and actual
>> disk write of 4 M/s(iotop). For 32G table it took me 2 hours and 30 minutes.
>>
>> 2)second method - copy (select * from oracle_remote_table) to /tmp/dump 
>> generates
>> total disk write of 4 M/s and actuval disk write of 100 K/s. The copy
>> utility suppose to be very fast but it seems very slow.
>>
>>
>>
>> Are you using a FDW to access oracle server and then dump it using copy?
>> This is going to be slow, FDW isn't fast.
>>
>>
>> -When I run copy from the local dump, the reading is very fast 300 M/s.
>>
>>
>> You reported it was slow before. What has changed? How much does it take
>> to load the 32G table then?
>>
>>
>> -I created a 32G file on the oracle server and used scp to copy it and it
>> took me a few minutes.
>>
>> -The wals directory is located on a different file system. The parameters
>> I assigned :
>>
>> min_parallel_relation_size = 200MB
>> max_parallel_workers_per_gather = 5
>> max_worker_processes = 8
>> effective_cache_size = 12GB
>> work_mem = 128MB
>> maintenance_work_mem = 4GB
>> shared_buffers = 2000MB
>> RAM : 16G
>> CPU CORES : 8
>>
>> HOW can I increase the writes ? How can I get the data faster from the
>> oracle database to my postgresql database?
>>
>>
>>
>> Extract the table to a file in the oracle server in a format that the
>> COPY utility can read, then copy it to postgres server and load it. You can
>> even pipe commands and do it in a single step.
>>
>> This is what I meant when I said that COPY is much faster than any thing
>> else. To make it even faster, if I/O is not your bottleneck, you can chop
>> the table in chunks and load it in parallel as I told you before, I have
>> done this many times when migrating data from oracle to postgres. ora2pg
>> uses this method to migrate data from oracle to postgres too.
>>
>>
>> 2017-08-20 14:00 GMT+03:00 Mariel Cherkassky > >:
>>
>>> I realized something weird. When I`m preforming the copy utility of
>>> postgresql in order to create dump from a local table in my postgresql db
>>> it takes for 32G table 20 minutes. When I try to use copy for a foregin
>>> table (on oracle database) It takes more than 2 hours.. During the copy
>>> operation from the foreign table I dont see alot of write operations, with
>>> iotop i see that its writes 3 M/s. What else I can check ?
>>>
>>> 2017-08-20 9:39 GMT+03:00 Mariel Cherkassky >> >:
>>>
 This server is dedicated to be a postgresql production database,
 therefore postgresql is the only thing the runs on the server. The fs that
 I`m using is xfs. I`ll add two different disks - one for the wals and one
 for the temp tablespace. Regarding the disk, what size should they 

Re: [PERFORM] performance problem on big tables

2017-08-21 Thread Daniel Blanch Bataller

> On 21 Aug 2017, at 13:27, Mariel Cherkassky wrote:
> 
> All this operation runs as part of a big transaction that I run.
> How can I create a dump in the oracle server and copy it to the postgresql 
> server from a postgresql transaction ?

I guess you could create a user defined function in any of the available 
languages (perl, python, java, …). Functions run inside transactions too…this 
is not simple, though. 

> Chopping the table is optional when I use copy, but when I use copy to remote 
> oracle table it takes longer to create the dump. 

It may take longer depending on how the oracle machine, table and database are 
configured. In my experience oracle is not very fast dumping whole tables, not 
to mention tables with BLOB data, which can be as slow as hundreds of records 
per second  (which is probably not your case).

If this transaction is to synchronize data between transactional servers and 
data analysis servers you may consider using some type of replication where 
only changes are sent. EnterpriseDB has tools to do such things, I’m not aware 
of any other tool that can do this between oracle and postgres.

Regards,

Daniel.

> 
> 2017-08-21 11:37 GMT+03:00 Daniel Blanch Bataller 
> >:
> 
>> On 21 Aug 2017, at 10:00, Mariel Cherkassky wrote:
>> 
>> To summarize, I still have performance problems. My current situation : 
>> I'm trying to copy the data of many tables in the oracle database into my 
>> postgresql tables. I'm doing so by running insert into local_postgresql_temp 
>> select * from remote_oracle_table. The performance of this operation are 
>> very slow and I tried to check the reason for that and mybe choose a 
>> different alternative.
>> 
>> 1)First method - Insert into local_postgresql_table select * from 
>> remote_oracle_table this generated total disk write of 7 M/s and actual disk 
>> write of 4 M/s(iotop). For 32G table it took me 2 hours and 30 minutes.
>> 
>> 2)second method - copy (select * from oracle_remote_table) to /tmp/dump 
>> generates total disk write of 4 M/s and actuval disk write of 100 K/s. The 
>> copy utility suppose to be very fast but it seems very slow.
>> 
>> 
> 
> 
> Are you using a FDW to access oracle server and then dump it using copy? This 
> is going to be slow, FDW isn't fast.
> 
> 
>> -When I run copy from the local dump, the reading is very fast 300 M/s.
>> 
>> 
> 
> You reported it was slow before. What has changed? How much does it take to 
> load the 32G table then?
> 
> 
>> -I created a 32G file on the oracle server and used scp to copy it and it 
>> took me a few minutes.
>> 
>> 
>> -The wals directory is located on a different file system. The parameters I 
>> assigned :
>> 
>> min_parallel_relation_size = 200MB
>> max_parallel_workers_per_gather = 5 
>> max_worker_processes = 8 
>> effective_cache_size = 12GB
>> work_mem = 128MB
>> maintenance_work_mem = 4GB
>> shared_buffers = 2000MB
>> RAM : 16G
>> CPU CORES : 8
>> HOW can I increase the writes ? How can I get the data faster from the 
>> oracle database to my postgresql database?
>> 
>> 
> 
> 
> Extract the table to a file in the oracle server in a format that the COPY 
> utility can read, then copy it to postgres server and load it. You can even 
> pipe commands and do it in a single step.
> 
> This is what I meant when I said that COPY is much faster than any thing 
> else. To make it even faster, if I/O is not your bottleneck, you can chop the 
> table in chunks and load it in parallel as I told you before, I have done 
> this many times when migrating data from oracle to postgres. ora2pg uses this 
> method to migrate data from oracle to postgres too. 
> 
>> 
>> 2017-08-20 14:00 GMT+03:00 Mariel Cherkassky > >:
>> I realized something weird. When I`m preforming the copy utility of 
>> postgresql in order to create dump from a local table in my postgresql db it 
>> takes for 32G table 20 minutes. When I try to use copy for a foregin table 
>> (on oracle database) It takes more than 2 hours.. During the copy operation 
>> from the foreign table I dont see alot of write operations, with iotop i see 
>> that its writes 3 M/s. What else I can check ? 
>> 
>> 2017-08-20 9:39 GMT+03:00 Mariel Cherkassky > >:
>> This server is dedicated to be a postgresql production database, therefore 
>> postgresql is the only thing the runs on the server. The fs that I`m using 
>> is xfs. I`ll add two different disks - one for the wals and one for the temp 
>> tablespace. Regarding the disk, what size should they be considering that 
>> the database size is about 250G. Does 16G of ram considered little ? I 
>> installed iotop and I see that postgresql writer is writing most of 

Re: [PERFORM] performance problem on big tables

2017-08-21 Thread Mariel Cherkassky
All of this runs as part of a big transaction. How can I create a dump on the
oracle server and copy it to the postgresql server from within a postgresql
transaction? Chopping the table into chunks is an option when I use copy, but
when I use copy against the remote oracle table it takes longer to create
the dump.

2017-08-21 11:37 GMT+03:00 Daniel Blanch Bataller <
daniel.blanch.batal...@gmail.com>:

>
> El 21 ago 2017, a las 10:00, Mariel Cherkassky <
> mariel.cherkas...@gmail.com> escribió:
>
> To summarize, I still have performance problems. My current situation :
>
> I'm trying to copy the data of many tables in the oracle database into my
> postgresql tables. I'm doing so by running insert into
> local_postgresql_temp select * from remote_oracle_table. The performance
> of this operation are very slow and I tried to check the reason for that
> and mybe choose a different alternative.
>
> 1)First method - Insert into local_postgresql_table select * from
> remote_oracle_table this generated total disk write of 7 M/s and actual
> disk write of 4 M/s(iotop). For 32G table it took me 2 hours and 30 minutes.
>
> 2)second method - copy (select * from oracle_remote_table) to /tmp/dump 
> generates
> total disk write of 4 M/s and actuval disk write of 100 K/s. The copy
> utility suppose to be very fast but it seems very slow.
>
>
>
> Are you using a FDW to access oracle server and then dump it using copy?
> This is going to be slow, FDW isn't fast.
>
>
> -When I run copy from the local dump, the reading is very fast 300 M/s.
>
>
> You reported it was slow before. What has changed? How much does it take
> to load the 32G table then?
>
>
> -I created a 32G file on the oracle server and used scp to copy it and it
> took me a few minutes.
>
> -The wals directory is located on a different file system. The parameters
> I assigned :
>
> min_parallel_relation_size = 200MB
> max_parallel_workers_per_gather = 5
> max_worker_processes = 8
> effective_cache_size = 12GB
> work_mem = 128MB
> maintenance_work_mem = 4GB
> shared_buffers = 2000MB
> RAM : 16G
> CPU CORES : 8
>
> HOW can I increase the writes ? How can I get the data faster from the
> oracle database to my postgresql database?
>
>
>
> Extract the table to a file in the oracle server in a format that the COPY
> utility can read, then copy it to postgres server and load it. You can even
> pipe commands and do it in a single step.
>
> This is what I meant when I said that COPY is much faster than any thing
> else. To make it even faster, if I/O is not your bottleneck, you can chop
> the table in chunks and load it in parallel as I told you before, I have
> done this many times when migrating data from oracle to postgres. ora2pg
> uses this method to migrate data from oracle to postgres too.
>
>
> 2017-08-20 14:00 GMT+03:00 Mariel Cherkassky 
> :
>
>> I realized something weird. When I`m preforming the copy utility of
>> postgresql in order to create dump from a local table in my postgresql db
>> it takes for 32G table 20 minutes. When I try to use copy for a foregin
>> table (on oracle database) It takes more than 2 hours.. During the copy
>> operation from the foreign table I dont see alot of write operations, with
>> iotop i see that its writes 3 M/s. What else I can check ?
>>
>> 2017-08-20 9:39 GMT+03:00 Mariel Cherkassky 
>> :
>>
>>> This server is dedicated to be a postgresql production database,
>>> therefore postgresql is the only thing the runs on the server. The fs that
>>> I`m using is xfs. I`ll add two different disks - one for the wals and one
>>> for the temp tablespace. Regarding the disk, what size should they be
>>> considering that the database size is about 250G. Does 16G of ram
>>> considered little ? I installed iotop and I see that postgresql writer is
>>> writing most of the time and above all.
>>>
>>> I mentioned that I perform alot of insert into table select * from
>>> table. Before that I remove indexes,constraints and truncate the table.
>>> Should I run vacuum before or after the operation ?
>>>
>>> 2017-08-17 19:37 GMT+03:00 Claudio Freire :
>>>
 On Thu, Aug 17, 2017 at 6:00 AM, Mariel Cherkassky
  wrote:
 > I checked with the storage team in the company and they saw that I
 have alot
 > of io on the server. How should I reduce the io that the postgresql
 uses ?

 Do you have concurrent activity on that server?

 What filesystem are you using wherever the data is sitting?

 If you've got concurrent fsyncs happening, some filesystems handle
 that poorly. When you've got WAL and data mixed in a single disk, or
 worse, filesystem, it happens often that the filesystem won't handle
 the write barriers for the WAL efficiently. I/O gets intermingled with
 bulk operations, and even small fsyncs will have to flush writes from
 bulk operations, which makes a mess of things.

Re: [PERFORM] performance problem on big tables

2017-08-21 Thread Daniel Blanch Bataller

> El 21 ago 2017, a las 10:00, Mariel Cherkassky  
> escribió:
> 
> To summarize, I still have performance problems. My current situation : 
> I'm trying to copy the data of many tables in the oracle database into my 
> postgresql tables. I'm doing so by running insert into local_postgresql_temp 
> select * from remote_oracle_table. The performance of this operation are very 
> slow and I tried to check the reason for that and mybe choose a different 
> alternative.
> 
> 1)First method - Insert into local_postgresql_table select * from 
> remote_oracle_table this generated total disk write of 7 M/s and actual disk 
> write of 4 M/s(iotop). For 32G table it took me 2 hours and 30 minutes.
> 
> 2)second method - copy (select * from oracle_remote_table) to /tmp/dump 
> generates total disk write of 4 M/s and actuval disk write of 100 K/s. The 
> copy utility suppose to be very fast but it seems very slow.
> 
> 


Are you using an FDW to access the Oracle server and then dumping it using copy? 
This is going to be slow; FDW isn't fast.


> -When I run copy from the local dump, the reading is very fast 300 M/s.
> 
> 

You reported it was slow before. What has changed? How long does it take to 
load the 32G table then?


> -I created a 32G file on the oracle server and used scp to copy it and it 
> took me a few minutes.
> 
> 
> -The wals directory is located on a different file system. The parameters I 
> assigned :
> 
> min_parallel_relation_size = 200MB
> max_parallel_workers_per_gather = 5 
> max_worker_processes = 8 
> effective_cache_size = 12GB
> work_mem = 128MB
> maintenance_work_mem = 4GB
> shared_buffers = 2000MB
> RAM : 16G
> CPU CORES : 8
> HOW can I increase the writes ? How can I get the data faster from the oracle 
> database to my postgresql database?
> 
> 


Extract the table to a file on the Oracle server in a format that the COPY 
utility can read, then copy it to the Postgres server and load it. You can even 
pipe the commands and do it in a single step.

This is what I meant when I said that COPY is much faster than anything else. 
To make it even faster, if I/O is not your bottleneck, you can chop the table 
into chunks and load them in parallel, as I told you before; I have done this 
many times when migrating data from Oracle to Postgres. ora2pg uses this method 
to migrate data from Oracle to Postgres too.
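
A rough sketch of the single-step pipeline, assuming the Oracle side can emit 
CSV (for example from a sqlplus script); every name below is hypothetical:

# export_table.sql is assumed to SELECT the table and print it as CSV
sqlplus -s app_user/secret@ORCL @export_table.sql | \
  psql -d mydb -c "COPY local_postgresql_table FROM STDIN WITH (FORMAT csv)"

# or in two steps: spool to a file on the Oracle server, scp it across, then
psql -d mydb -c "\copy local_postgresql_table FROM '/tmp/table.csv' WITH (FORMAT csv)"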

> 
> 2017-08-20 14:00 GMT+03:00 Mariel Cherkassky  >:
> I realized something weird. When I`m preforming the copy utility of 
> postgresql in order to create dump from a local table in my postgresql db it 
> takes for 32G table 20 minutes. When I try to use copy for a foregin table 
> (on oracle database) It takes more than 2 hours.. During the copy operation 
> from the foreign table I dont see alot of write operations, with iotop i see 
> that its writes 3 M/s. What else I can check ? 
> 
> 2017-08-20 9:39 GMT+03:00 Mariel Cherkassky  >:
> This server is dedicated to be a postgresql production database, therefore 
> postgresql is the only thing the runs on the server. The fs that I`m using is 
> xfs. I`ll add two different disks - one for the wals and one for the temp 
> tablespace. Regarding the disk, what size should they be considering that the 
> database size is about 250G. Does 16G of ram considered little ? I installed 
> iotop and I see that postgresql writer is writing most of the time and above 
> all.
> 
> I mentioned that I perform alot of insert into table select * from table. 
> Before that I remove indexes,constraints and truncate the table. Should I run 
> vacuum before or after the operation ? 
> 
> 2017-08-17 19:37 GMT+03:00 Claudio Freire  >:
> On Thu, Aug 17, 2017 at 6:00 AM, Mariel Cherkassky
> > wrote:
> > I checked with the storage team in the company and they saw that I have alot
> > of io on the server. How should I reduce the io that the postgresql uses ?
> 
> Do you have concurrent activity on that server?
> 
> What filesystem are you using wherever the data is sitting?
> 
> If you've got concurrent fsyncs happening, some filesystems handle
> that poorly. When you've got WAL and data mixed in a single disk, or
> worse, filesystem, it happens often that the filesystem won't handle
> the write barriers for the WAL efficiently. I/O gets intermingled with
> bulk operations, and even small fsyncs will have to flush writes from
> bulk operations, which makes a mess of things.
> 
> It is a very good idea, and in fact a recommended practice, to put WAL
> on its own disk for that reason mainly.
> 
> With that little RAM, you'll also probably cause a lot of I/O in temp
> files, so I'd also recommend setting aside another disk for a temp
> tablespace so that I/O doesn't block other transactions as well.

Re: [PERFORM] performance problem on big tables

2017-08-21 Thread Mariel Cherkassky
To summarize, I still have performance problems. My current situation:

I'm trying to copy the data of many tables in the oracle database into my
postgresql tables. I'm doing so by running insert into
local_postgresql_temp select * from remote_oracle_table. The performance of
this operation is very slow, so I tried to check the reason for that and
maybe choose a different alternative.

1) First method - insert into local_postgresql_table select * from
remote_oracle_table. This generated a total disk write of 7 M/s and an actual
disk write of 4 M/s (iotop). For the 32G table it took me 2 hours and 30 minutes.

2) Second method - copy (select * from oracle_remote_table) to /tmp/dump. This
generates a total disk write of 4 M/s and an actual disk write of 100 K/s. The
copy utility is supposed to be very fast, but it seems very slow.

- When I run copy from the local dump, the reading is very fast, about 300 M/s.

- I created a 32G file on the oracle server and used scp to copy it, and it
took me only a few minutes.

- The WALs directory is located on a different file system. The parameters I
assigned:

min_parallel_relation_size = 200MB
max_parallel_workers_per_gather = 5
max_worker_processes = 8
effective_cache_size = 12GB
work_mem = 128MB
maintenance_work_mem = 4GB
shared_buffers = 2000MB
RAM : 16G
CPU CORES : 8

How can I increase the write throughput? How can I get the data faster from the
oracle database to my postgresql database?
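
For reference, a minimal sketch of the chunked, parallel variant of the first
method that was suggested upthread; the key column id and the ranges are
hypothetical:

-- run each statement in its own session/connection so the chunks load concurrently
INSERT INTO local_postgresql_table SELECT * FROM remote_oracle_table WHERE id < 5000000;
INSERT INTO local_postgresql_table SELECT * FROM remote_oracle_table WHERE id >= 5000000 AND id < 10000000;
INSERT INTO local_postgresql_table SELECT * FROM remote_oracle_table WHERE id >= 10000000;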

2017-08-20 14:00 GMT+03:00 Mariel Cherkassky :

> I realized something weird. When I`m preforming the copy utility of
> postgresql in order to create dump from a local table in my postgresql db
> it takes for 32G table 20 minutes. When I try to use copy for a foregin
> table (on oracle database) It takes more than 2 hours.. During the copy
> operation from the foreign table I dont see alot of write operations, with
> iotop i see that its writes 3 M/s. What else I can check ?
>
> 2017-08-20 9:39 GMT+03:00 Mariel Cherkassky :
>
>> This server is dedicated to be a postgresql production database,
>> therefore postgresql is the only thing the runs on the server. The fs that
>> I`m using is xfs. I`ll add two different disks - one for the wals and one
>> for the temp tablespace. Regarding the disk, what size should they be
>> considering that the database size is about 250G. Does 16G of ram
>> considered little ? I installed iotop and I see that postgresql writer is
>> writing most of the time and above all.
>>
>> I mentioned that I perform alot of insert into table select * from table.
>> Before that I remove indexes,constraints and truncate the table. Should I
>> run vacuum before or after the operation ?
>>
>> 2017-08-17 19:37 GMT+03:00 Claudio Freire :
>>
>>> On Thu, Aug 17, 2017 at 6:00 AM, Mariel Cherkassky
>>>  wrote:
>>> > I checked with the storage team in the company and they saw that I
>>> have alot
>>> > of io on the server. How should I reduce the io that the postgresql
>>> uses ?
>>>
>>> Do you have concurrent activity on that server?
>>>
>>> What filesystem are you using wherever the data is sitting?
>>>
>>> If you've got concurrent fsyncs happening, some filesystems handle
>>> that poorly. When you've got WAL and data mixed in a single disk, or
>>> worse, filesystem, it happens often that the filesystem won't handle
>>> the write barriers for the WAL efficiently. I/O gets intermingled with
>>> bulk operations, and even small fsyncs will have to flush writes from
>>> bulk operations, which makes a mess of things.
>>>
>>> It is a very good idea, and in fact a recommended practice, to put WAL
>>> on its own disk for that reason mainly.
>>>
>>> With that little RAM, you'll also probably cause a lot of I/O in temp
>>> files, so I'd also recommend setting aside another disk for a temp
>>> tablespace so that I/O doesn't block other transactions as well.
>>>
>>> This is all assuming you've got concurrent activity on the server. If
>>> not, install iotop and try to see who's causing that much I/O.
>>>
>>
>>


Re: [PERFORM] performance problem on big tables

2017-08-20 Thread Mariel Cherkassky
When I run copy from a local table the write speed is 22 M/s. When I use copy
from the remote_oracle_table it writes at 3 M/s. scp between the servers copies
very fast. How should I continue?

2017-08-20 14:00 GMT+03:00 Mariel Cherkassky :

> I realized something weird. When I`m preforming the copy utility of
> postgresql in order to create dump from a local table in my postgresql db
> it takes for 32G table 20 minutes. When I try to use copy for a foregin
> table (on oracle database) It takes more than 2 hours.. During the copy
> operation from the foreign table I dont see alot of write operations, with
> iotop i see that its writes 3 M/s. What else I can check ?
>
> 2017-08-20 9:39 GMT+03:00 Mariel Cherkassky :
>
>> This server is dedicated to be a postgresql production database,
>> therefore postgresql is the only thing the runs on the server. The fs that
>> I`m using is xfs. I`ll add two different disks - one for the wals and one
>> for the temp tablespace. Regarding the disk, what size should they be
>> considering that the database size is about 250G. Does 16G of ram
>> considered little ? I installed iotop and I see that postgresql writer is
>> writing most of the time and above all.
>>
>> I mentioned that I perform alot of insert into table select * from table.
>> Before that I remove indexes,constraints and truncate the table. Should I
>> run vacuum before or after the operation ?
>>
>> 2017-08-17 19:37 GMT+03:00 Claudio Freire :
>>
>>> On Thu, Aug 17, 2017 at 6:00 AM, Mariel Cherkassky
>>>  wrote:
>>> > I checked with the storage team in the company and they saw that I
>>> have alot
>>> > of io on the server. How should I reduce the io that the postgresql
>>> uses ?
>>>
>>> Do you have concurrent activity on that server?
>>>
>>> What filesystem are you using wherever the data is sitting?
>>>
>>> If you've got concurrent fsyncs happening, some filesystems handle
>>> that poorly. When you've got WAL and data mixed in a single disk, or
>>> worse, filesystem, it happens often that the filesystem won't handle
>>> the write barriers for the WAL efficiently. I/O gets intermingled with
>>> bulk operations, and even small fsyncs will have to flush writes from
>>> bulk operations, which makes a mess of things.
>>>
>>> It is a very good idea, and in fact a recommended practice, to put WAL
>>> on its own disk for that reason mainly.
>>>
>>> With that little RAM, you'll also probably cause a lot of I/O in temp
>>> files, so I'd also recommend setting aside another disk for a temp
>>> tablespace so that I/O doesn't block other transactions as well.
>>>
>>> This is all assuming you've got concurrent activity on the server. If
>>> not, install iotop and try to see who's causing that much I/O.
>>>
>>
>>


Re: [PERFORM] performance problem on big tables

2017-08-20 Thread Mariel Cherkassky
I realized something weird. When I'm performing the copy utility of
postgresql in order to create a dump from a local table in my postgresql db,
it takes 20 minutes for the 32G table. When I try to use copy for a foreign
table (on the oracle database) it takes more than 2 hours. During the copy
operation from the foreign table I don't see a lot of write operations; with
iotop I see that it writes 3 M/s. What else can I check?

2017-08-20 9:39 GMT+03:00 Mariel Cherkassky :

> This server is dedicated to be a postgresql production database, therefore
> postgresql is the only thing the runs on the server. The fs that I`m using
> is xfs. I`ll add two different disks - one for the wals and one for the
> temp tablespace. Regarding the disk, what size should they be considering
> that the database size is about 250G. Does 16G of ram considered little ? I
> installed iotop and I see that postgresql writer is writing most of the
> time and above all.
>
> I mentioned that I perform alot of insert into table select * from table.
> Before that I remove indexes,constraints and truncate the table. Should I
> run vacuum before or after the operation ?
>
> 2017-08-17 19:37 GMT+03:00 Claudio Freire :
>
>> On Thu, Aug 17, 2017 at 6:00 AM, Mariel Cherkassky
>>  wrote:
>> > I checked with the storage team in the company and they saw that I have
>> alot
>> > of io on the server. How should I reduce the io that the postgresql
>> uses ?
>>
>> Do you have concurrent activity on that server?
>>
>> What filesystem are you using wherever the data is sitting?
>>
>> If you've got concurrent fsyncs happening, some filesystems handle
>> that poorly. When you've got WAL and data mixed in a single disk, or
>> worse, filesystem, it happens often that the filesystem won't handle
>> the write barriers for the WAL efficiently. I/O gets intermingled with
>> bulk operations, and even small fsyncs will have to flush writes from
>> bulk operations, which makes a mess of things.
>>
>> It is a very good idea, and in fact a recommended practice, to put WAL
>> on its own disk for that reason mainly.
>>
>> With that little RAM, you'll also probably cause a lot of I/O in temp
>> files, so I'd also recommend setting aside another disk for a temp
>> tablespace so that I/O doesn't block other transactions as well.
>>
>> This is all assuming you've got concurrent activity on the server. If
>> not, install iotop and try to see who's causing that much I/O.
>>
>
>


Re: [PERFORM] performance problem on big tables

2017-08-20 Thread Mariel Cherkassky
This server is dedicated to being a postgresql production database, so
postgresql is the only thing that runs on the server. The fs that I'm using
is xfs. I'll add two different disks - one for the WALs and one for the
temp tablespace. Regarding the disks, what size should they be, considering
that the database size is about 250G? Is 16G of RAM considered little? I
installed iotop and I see that the postgresql writer is writing most of the
time, above everything else.

I mentioned that I perform a lot of insert into table select * from table.
Before that I remove the indexes and constraints and truncate the table.
Should I run vacuum before or after the operation?
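
A minimal sketch of that load sequence, with hypothetical object names; the
ANALYZE at the end refreshes the planner statistics after the bulk insert:

DROP INDEX IF EXISTS my_table_idx;                            -- hypothetical index
ALTER TABLE my_table DROP CONSTRAINT IF EXISTS my_table_fk;   -- hypothetical constraint
TRUNCATE my_table;
INSERT INTO my_table SELECT * FROM remote_oracle_table;
ALTER TABLE my_table ADD CONSTRAINT my_table_fk
  FOREIGN KEY (cust_id) REFERENCES customers (id);
CREATE INDEX my_table_idx ON my_table (cust_id);
ANALYZE my_table;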

2017-08-17 19:37 GMT+03:00 Claudio Freire :

> On Thu, Aug 17, 2017 at 6:00 AM, Mariel Cherkassky
>  wrote:
> > I checked with the storage team in the company and they saw that I have
> alot
> > of io on the server. How should I reduce the io that the postgresql uses
> ?
>
> Do you have concurrent activity on that server?
>
> What filesystem are you using wherever the data is sitting?
>
> If you've got concurrent fsyncs happening, some filesystems handle
> that poorly. When you've got WAL and data mixed in a single disk, or
> worse, filesystem, it happens often that the filesystem won't handle
> the write barriers for the WAL efficiently. I/O gets intermingled with
> bulk operations, and even small fsyncs will have to flush writes from
> bulk operations, which makes a mess of things.
>
> It is a very good idea, and in fact a recommended practice, to put WAL
> on its own disk for that reason mainly.
>
> With that little RAM, you'll also probably cause a lot of I/O in temp
> files, so I'd also recommend setting aside another disk for a temp
> tablespace so that I/O doesn't block other transactions as well.
>
> This is all assuming you've got concurrent activity on the server. If
> not, install iotop and try to see who's causing that much I/O.
>


[PERFORM] Performance Issue -- "Materialize"

2017-08-19 Thread anand086
I am a Postgres newbie and trying to learn :)

We have a scenario wherein one of our SQLs shows a different execution plan for
different input values of import_num. As an example, with import_num = '4520440'
the execution plan shows a Nested Loop and the query takes ~12 secs. With
import_num = '4520460' the execution plan used "Materialize" and the query never
completed. After I set enable_material to off, the execution plan changed to use
a Hash Semi Join and the query completes in less than 3 secs.

SELECT count(*) FROM test_tab WHERE login IN (SELECT DISTINCT login FROM test_tab
WHERE import_num = '4520440' AND login IS NOT NULL EXCEPT SELECT DISTINCT login
FROM test_tab WHERE import_num = '0' AND login IS NOT NULL) AND import_num = '4520440';

 count
--------
 746982
(1 row)

Time: 12054.274 ms

                                  QUERY PLAN
------------------------------------------------------------------------------
 Aggregate  (cost=351405.08..351405.09 rows=1 width=8)
   ->  Nested Loop  (cost=349846.23..350366.17 rows=415562 width=0)
         ->  HashAggregate  (cost=349845.67..349847.67 rows=200 width=96)
               Group Key: ("ANY_subquery".login)::text
               ->  Subquery Scan on "ANY_subquery"  (cost=340828.23..348557.47 rows=515282 width=96)
                     ->  SetOp Except  (cost=340828.23..343404.65 rows=515282 width=100)
                           ->  Sort  (cost=340828.23..342116.44 rows=515283 width=100)
                                 Sort Key: "*SELECT* 1".login
                                 ->  Append  (cost=0.56..275836.74 rows=515283 width=100)
                                       ->  Subquery Scan on "*SELECT* 1"  (cost=0.56..275834.70 rows=515282 width=12)
                                             ->  Unique  (cost=0.56..270681.88 rows=515282 width=8)
                                                   ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_1  (cost=0.56..268604.07 rows=831125 width=8)
                                                         Index Cond: ((import_num = '4520440'::numeric) AND (login IS NOT NULL))
                                       ->  Subquery Scan on "*SELECT* 2"  (cost=0.56..2.04 rows=1 width=12)
                                             ->  Unique  (cost=0.56..2.03 rows=1 width=8)
                                                   ->  Index Only Scan using ui_nkey_test_tab on test_tab test_tab_2  (cost=0.56..2.03 rows=1 width=8)
                                                         Index Cond: ((import_num = '0'::numeric) AND (login IS NOT NULL))
         ->  Index Only Scan using ui_nkey_test_tab on test_tab  (cost=0.56..2.58 rows=1 width=8)
               Index Cond: ((import_num = '4520440'::numeric) AND (login = ("ANY_subquery".login)::text))
(19 rows)

SELECT count(*) FROM test_tab WHERE import_num = '4520460' and login IN (SELECT
DISTINCT login FROM test_tab WHERE import_num = '4520460' AND login IS NOT NULL
EXCEPT SELECT DISTINCT login FROM test_tab WHERE import_num = '0' AND login IS
NOT NULL);

The SQL was never completing and had the below execution plan --

                                  QUERY PLAN
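
For context, a minimal sketch of the session-level switch described above, plus
one possible rewrite of the IN (... EXCEPT ...) form; the rewrite is an
illustration only and is not from the original post:

-- disable materialization for the current session only (the workaround mentioned above)
SET enable_material = off;

-- a possibly equivalent formulation using NOT EXISTS (an assumption, not the poster's query)
SELECT count(*)
FROM   test_tab t
WHERE  t.import_num = '4520460'
  AND  t.login IS NOT NULL
  AND  NOT EXISTS (
         SELECT 1
         FROM   test_tab z
         WHERE  z.import_num = '0'
         AND    z.login = t.login);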
Re: [PERFORM] performance problem on big tables

2017-08-17 Thread Claudio Freire
On Thu, Aug 17, 2017 at 6:00 AM, Mariel Cherkassky
 wrote:
> I checked with the storage team in the company and they saw that I have alot
> of io on the server. How should I reduce the io that the postgresql uses ?

Do you have concurrent activity on that server?

What filesystem are you using wherever the data is sitting?

If you've got concurrent fsyncs happening, some filesystems handle
that poorly. When you've got WAL and data mixed in a single disk, or
worse, filesystem, it happens often that the filesystem won't handle
the write barriers for the WAL efficiently. I/O gets intermingled with
bulk operations, and even small fsyncs will have to flush writes from
bulk operations, which makes a mess of things.

It is a very good idea, and in fact a recommended practice, to put WAL
on its own disk for that reason mainly.

With that little RAM, you'll also probably cause a lot of I/O in temp
files, so I'd also recommend setting aside another disk for a temp
tablespace so that I/O doesn't block other transactions as well.

This is all assuming you've got concurrent activity on the server. If
not, install iotop and try to see who's causing that much I/O.
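
A minimal sketch of the dedicated temp tablespace idea, assuming a hypothetical
mount point /mnt/pg_temp that is owned by the postgres user:

CREATE TABLESPACE temp_space LOCATION '/mnt/pg_temp';
-- then point temporary files at it, either per session:
SET temp_tablespaces = 'temp_space';
-- or globally in postgresql.conf:
-- temp_tablespaces = 'temp_space'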


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance problem on big tables

2017-08-17 Thread Daniel Blanch Bataller
I would just check how long it takes to copy 3GB using a standard copy command. 
On my computer it took 10 secs.
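
A minimal timing check in psql, with hypothetical table and file names; the
\copy forms are the same ones used elsewhere in this thread:

\timing on
\copy (SELECT * FROM local_big_table) TO '/tmp/local_big_table.dump' WITH BINARY
\copy local_big_table_copy FROM '/tmp/local_big_table.dump' WITH BINARY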


> El 17 ago 2017, a las 11:00, Mariel Cherkassky  
> escribió:
> 
> I checked with the storage team in the company and they saw that I have alot 
> of io on the server. How should I reduce the io that the postgresql uses ?
> 
> 2017-08-17 9:25 GMT+03:00 Mariel Cherkassky  >:
> Hi Daniel,
> I already tried to set the destination table to unlogged - it improved the 
> performance slightly. Is there a way to make sure that I/O is the problem ? 
> 
> 2017-08-17 0:46 GMT+03:00 Daniel Blanch Bataller 
> >:
> Seems your disks are too slow. On my laptop (nothing special, just one disk) 
> using COPY I can dump 3G in ~ 20 secs, loading takes 120 secs, bare copying 
> 3G takes 10 secs. 
> 
> Similar proportion you had, but much faster. 
> 
> confirm I/O is your bottleneck, and tell us how you solved your problem
> 
> Anyway, You can cut import time by half if you set your destination table to 
> unlogged (postgres will write half the data, it will save the transaction log 
> writing). Remember to set it to logged when finished!!
> 
> 
> Regards,
> 
> Daniel
> 
>> El 16 ago 2017, a las 16:32, Mariel Cherkassky > > escribió:
>> 
>> My server is virtual and it have virtual hd from a vnx storage machine. The 
>> logs and the data are on the same disk.
>> 
>> 2017-08-16 17:04 GMT+03:00 Daniel Blanch Bataller 
>> >:
>> Considering it has to write logs and data at checkpoints I don’t see it 
>> particularly slow compared to the extract phase. What kind of disks you have 
>> SSD or regular disks? Different disks for ltransaction logs and data?
>> 
>> 
>>> El 16 ago 2017, a las 15:54, Mariel Cherkassky >> > escribió:
>>> 
>>> I run the copy command via psql to create a local dump of a 3G table and it 
>>> took me 134059.732ms =~2 minutes. After that I imported the data via copy 
>>> and it took 458648.677ms =~7 minutes. So the copy command works but pretty 
>>> slow. 
>>> 
>>> 2017-08-16 16:08 GMT+03:00 Daniel Blanch Bataller 
>>> >> >:
>>> See if the copy command is actually working, copy should be very fast from 
>>> your local disk.
>>> 
>>> 
 El 16 ago 2017, a las 14:26, Mariel Cherkassky 
 > 
 escribió:
 
 
 After all the changes of the memory parameters the same operation(without 
 the copy utility) didnt run much faster - it  took one minute less. I made 
 a test with the copy command (without the 'with binary') and it took 1.5 
 hours to create the dumpfile in my local postgresql server. Then I tried 
 to run the copy from the local dump and it is already running two hours 
 and it didnt even finish. I looked at the server log and I saw that I run 
 the copy command at 13:18:05, 3 minutes later checkpoint started and 
 completed and there are no messages in the log after that. What can I do ? 
 Improving the memory parameters and the memory on the server didnt help 
 and for now the copy command doesnt help either.
 
 
 
 
 2017-08-15 20:14 GMT+03:00 Scott Marlowe >:
 On Tue, Aug 15, 2017 at 4:06 AM, Mariel Cherkassky
 > wrote:
 > Hi,
 > So I I run the cheks that jeff mentioned :
 > \copy (select * from oracle_remote_table) to /tmp/tmp with binary - 1 
 > hour
 > and 35 minutes
 
 So 26G takes 95 minutes, or 27 MB/minute or 456k/second? Sound about
 right (it's early, I haven't had enough coffee please check my math).
 That's pretty slow unless you're working across pretty big distances
 with mediocre connections.  My home internet downloads about 100MB/s
 by comparison.
 
 > \copy local_postresql_table from /tmp/tmp with binary - Didnt run because
 > the remote oracle database is currently under maintenance work.
 
 You shouldn't need the remote oracle server if you've already copied
 it over, you're just copying from local disk into the local pgsql db.
 Unless I'm missing something.
 
 > So I decided to follow MichaelDBA tips and I set the ram on my machine to
 > 16G and I configured the effective_cache memory to 14G,tshared_buffer to 
 > be
 > 2G and maintenance_work_mem to 4G.
 
 Good settings. Maybe set work_mem to 128MB or so while you're at it.
 
 > I started 

Re: [PERFORM] performance problem on big tables

2017-08-17 Thread Mariel Cherkassky
I checked with the storage team in the company and they saw that I have
a lot of I/O on the server. How should I reduce the I/O that postgresql
uses?

2017-08-17 9:25 GMT+03:00 Mariel Cherkassky :

> Hi Daniel,
> I already tried to set the destination table to unlogged - it improved the
> performance slightly. Is there a way to make sure that I/O is the problem ?
>
> 2017-08-17 0:46 GMT+03:00 Daniel Blanch Bataller <
> daniel.blanch.batal...@gmail.com>:
>
>> Seems your disks are too slow. On my laptop (nothing special, just one
>> disk) using COPY I can dump 3G in ~ 20 secs, loading takes 120 secs, bare
>> copying 3G takes 10 secs.
>>
>> Similar proportion you had, but much faster.
>>
>> confirm I/O is your bottleneck, and tell us how you solved your problem
>>
>> Anyway, You can cut import time by half if you set your destination table
>> to unlogged (postgres will write half the data, it will save the
>> transaction log writing). Remember to set it to logged when finished!!
>>
>>
>> Regards,
>>
>> Daniel
>>
>> El 16 ago 2017, a las 16:32, Mariel Cherkassky <
>> mariel.cherkas...@gmail.com> escribió:
>>
>> My server is virtual and it have virtual hd from a vnx storage machine.
>> The logs and the data are on the same disk.
>>
>> 2017-08-16 17:04 GMT+03:00 Daniel Blanch Bataller <
>> daniel.blanch.batal...@gmail.com>:
>>
>>> Considering it has to write logs and data at checkpoints I don’t see it
>>> particularly slow compared to the extract phase. What kind of disks you
>>> have SSD or regular disks? Different disks for ltransaction logs and data?
>>>
>>>
>>> El 16 ago 2017, a las 15:54, Mariel Cherkassky <
>>> mariel.cherkas...@gmail.com> escribió:
>>>
>>> I run the copy command via psql to create a local dump of a 3G table and
>>> it took me 134059.732ms =~2 minutes. After that I imported the data via
>>> copy and it took 458648.677ms =~7 minutes. So the copy command works but
>>> pretty slow.
>>>
>>> 2017-08-16 16:08 GMT+03:00 Daniel Blanch Bataller <
>>> daniel.blanch.batal...@gmail.com>:
>>>
 See if the copy command is actually working, copy should be very fast
 from your local disk.


 El 16 ago 2017, a las 14:26, Mariel Cherkassky <
 mariel.cherkas...@gmail.com> escribió:


 After all the changes of the memory parameters the same
 operation(without the copy utility) didnt run much faster - it  took one
 minute less. I made a test with the copy command (without the 'with
 binary') and it took 1.5 hours to create the dumpfile in my local
 postgresql server. Then I tried to run the copy from the local dump and it
 is already running two hours and it didnt even finish. I looked at the
 server log and I saw that I run the copy command at 13:18:05, 3 minutes
 later checkpoint started and completed and there are no messages in the log
 after that. What can I do ? Improving the memory parameters and the memory
 on the server didnt help and for now the copy command doesnt help either.




 2017-08-15 20:14 GMT+03:00 Scott Marlowe :

> On Tue, Aug 15, 2017 at 4:06 AM, Mariel Cherkassky
>  wrote:
> > Hi,
> > So I I run the cheks that jeff mentioned :
> > \copy (select * from oracle_remote_table) to /tmp/tmp with binary -
> 1 hour
> > and 35 minutes
>
> So 26G takes 95 minutes, or 27 MB/minute or 456k/second? Sound about
> right (it's early, I haven't had enough coffee please check my math).
> That's pretty slow unless you're working across pretty big distances
> with mediocre connections.  My home internet downloads about 100MB/s
> by comparison.
>
> > \copy local_postresql_table from /tmp/tmp with binary - Didnt run
> because
> > the remote oracle database is currently under maintenance work.
>
> You shouldn't need the remote oracle server if you've already copied
> it over, you're just copying from local disk into the local pgsql db.
> Unless I'm missing something.
>
> > So I decided to follow MichaelDBA tips and I set the ram on my
> machine to
> > 16G and I configured the effective_cache memory to
> 14G,tshared_buffer to be
> > 2G and maintenance_work_mem to 4G.
>
> Good settings. Maybe set work_mem to 128MB or so while you're at it.
>
> > I started running the copy checks again and for now it coppied 5G in
> 10
> > minutes. I have some questions :
> > 1)When I run insert into local_postresql_table select * from
> > remote_oracle_table I insert that data as bulk to the local table or
> row by
> > row ?  If the answer as bulk than why copy is a better option for
> this case
> > ?
>
> insert into select from oracle remote is one big copy, but it will
> take at least as long as copying from oracle to the local network
> took. Compare that to the same thing but 

Re: [PERFORM] performance problem on big tables

2017-08-17 Thread Mariel Cherkassky
Hi Daniel,
I already tried to set the destination table to unlogged - it improved the
performance slightly. Is there a way to make sure that I/O is the problem?
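
One way to check is to watch the disks directly while the load is running (this
assumes the sysstat and iotop packages are installed; device names will differ):

iostat -dxm 5    # watch await and %util on the data and WAL devices
iotop -o         # list only the processes that are currently doing I/O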

2017-08-17 0:46 GMT+03:00 Daniel Blanch Bataller <
daniel.blanch.batal...@gmail.com>:

> Seems your disks are too slow. On my laptop (nothing special, just one
> disk) using COPY I can dump 3G in ~ 20 secs, loading takes 120 secs, bare
> copying 3G takes 10 secs.
>
> Similar proportion you had, but much faster.
>
> confirm I/O is your bottleneck, and tell us how you solved your problem
>
> Anyway, You can cut import time by half if you set your destination table
> to unlogged (postgres will write half the data, it will save the
> transaction log writing). Remember to set it to logged when finished!!
>
>
> Regards,
>
> Daniel
>
> El 16 ago 2017, a las 16:32, Mariel Cherkassky <
> mariel.cherkas...@gmail.com> escribió:
>
> My server is virtual and it have virtual hd from a vnx storage machine.
> The logs and the data are on the same disk.
>
> 2017-08-16 17:04 GMT+03:00 Daniel Blanch Bataller <
> daniel.blanch.batal...@gmail.com>:
>
>> Considering it has to write logs and data at checkpoints I don’t see it
>> particularly slow compared to the extract phase. What kind of disks you
>> have SSD or regular disks? Different disks for ltransaction logs and data?
>>
>>
>> El 16 ago 2017, a las 15:54, Mariel Cherkassky <
>> mariel.cherkas...@gmail.com> escribió:
>>
>> I run the copy command via psql to create a local dump of a 3G table and
>> it took me 134059.732ms =~2 minutes. After that I imported the data via
>> copy and it took 458648.677ms =~7 minutes. So the copy command works but
>> pretty slow.
>>
>> 2017-08-16 16:08 GMT+03:00 Daniel Blanch Bataller <
>> daniel.blanch.batal...@gmail.com>:
>>
>>> See if the copy command is actually working, copy should be very fast
>>> from your local disk.
>>>
>>>
>>> El 16 ago 2017, a las 14:26, Mariel Cherkassky <
>>> mariel.cherkas...@gmail.com> escribió:
>>>
>>>
>>> After all the changes of the memory parameters the same
>>> operation(without the copy utility) didnt run much faster - it  took one
>>> minute less. I made a test with the copy command (without the 'with
>>> binary') and it took 1.5 hours to create the dumpfile in my local
>>> postgresql server. Then I tried to run the copy from the local dump and it
>>> is already running two hours and it didnt even finish. I looked at the
>>> server log and I saw that I run the copy command at 13:18:05, 3 minutes
>>> later checkpoint started and completed and there are no messages in the log
>>> after that. What can I do ? Improving the memory parameters and the memory
>>> on the server didnt help and for now the copy command doesnt help either.
>>>
>>>
>>>
>>>
>>> 2017-08-15 20:14 GMT+03:00 Scott Marlowe :
>>>
 On Tue, Aug 15, 2017 at 4:06 AM, Mariel Cherkassky
  wrote:
 > Hi,
 > So I I run the cheks that jeff mentioned :
 > \copy (select * from oracle_remote_table) to /tmp/tmp with binary - 1
 hour
 > and 35 minutes

 So 26G takes 95 minutes, or 27 MB/minute or 456k/second? Sound about
 right (it's early, I haven't had enough coffee please check my math).
 That's pretty slow unless you're working across pretty big distances
 with mediocre connections.  My home internet downloads about 100MB/s
 by comparison.

 > \copy local_postresql_table from /tmp/tmp with binary - Didnt run
 because
 > the remote oracle database is currently under maintenance work.

 You shouldn't need the remote oracle server if you've already copied
 it over, you're just copying from local disk into the local pgsql db.
 Unless I'm missing something.

 > So I decided to follow MichaelDBA tips and I set the ram on my
 machine to
 > 16G and I configured the effective_cache memory to 14G,tshared_buffer
 to be
 > 2G and maintenance_work_mem to 4G.

 Good settings. Maybe set work_mem to 128MB or so while you're at it.

 > I started running the copy checks again and for now it coppied 5G in
 10
 > minutes. I have some questions :
 > 1)When I run insert into local_postresql_table select * from
 > remote_oracle_table I insert that data as bulk to the local table or
 row by
 > row ?  If the answer as bulk than why copy is a better option for
 this case
 > ?

 insert into select from oracle remote is one big copy, but it will
 take at least as long as copying from oracle to the local network
 took. Compare that to the same thing but use file_fdw on the file
 locally.

 > 2)The copy from dump into the postgresql database should take less
 time than
 > the copy to dump ?

 Yes. The copy from Oracle to your local drive is painfully slow for a
 modern network connection.

 > 3)What do you think about the new memory parameters that I cofigured 

Re: [PERFORM] performance problem on big tables

2017-08-16 Thread Daniel Blanch Bataller
Seems your disks are too slow. On my laptop (nothing special, just one disk) 
using COPY I can dump 3G in ~ 20 secs, loading takes 120 secs, bare copying 3G 
takes 10 secs. 

A similar proportion to what you had, but much faster.

Confirm I/O is your bottleneck, and tell us how you solved your problem.

Anyway, you can cut import time by half if you set your destination table to 
unlogged (postgres will write half the data, since it skips the transaction log 
writing). Remember to set it back to logged when finished!!
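
A minimal sketch of the unlogged trick, with a hypothetical table name
(ALTER TABLE ... SET UNLOGGED / SET LOGGED needs PostgreSQL 9.5 or later):

ALTER TABLE local_postgresql_table SET UNLOGGED;
-- run the bulk load here, e.g. the INSERT ... SELECT or the \copy
ALTER TABLE local_postgresql_table SET LOGGED;  -- turn WAL logging back on once the load is done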


Regards,

Daniel

> El 16 ago 2017, a las 16:32, Mariel Cherkassky  
> escribió:
> 
> My server is virtual and it have virtual hd from a vnx storage machine. The 
> logs and the data are on the same disk.
> 
> 2017-08-16 17:04 GMT+03:00 Daniel Blanch Bataller 
> >:
> Considering it has to write logs and data at checkpoints I don’t see it 
> particularly slow compared to the extract phase. What kind of disks you have 
> SSD or regular disks? Different disks for ltransaction logs and data?
> 
> 
>> El 16 ago 2017, a las 15:54, Mariel Cherkassky > > escribió:
>> 
>> I run the copy command via psql to create a local dump of a 3G table and it 
>> took me 134059.732ms =~2 minutes. After that I imported the data via copy 
>> and it took 458648.677ms =~7 minutes. So the copy command works but pretty 
>> slow. 
>> 
>> 2017-08-16 16:08 GMT+03:00 Daniel Blanch Bataller 
>> >:
>> See if the copy command is actually working, copy should be very fast from 
>> your local disk.
>> 
>> 
>>> El 16 ago 2017, a las 14:26, Mariel Cherkassky >> > escribió:
>>> 
>>> 
>>> After all the changes of the memory parameters the same operation(without 
>>> the copy utility) didnt run much faster - it  took one minute less. I made 
>>> a test with the copy command (without the 'with binary') and it took 1.5 
>>> hours to create the dumpfile in my local postgresql server. Then I tried to 
>>> run the copy from the local dump and it is already running two hours and it 
>>> didnt even finish. I looked at the server log and I saw that I run the copy 
>>> command at 13:18:05, 3 minutes later checkpoint started and completed and 
>>> there are no messages in the log after that. What can I do ? Improving the 
>>> memory parameters and the memory on the server didnt help and for now the 
>>> copy command doesnt help either.
>>> 
>>> 
>>> 
>>> 
>>> 2017-08-15 20:14 GMT+03:00 Scott Marlowe >> >:
>>> On Tue, Aug 15, 2017 at 4:06 AM, Mariel Cherkassky
>>> > wrote:
>>> > Hi,
>>> > So I I run the cheks that jeff mentioned :
>>> > \copy (select * from oracle_remote_table) to /tmp/tmp with binary - 1 hour
>>> > and 35 minutes
>>> 
>>> So 26G takes 95 minutes, or 27 MB/minute or 456k/second? Sound about
>>> right (it's early, I haven't had enough coffee please check my math).
>>> That's pretty slow unless you're working across pretty big distances
>>> with mediocre connections.  My home internet downloads about 100MB/s
>>> by comparison.
>>> 
>>> > \copy local_postresql_table from /tmp/tmp with binary - Didnt run because
>>> > the remote oracle database is currently under maintenance work.
>>> 
>>> You shouldn't need the remote oracle server if you've already copied
>>> it over, you're just copying from local disk into the local pgsql db.
>>> Unless I'm missing something.
>>> 
>>> > So I decided to follow MichaelDBA tips and I set the ram on my machine to
>>> > 16G and I configured the effective_cache memory to 14G,tshared_buffer to 
>>> > be
>>> > 2G and maintenance_work_mem to 4G.
>>> 
>>> Good settings. Maybe set work_mem to 128MB or so while you're at it.
>>> 
>>> > I started running the copy checks again and for now it coppied 5G in 10
>>> > minutes. I have some questions :
>>> > 1)When I run insert into local_postresql_table select * from
>>> > remote_oracle_table I insert that data as bulk to the local table or row 
>>> > by
>>> > row ?  If the answer as bulk than why copy is a better option for this 
>>> > case
>>> > ?
>>> 
>>> insert into select from oracle remote is one big copy, but it will
>>> take at least as long as copying from oracle to the local network
>>> took. Compare that to the same thing but use file_fdw on the file
>>> locally.
>>> 
>>> > 2)The copy from dump into the postgresql database should take less time 
>>> > than
>>> > the copy to dump ?
>>> 
>>> Yes. The copy from Oracle to your local drive is painfully slow for a
>>> modern network connection.
>>> 
>>> > 3)What do you think about the new memory parameters that I cofigured ?
>>> 
>>> They should be OK. I'm more worried about the performance of 

Re: [PERFORM] performance problem on big tables

2017-08-16 Thread Mariel Cherkassky
My server is virtual and it has a virtual HD from a VNX storage machine. The
logs and the data are on the same disk.

2017-08-16 17:04 GMT+03:00 Daniel Blanch Bataller <
daniel.blanch.batal...@gmail.com>:

> Considering it has to write logs and data at checkpoints I don’t see it
> particularly slow compared to the extract phase. What kind of disks you
> have SSD or regular disks? Different disks for ltransaction logs and data?
>
>
> El 16 ago 2017, a las 15:54, Mariel Cherkassky <
> mariel.cherkas...@gmail.com> escribió:
>
> I run the copy command via psql to create a local dump of a 3G table and
> it took me 134059.732ms =~2 minutes. After that I imported the data via
> copy and it took 458648.677ms =~7 minutes. So the copy command works but
> pretty slow.
>
> 2017-08-16 16:08 GMT+03:00 Daniel Blanch Bataller <
> daniel.blanch.batal...@gmail.com>:
>
>> See if the copy command is actually working, copy should be very fast
>> from your local disk.
>>
>>
>> El 16 ago 2017, a las 14:26, Mariel Cherkassky <
>> mariel.cherkas...@gmail.com> escribió:
>>
>>
>> After all the changes of the memory parameters the same operation(without
>> the copy utility) didnt run much faster - it  took one minute less. I made
>> a test with the copy command (without the 'with binary') and it took 1.5
>> hours to create the dumpfile in my local postgresql server. Then I tried to
>> run the copy from the local dump and it is already running two hours and it
>> didnt even finish. I looked at the server log and I saw that I run the copy
>> command at 13:18:05, 3 minutes later checkpoint started and completed and
>> there are no messages in the log after that. What can I do ? Improving the
>> memory parameters and the memory on the server didnt help and for now the
>> copy command doesnt help either.
>>
>>
>>
>>
>> 2017-08-15 20:14 GMT+03:00 Scott Marlowe :
>>
>>> On Tue, Aug 15, 2017 at 4:06 AM, Mariel Cherkassky
>>>  wrote:
>>> > Hi,
>>> > So I I run the cheks that jeff mentioned :
>>> > \copy (select * from oracle_remote_table) to /tmp/tmp with binary - 1
>>> hour
>>> > and 35 minutes
>>>
>>> So 26G takes 95 minutes, or 27 MB/minute or 456k/second? Sound about
>>> right (it's early, I haven't had enough coffee please check my math).
>>> That's pretty slow unless you're working across pretty big distances
>>> with mediocre connections.  My home internet downloads about 100MB/s
>>> by comparison.
>>>
>>> > \copy local_postresql_table from /tmp/tmp with binary - Didnt run
>>> because
>>> > the remote oracle database is currently under maintenance work.
>>>
>>> You shouldn't need the remote oracle server if you've already copied
>>> it over, you're just copying from local disk into the local pgsql db.
>>> Unless I'm missing something.
>>>
>>> > So I decided to follow MichaelDBA tips and I set the ram on my machine
>>> to
>>> > 16G and I configured the effective_cache memory to 14G,tshared_buffer
>>> to be
>>> > 2G and maintenance_work_mem to 4G.
>>>
>>> Good settings. Maybe set work_mem to 128MB or so while you're at it.
>>>
>>> > I started running the copy checks again and for now it coppied 5G in 10
>>> > minutes. I have some questions :
>>> > 1)When I run insert into local_postresql_table select * from
>>> > remote_oracle_table I insert that data as bulk to the local table or
>>> row by
>>> > row ?  If the answer as bulk than why copy is a better option for this
>>> case
>>> > ?
>>>
>>> insert into select from oracle remote is one big copy, but it will
>>> take at least as long as copying from oracle to the local network
>>> took. Compare that to the same thing but use file_fdw on the file
>>> locally.
>>>
>>> > 2)The copy from dump into the postgresql database should take less
>>> time than
>>> > the copy to dump ?
>>>
>>> Yes. The copy from Oracle to your local drive is painfully slow for a
>>> modern network connection.
>>>
>>> > 3)What do you think about the new memory parameters that I cofigured ?
>>>
>>> They should be OK. I'm more worried about the performance of the io
>>> subsystem tbh.
>>>
>>
>>
>>
>
>


Re: [PERFORM] performance problem on big tables

2017-08-16 Thread Daniel Blanch Bataller
Considering it has to write logs and data at checkpoints, I don't see it as 
particularly slow compared to the extract phase. What kind of disks do you have, 
SSD or regular disks? Different disks for transaction logs and data?


> El 16 ago 2017, a las 15:54, Mariel Cherkassky  
> escribió:
> 
> I run the copy command via psql to create a local dump of a 3G table and it 
> took me 134059.732ms =~2 minutes. After that I imported the data via copy and 
> it took 458648.677ms =~7 minutes. So the copy command works but pretty slow. 
> 
> 2017-08-16 16:08 GMT+03:00 Daniel Blanch Bataller 
> >:
> See if the copy command is actually working, copy should be very fast from 
> your local disk.
> 
> 
>> El 16 ago 2017, a las 14:26, Mariel Cherkassky > > escribió:
>> 
>> 
>> After all the changes of the memory parameters the same operation(without 
>> the copy utility) didnt run much faster - it  took one minute less. I made a 
>> test with the copy command (without the 'with binary') and it took 1.5 hours 
>> to create the dumpfile in my local postgresql server. Then I tried to run 
>> the copy from the local dump and it is already running two hours and it 
>> didnt even finish. I looked at the server log and I saw that I run the copy 
>> command at 13:18:05, 3 minutes later checkpoint started and completed and 
>> there are no messages in the log after that. What can I do ? Improving the 
>> memory parameters and the memory on the server didnt help and for now the 
>> copy command doesnt help either.
>> 
>> 
>> 
>> 
>> 2017-08-15 20:14 GMT+03:00 Scott Marlowe > >:
>> On Tue, Aug 15, 2017 at 4:06 AM, Mariel Cherkassky
>> > wrote:
>> > Hi,
>> > So I I run the cheks that jeff mentioned :
>> > \copy (select * from oracle_remote_table) to /tmp/tmp with binary - 1 hour
>> > and 35 minutes
>> 
>> So 26G takes 95 minutes, or 27 MB/minute or 456k/second? Sound about
>> right (it's early, I haven't had enough coffee please check my math).
>> That's pretty slow unless you're working across pretty big distances
>> with mediocre connections.  My home internet downloads about 100MB/s
>> by comparison.
>> 
>> > \copy local_postresql_table from /tmp/tmp with binary - Didnt run because
>> > the remote oracle database is currently under maintenance work.
>> 
>> You shouldn't need the remote oracle server if you've already copied
>> it over, you're just copying from local disk into the local pgsql db.
>> Unless I'm missing something.
>> 
>> > So I decided to follow MichaelDBA tips and I set the ram on my machine to
>> > 16G and I configured the effective_cache memory to 14G,tshared_buffer to be
>> > 2G and maintenance_work_mem to 4G.
>> 
>> Good settings. Maybe set work_mem to 128MB or so while you're at it.
>> 
>> > I started running the copy checks again and for now it coppied 5G in 10
>> > minutes. I have some questions :
>> > 1)When I run insert into local_postresql_table select * from
>> > remote_oracle_table I insert that data as bulk to the local table or row by
>> > row ?  If the answer as bulk than why copy is a better option for this case
>> > ?
>> 
>> insert into select from oracle remote is one big copy, but it will
>> take at least as long as copying from oracle to the local network
>> took. Compare that to the same thing but use file_fdw on the file
>> locally.
>> 
>> > 2)The copy from dump into the postgresql database should take less time 
>> > than
>> > the copy to dump ?
>> 
>> Yes. The copy from Oracle to your local drive is painfully slow for a
>> modern network connection.
>> 
>> > 3)What do you think about the new memory parameters that I cofigured ?
>> 
>> They should be OK. I'm more worried about the performance of the io
>> subsystem tbh.
>> 
> 
> 



Re: [PERFORM] performance problem on big tables

2017-08-16 Thread Mariel Cherkassky
I ran the copy command via psql to create a local dump of a 3G table and it
took me 134059.732 ms =~ 2 minutes. After that I imported the data via copy
and it took 458648.677 ms =~ 7 minutes. So the copy command works, but it is
pretty slow.

2017-08-16 16:08 GMT+03:00 Daniel Blanch Bataller <
daniel.blanch.batal...@gmail.com>:

> See if the copy command is actually working, copy should be very fast from
> your local disk.
>
>
> El 16 ago 2017, a las 14:26, Mariel Cherkassky <
> mariel.cherkas...@gmail.com> escribió:
>
>
> After all the changes of the memory parameters the same operation(without
> the copy utility) didnt run much faster - it  took one minute less. I made
> a test with the copy command (without the 'with binary') and it took 1.5
> hours to create the dumpfile in my local postgresql server. Then I tried to
> run the copy from the local dump and it is already running two hours and it
> didnt even finish. I looked at the server log and I saw that I run the copy
> command at 13:18:05, 3 minutes later checkpoint started and completed and
> there are no messages in the log after that. What can I do ? Improving the
> memory parameters and the memory on the server didnt help and for now the
> copy command doesnt help either.
>
>
>
>
> 2017-08-15 20:14 GMT+03:00 Scott Marlowe :
>
>> On Tue, Aug 15, 2017 at 4:06 AM, Mariel Cherkassky
>>  wrote:
>> > Hi,
>> > So I I run the cheks that jeff mentioned :
>> > \copy (select * from oracle_remote_table) to /tmp/tmp with binary - 1
>> hour
>> > and 35 minutes
>>
>> So 26G takes 95 minutes, or 27 MB/minute or 456k/second? Sound about
>> right (it's early, I haven't had enough coffee please check my math).
>> That's pretty slow unless you're working across pretty big distances
>> with mediocre connections.  My home internet downloads about 100MB/s
>> by comparison.
>>
>> > \copy local_postresql_table from /tmp/tmp with binary - Didnt run
>> because
>> > the remote oracle database is currently under maintenance work.
>>
>> You shouldn't need the remote oracle server if you've already copied
>> it over, you're just copying from local disk into the local pgsql db.
>> Unless I'm missing something.
>>
>> > So I decided to follow MichaelDBA tips and I set the ram on my machine
>> to
>> > 16G and I configured the effective_cache memory to 14G,tshared_buffer
>> to be
>> > 2G and maintenance_work_mem to 4G.
>>
>> Good settings. Maybe set work_mem to 128MB or so while you're at it.
>>
>> > I started running the copy checks again and for now it coppied 5G in 10
>> > minutes. I have some questions :
>> > 1)When I run insert into local_postresql_table select * from
>> > remote_oracle_table I insert that data as bulk to the local table or
>> row by
>> > row ?  If the answer as bulk than why copy is a better option for this
>> case
>> > ?
>>
>> insert into select from oracle remote is one big copy, but it will
>> take at least as long as copying from oracle to the local network
>> took. Compare that to the same thing but use file_fdw on the file
>> locally.
>>
>> > 2)The copy from dump into the postgresql database should take less time
>> than
>> > the copy to dump ?
>>
>> Yes. The copy from Oracle to your local drive is painfully slow for a
>> modern network connection.
>>
>> > 3)What do you think about the new memory parameters that I cofigured ?
>>
>> They should be OK. I'm more worried about the performance of the io
>> subsystem tbh.
>>
>
>
>


Re: [PERFORM] performance problem on big tables

2017-08-16 Thread Daniel Blanch Bataller
See if the copy command is actually working, copy should be very fast from your 
local disk.


> On 16 Aug 2017, at 14:26, Mariel Cherkassky  
> wrote:
> 
> 
> After all the changes of the memory parameters the same operation(without the 
> copy utility) didnt run much faster - it  took one minute less. I made a test 
> with the copy command (without the 'with binary') and it took 1.5 hours to 
> create the dumpfile in my local postgresql server. Then I tried to run the 
> copy from the local dump and it is already running two hours and it didnt 
> even finish. I looked at the server log and I saw that I run the copy command 
> at 13:18:05, 3 minutes later checkpoint started and completed and there are 
> no messages in the log after that. What can I do ? Improving the memory 
> parameters and the memory on the server didnt help and for now the copy 
> command doesnt help either.
> 
> 
> 
> 
> 2017-08-15 20:14 GMT+03:00 Scott Marlowe  >:
> On Tue, Aug 15, 2017 at 4:06 AM, Mariel Cherkassky
> > wrote:
> > Hi,
> > So I I run the cheks that jeff mentioned :
> > \copy (select * from oracle_remote_table) to /tmp/tmp with binary - 1 hour
> > and 35 minutes
> 
> So 26G takes 95 minutes, or 27 MB/minute or 456k/second? Sound about
> right (it's early, I haven't had enough coffee please check my math).
> That's pretty slow unless you're working across pretty big distances
> with mediocre connections.  My home internet downloads about 100MB/s
> by comparison.
> 
> > \copy local_postresql_table from /tmp/tmp with binary - Didnt run because
> > the remote oracle database is currently under maintenance work.
> 
> You shouldn't need the remote oracle server if you've already copied
> it over, you're just copying from local disk into the local pgsql db.
> Unless I'm missing something.
> 
> > So I decided to follow MichaelDBA tips and I set the ram on my machine to
> > 16G and I configured the effective_cache memory to 14G,tshared_buffer to be
> > 2G and maintenance_work_mem to 4G.
> 
> Good settings. Maybe set work_mem to 128MB or so while you're at it.
> 
> > I started running the copy checks again and for now it coppied 5G in 10
> > minutes. I have some questions :
> > 1)When I run insert into local_postresql_table select * from
> > remote_oracle_table I insert that data as bulk to the local table or row by
> > row ?  If the answer as bulk than why copy is a better option for this case
> > ?
> 
> insert into select from oracle remote is one big copy, but it will
> take at least as long as copying from oracle to the local network
> took. Compare that to the same thing but use file_fdw on the file
> locally.
> 
> > 2)The copy from dump into the postgresql database should take less time than
> > the copy to dump ?
> 
> Yes. The copy from Oracle to your local drive is painfully slow for a
> modern network connection.
> 
> > 3)What do you think about the new memory parameters that I cofigured ?
> 
> They should be OK. I'm more worried about the performance of the io
> subsystem tbh.
> 



Re: [PERFORM] performance problem on big tables

2017-08-16 Thread Mariel Cherkassky
After all the changes to the memory parameters, the same operation (without
the copy utility) didn't run much faster - it took one minute less. I made
a test with the copy command (without the 'with binary') and it took 1.5
hours to create the dump file on my local postgresql server. Then I tried to
run the copy from the local dump and it has already been running for two hours
and still hasn't finished. I looked at the server log and I saw that I ran the
copy command at 13:18:05; 3 minutes later a checkpoint started and completed,
and there are no messages in the log after that. What can I do? Improving the
memory parameters and the memory on the server didn't help, and for now the
copy command doesn't help either.




2017-08-15 20:14 GMT+03:00 Scott Marlowe :

> On Tue, Aug 15, 2017 at 4:06 AM, Mariel Cherkassky
>  wrote:
> > Hi,
> > So I I run the cheks that jeff mentioned :
> > \copy (select * from oracle_remote_table) to /tmp/tmp with binary - 1
> hour
> > and 35 minutes
>
> So 26G takes 95 minutes, or 27 MB/minute or 456k/second? Sound about
> right (it's early, I haven't had enough coffee please check my math).
> That's pretty slow unless you're working across pretty big distances
> with mediocre connections.  My home internet downloads about 100MB/s
> by comparison.
>
> > \copy local_postresql_table from /tmp/tmp with binary - Didnt run because
> > the remote oracle database is currently under maintenance work.
>
> You shouldn't need the remote oracle server if you've already copied
> it over, you're just copying from local disk into the local pgsql db.
> Unless I'm missing something.
>
> > So I decided to follow MichaelDBA tips and I set the ram on my machine to
> > 16G and I configured the effective_cache memory to 14G,tshared_buffer to
> be
> > 2G and maintenance_work_mem to 4G.
>
> Good settings. Maybe set work_mem to 128MB or so while you're at it.
>
> > I started running the copy checks again and for now it coppied 5G in 10
> > minutes. I have some questions :
> > 1)When I run insert into local_postresql_table select * from
> > remote_oracle_table I insert that data as bulk to the local table or row
> by
> > row ?  If the answer as bulk than why copy is a better option for this
> case
> > ?
>
> insert into select from oracle remote is one big copy, but it will
> take at least as long as copying from oracle to the local network
> took. Compare that to the same thing but use file_fdw on the file
> locally.
>
> > 2)The copy from dump into the postgresql database should take less time
> than
> > the copy to dump ?
>
> Yes. The copy from Oracle to your local drive is painfully slow for a
> modern network connection.
>
> > 3)What do you think about the new memory parameters that I cofigured ?
>
> They should be OK. I'm more worried about the performance of the io
> subsystem tbh.
>


Re: [PERFORM] performance problem on big tables

2017-08-15 Thread Pavel Stehule
2017-08-15 18:13 GMT+02:00 Jeff Janes :

> On Tue, Aug 15, 2017 at 3:06 AM, Mariel Cherkassky <
> mariel.cherkas...@gmail.com> wrote:
>
>> Hi,
>> So I I run the cheks that jeff mentioned :
>> \copy (select * from oracle_remote_table) to /tmp/tmp with binary - 1
>> hour and 35 minutes
>> \copy local_postresql_table from /tmp/tmp with binary - Didnt run because
>> the remote oracle database is currently under maintenance work.
>>
>
> The "\copy...from" doesn't depend on oracle, it would be only depend on
> local file system (/tmp/tmp), provided that the "\copy...to" finished.
> Anyway, given the length of time it took, I think you can conclude the
> bottleneck is in oracle_fdw itself, or in Oracle, or the network.
>

dumping from Oracle is not fast - I've seen this with both oracle_fdw and ora2pg.

Regards

Pavel



>
> Cheers,
>
> Jeff
>


Re: [PERFORM] performance problem on big tables

2017-08-15 Thread Scott Marlowe
On Tue, Aug 15, 2017 at 4:06 AM, Mariel Cherkassky
 wrote:
> Hi,
> So I I run the cheks that jeff mentioned :
> \copy (select * from oracle_remote_table) to /tmp/tmp with binary - 1 hour
> and 35 minutes

So 26G takes 95 minutes, or 27 MB/minute or 456k/second? Sounds about
right (it's early, I haven't had enough coffee, please check my math).
That's pretty slow unless you're working across pretty big distances
with mediocre connections.  My home internet downloads about 100MB/s
by comparison.
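
Redoing that arithmetic with the same figures taken at face value:

26 GB / 95 min ≈ 280 MB/min ≈ 4.7 MB/s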

> \copy local_postresql_table from /tmp/tmp with binary - Didnt run because
> the remote oracle database is currently under maintenance work.

You shouldn't need the remote oracle server if you've already copied
it over, you're just copying from local disk into the local pgsql db.
Unless I'm missing something.

> So I decided to follow MichaelDBA tips and I set the ram on my machine to
> 16G and I configured the effective_cache memory to 14G,tshared_buffer to be
> 2G and maintenance_work_mem to 4G.

Good settings. Maybe set work_mem to 128MB or so while you're at it.

> I started running the copy checks again and for now it coppied 5G in 10
> minutes. I have some questions :
> 1)When I run insert into local_postresql_table select * from
> remote_oracle_table I insert that data as bulk to the local table or row by
> row ?  If the answer as bulk than why copy is a better option for this case
> ?

insert into select from oracle remote is one big copy, but it will
take at least as long as copying from oracle to the local network
took. Compare that to the same thing but use file_fdw on the file
locally.
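
A sketch of that file_fdw variant (server and foreign-table names, the column
list, and the file path are all illustrative; the dump would need to be written
in csv/text rather than binary format for this particular sketch):

CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER local_files FOREIGN DATA WRAPPER file_fdw;

-- foreign table over a local dump of the Oracle data (columns abbreviated)
CREATE FOREIGN TABLE oracle_dump (
    id   bigint,
    data text
) SERVER local_files
  OPTIONS (filename '/tmp/oracle_dump.csv', format 'csv');

-- the load is then a purely local insert
INSERT INTO local_postgresql_table SELECT * FROM oracle_dump;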

> 2)The copy from dump into the postgresql database should take less time than
> the copy to dump ?

Yes. The copy from Oracle to your local drive is painfully slow for a
modern network connection.

> 3)What do you think about the new memory parameters that I cofigured ?

They should be OK. I'm more worried about the performance of the io
subsystem tbh.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance problem on big tables

2017-08-15 Thread Jeff Janes
On Tue, Aug 15, 2017 at 3:06 AM, Mariel Cherkassky <
mariel.cherkas...@gmail.com> wrote:

> Hi,
> So I I run the cheks that jeff mentioned :
> \copy (select * from oracle_remote_table) to /tmp/tmp with binary - 1 hour
> and 35 minutes
> \copy local_postresql_table from /tmp/tmp with binary - Didnt run because
> the remote oracle database is currently under maintenance work.
>

The "\copy...from" doesn't depend on oracle, it would be only depend on
local file system (/tmp/tmp), provided that the "\copy...to" finished.
Anyway, given the length of time it took, I think you can conclude the
bottleneck is in oracle_fdw itself, or in Oracle, or the network.

Cheers,

Jeff


Re: [PERFORM] performance problem on big tables

2017-08-15 Thread Mariel Cherkassky
Hi,
So I ran the checks that Jeff mentioned:
\copy (select * from oracle_remote_table) to /tmp/tmp with binary - 1 hour
and 35 minutes
\copy local_postresql_table from /tmp/tmp with binary - didn't run because
the remote oracle database is currently under maintenance work.

So I decided to follow MichaelDBA's tips: I set the RAM on my machine to
16G and I configured effective_cache_size to 14G, shared_buffers to
2G and maintenance_work_mem to 4G.
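
(A quick way to confirm what the running server actually picked up:)

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('shared_buffers', 'effective_cache_size',
               'maintenance_work_mem', 'work_mem');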

I started running the copy checks again and so far it has copied 5G in 10
minutes. I have some questions:
1) When I run insert into local_postresql_table select * from
remote_oracle_table, is the data inserted into the local table as one bulk
operation or row by row? If it is bulk, then why is copy a better option for
this case?
2) Should the copy from the dump into the postgresql database take less time
than the copy to the dump?
3) What do you think about the new memory parameters that I configured?






2017-08-14 16:24 GMT+03:00 Mariel Cherkassky :

> I have performance issues with two big tables. Those tables are located on
> an oracle remote database. I'm running the quert : insert into
> local_postgresql_table select * from oracle_remote_table.
>
> The first table has 45M records and its size is 23G. The import of the
> data from the oracle remote database is taking 1 hour and 38 minutes. After
> that I create 13 regular indexes on the table and it takes 10 minutes per
> table ->2 hours and 10 minutes in total.
>
> The second table has 29M records and its size is 26G. The import of the
> data from the oracle remote database is taking 2 hours and 30 minutes. The
> creation of the indexes takes 1 hours and 30 minutes (some are indexes on
> one column and the creation takes 5 min and some are indexes on multiples
> column and it takes 11 min.
>
> Those operation are very problematic for me and I'm searching for a
> solution to improve the performance. The parameters I assigned :
>
> min_parallel_relation_size = 200MB
> max_parallel_workers_per_gather = 5
> max_worker_processes = 8
> effective_cache_size = 2500MB
> work_mem = 16MB
> maintenance_work_mem = 1500MB
> shared_buffers = 2000MB
> RAM : 5G
> CPU CORES : 8
>
> *-I tried running select count(*) from table in oracle and in postgresql
> the running time is almost equal.*
>
> *-Before importing the data I drop the indexes and the constraints.*
>
> *-I tried to copy a 23G file from the oracle server to the postgresql
> server and it took me 12 minutes.*
>
> Please advice how can I continue ? How can I improve something in this
> operation ?
>


Re: [PERFORM] performance problem on big tables

2017-08-14 Thread Jeff Janes
On Mon, Aug 14, 2017 at 6:24 AM, Mariel Cherkassky <
mariel.cherkas...@gmail.com> wrote:

> I have performance issues with two big tables. Those tables are located on
> an oracle remote database. I'm running the quert : insert into
> local_postgresql_table select * from oracle_remote_table.
>
> The first table has 45M records and its size is 23G. The import of the
> data from the oracle remote database is taking 1 hour and 38 minutes.
>
To investigate this, I'd decouple the two steps and see how long each one
takes:

\copy (select * from oracle_remote_table) to /tmp/tmp with binary
\copy local_postresql_table from /tmp/tmp with binary

Cheers,

Jeff


Re: [PERFORM] performance problem on big tables

2017-08-14 Thread Rick Otten
Moving that many gigs of data across your network could also take a long
time simply depending on your network configuration.  Before spending a
huge amount of energy tuning postgresql, I'd probably look at how long it
takes to simply copy 20 or 30 G of data between the two machines.
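
For example (host name and file path are placeholders):

time scp oracle-server:/path/to/large_file /tmp/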



> On 14 Aug 2017, at 15:24, Mariel Cherkassky <
> mariel.cherkas...@gmail.com> wrote:
>
> I have performance issues with two big tables. Those tables are located on
> an oracle remote database. I'm running the quert : insert into
> local_postgresql_table select * from oracle_remote_table.
>
> The first table has 45M records and its size is 23G. The import of the
> data from the oracle remote database is taking 1 hour and 38 minutes. After
> that I create 13 regular indexes on the table and it takes 10 minutes per
> table ->2 hours and 10 minutes in total.
>
> The second table has 29M records and its size is 26G. The import of the
> data from the oracle remote database is taking 2 hours and 30 minutes. The
> creation of the indexes takes 1 hours and 30 minutes (some are indexes on
> one column and the creation takes 5 min and some are indexes on multiples
> column and it takes 11 min.
>
>
>


Re: [PERFORM] performance problem on big tables

2017-08-14 Thread Daniel Blanch Bataller
Hi.

In general using COPY is *much faster* than anything else. You can even split 
the data load and run it in parallel, starting with as many jobs as processors you 
have. The same goes for indexes: build them in parallel. By parallel I mean several 
psql sessions running at the same time (see the sketch after the links below).

Tuning postgres will help too, but not as much as using COPY.

https://www.postgresql.org/docs/9.6/static/performance-tips.html 


https://www.postgresql.org/docs/9.6/static/populate.html 


https://www.postgresql.org/docs/9.6/static/populate.html#POPULATE-COPY-FROM
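
A rough shell sketch of that parallel-load idea (the database name, the id
column, and the ranges are placeholders; each psql session loads a disjoint
slice, and index builds can be parallelized the same way, one CREATE INDEX per
session):

psql -d mydb -c "insert into local_postgresql_table select * from oracle_remote_table where id between 1 and 15000000" &
psql -d mydb -c "insert into local_postgresql_table select * from oracle_remote_table where id between 15000001 and 30000000" &
psql -d mydb -c "insert into local_postgresql_table select * from oracle_remote_table where id between 30000001 and 45000000" &
wait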

Regards,

Daniel Blanch..



> On 14 Aug 2017, at 15:24, Mariel Cherkassky  
> wrote:
> 
> I have performance issues with two big tables. Those tables are located on an 
> oracle remote database. I'm running the quert : insert into 
> local_postgresql_table select * from oracle_remote_table.
> 
> The first table has 45M records and its size is 23G. The import of the data 
> from the oracle remote database is taking 1 hour and 38 minutes. After that I 
> create 13 regular indexes on the table and it takes 10 minutes per table ->2 
> hours and 10 minutes in total.
> 
> The second table has 29M records and its size is 26G. The import of the data 
> from the oracle remote database is taking 2 hours and 30 minutes. The 
> creation of the indexes takes 1 hours and 30 minutes (some are indexes on one 
> column and the creation takes 5 min and some are indexes on multiples column 
> and it takes 11 min.
> 
> Those operation are very problematic for me and I'm searching for a solution 
> to improve the performance. The parameters I assigned :
> 
> min_parallel_relation_size = 200MB
> max_parallel_workers_per_gather = 5 
> max_worker_processes = 8 
> effective_cache_size = 2500MB
> work_mem = 16MB
> maintenance_work_mem = 1500MB
> shared_buffers = 2000MB
> RAM : 5G
> CPU CORES : 8
> -I tried running select count(*) from table in oracle and in postgresql the 
> running time is almost equal.
> 
> -Before importing the data I drop the indexes and the constraints.
> 
> -I tried to copy a 23G file from the oracle server to the postgresql server 
> and it took me 12 minutes.
> 
> Please advice how can I continue ? How can I improve something in this 
> operation ?
> 



Re: [PERFORM] performance problem on big tables

2017-08-14 Thread MichaelDBA
Total RAM on your host is 5GB, really?  Before touching anything else, 
increase your RAM.  That will be your big performance boost right 
there.  Then, you can "up" your effective_cache_size and 
maintenance_work_mem.


Regards,
Michael Vitale


Mariel Cherkassky 
Monday, August 14, 2017 9:24 AM

I have performance issues with two big tables. Those tables are 
located on an oracle remote database. I'm running the quert : |insert 
into local_postgresql_table select * from oracle_remote_table.|


The first table has 45M records and its size is 23G. The import of the 
data from the oracle remote database is taking 1 hour and 38 minutes. 
After that I create 13 regular indexes on the table and it takes 10 
minutes per table ->2 hours and 10 minutes in total.


The second table has 29M records and its size is 26G. The import of 
the data from the oracle remote database is taking 2 hours and 30 
minutes. The creation of the indexes takes 1 hours and 30 minutes 
(some are indexes on one column and the creation takes 5 min and some 
are indexes on multiples column and it takes 11 min.


Those operation are very problematic for me and I'm searching for a 
solution to improve the performance. The parameters I assigned :


min_parallel_relation_size = 200MB
max_parallel_workers_per_gather = 5
max_worker_processes = 8
effective_cache_size = 2500MB
work_mem = 16MB
maintenance_work_mem = 1500MB
shared_buffers = 2000MB
RAM : 5G
CPU CORES : 8

*-I tried running select count(*) from table in oracle and in 
postgresql the running time is almost equal.*


*-Before importing the data I drop the indexes and the constraints.*

*-I tried to copy a 23G file from the oracle server to the postgresql 
server and it took me 12 minutes.*


Please advice how can I continue ? How can I improve something in this 
operation ?






[PERFORM] performance problem on big tables

2017-08-14 Thread Mariel Cherkassky
I have performance issues with two big tables. Those tables are located on
an oracle remote database. I'm running the query: insert into
local_postgresql_table select * from oracle_remote_table.

The first table has 45M records and its size is 23G. The import of the data
from the oracle remote database takes 1 hour and 38 minutes. After that
I create 13 regular indexes on the table and it takes 10 minutes per index
-> 2 hours and 10 minutes in total.

The second table has 29M records and its size is 26G. The import of the
data from the oracle remote database takes 2 hours and 30 minutes. The
creation of the indexes takes 1 hour and 30 minutes (some are indexes on
one column and their creation takes 5 min, and some are indexes on multiple
columns and it takes 11 min).

Those operations are very problematic for me and I'm searching for a
solution to improve the performance. The parameters I assigned:

min_parallel_relation_size = 200MB
max_parallel_workers_per_gather = 5
max_worker_processes = 8
effective_cache_size = 2500MB
work_mem = 16MB
maintenance_work_mem = 1500MB
shared_buffers = 2000MB
RAM : 5G
CPU CORES : 8

*-I tried running select count(*) on the table in oracle and in postgresql;
the running times are almost equal.*

*-Before importing the data I drop the indexes and the constraints.*

*-I tried to copy a 23G file from the oracle server to the postgresql
server and it took me 12 minutes.*

Please advise how I can continue. How can I improve something in this
operation?


[PERFORM] Performance degradation from PostgreSQL 8.2.21 to PostgreSQL 9.3.2

2017-07-18 Thread fx TATEISHI KOJI
Issuing exactly the same query on PostgreSQL 9.3.2 as on PostgreSQL 8.2.21 
slows the response by 6.4 ms on average.
What could be the cause?
The measurement method was as follows.
・ PostgreSQL 8.2.21 installation
 ★Measurement
・ Export DUMP of PostgreSQL 8.2.21
・ PostgreSQL 8.2.21 uninstallation
・ PostgreSQL 9.3.2 installation
・ Dump import
 ★Measurement

[query]
select
 table4.a as col_0_0_,
 table4.a as col_1_0_,
 table4.a as col_2_0_,
 table4.b as col_0_1_,
 table4.c,
 table4.d
from
 table1,
 table2,
 table3,
 table4 
where
 table1.a=table2.a and
 table1.a="parameter$1" and
 table2.roleid=table3.roleid and
 table3.a="parameter$2" and
 table4.b='3' and
 table2.a=table4.a;
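
A generic first step for this kind of regression (not a confirmed cause): a
dump/restore does not carry planner statistics, so refresh them on the 9.3
server and then compare the plans on both versions:

ANALYZE;

-- run the query above under EXPLAIN ANALYZE on both servers and compare
EXPLAIN ANALYZE
select table4.a as col_0_0_, table4.b as col_0_1_, table4.c, table4.d
from table1, table2, table3, table4
where table1.a = table2.a and
      table1.a = "parameter$1" and
      table2.roleid = table3.roleid and
      table3.a = "parameter$2" and
      table4.b = '3' and
      table2.a = table4.a;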


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance of information_schema with many schemata and tables

2017-06-28 Thread Ulf Lohbrügge
2017-06-28 10:43 GMT+02:00 Pritam Baral :

>
>
> On Wednesday 28 June 2017 02:00 PM, Ulf Lohbrügge wrote:
> > Nope, I didn't try that yet. But I don't have the impression that
> reindexing the indexes in information_schema will help. The table
> information_schema.tables consists of the following indexes:
> >
> > "pg_class_oid_index" UNIQUE, btree (oid)
> > "pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
> > "pg_class_tblspc_relfilenode_index" btree (reltablespace,
> relfilenode)
>
> information_schema.tables is not a table, it's a view; at least on 9.5[0].
> These indexes you list are actually indexes on the pg_catalog.pg_class
> table.
>

Yes, it's a view. \d+ information_schema.tables gives:

View definition:
 SELECT current_database()::information_schema.sql_identifier AS
table_catalog,
nc.nspname::information_schema.sql_identifier AS table_schema,
c.relname::information_schema.sql_identifier AS table_name,
CASE
WHEN nc.oid = pg_my_temp_schema() THEN 'LOCAL TEMPORARY'::text
WHEN c.relkind = 'r'::"char" THEN 'BASE TABLE'::text
WHEN c.relkind = 'v'::"char" THEN 'VIEW'::text
WHEN c.relkind = 'f'::"char" THEN 'FOREIGN TABLE'::text
ELSE NULL::text
END::information_schema.character_data AS table_type,
NULL::character varying::information_schema.sql_identifier AS
self_referencing_column_name,
NULL::character varying::information_schema.character_data AS
reference_generation,
CASE
WHEN t.typname IS NOT NULL THEN current_database()
ELSE NULL::name
END::information_schema.sql_identifier AS user_defined_type_catalog,
nt.nspname::information_schema.sql_identifier AS
user_defined_type_schema,
t.typname::information_schema.sql_identifier AS user_defined_type_name,
CASE
WHEN c.relkind = 'r'::"char" OR (c.relkind = ANY
(ARRAY['v'::"char", 'f'::"char"])) AND
(pg_relation_is_updatable(c.oid::regclass, false) & 8) = 8 THEN 'YES'::text
ELSE 'NO'::text
END::information_schema.yes_or_no AS is_insertable_into,
CASE
WHEN t.typname IS NOT NULL THEN 'YES'::text
ELSE 'NO'::text
END::information_schema.yes_or_no AS is_typed,
NULL::character varying::information_schema.character_data AS
commit_action
   FROM pg_namespace nc
 JOIN pg_class c ON nc.oid = c.relnamespace
 LEFT JOIN (pg_type t
 JOIN pg_namespace nt ON t.typnamespace = nt.oid) ON c.reloftype = t.oid
  WHERE (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char"]))
AND NOT pg_is_other_temp_schema(nc.oid) AND (pg_has_role(c.relowner,
'USAGE'::text) OR has_table_privilege(c.oid, 'SELECT, INSERT, UPDATE,
DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR
has_any_column_privilege(c.oid, 'SELECT, INSERT, UPDATE,
REFERENCES'::text));


>
> >
> > The costly sequence scan in question on pg_class happens with the
> following WHERE clause:
> >
> > WHERE (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char"]))
> AND NOT pg_is_other_temp_schema(nc.oid) AND (pg_has_role(c.relowner,
> 'USAGE'::text) OR has_table_privilege(c.oid, 'SELECT, INSERT, UPDATE,
> DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR 
> has_any_column_privilege(c.oid,
> 'SELECT, INSERT, UPDATE, REFERENCES'::text));
>
> This is not the bottleneck WHERE clause the query plan from your first
> mail shows. That one is:
>
> ((relkind = ANY ('{r,v,f}'::"char"[])) AND (((relname)::information_
> schema.sql_identifier)::text = 'bar'::text) AND (pg_has_role(relowner,
> 'USAGE'::text) OR has_table_privilege(oid, 'SELECT, INSERT, UPDATE, DELETE,
> TRUNCATE, REFERENCES, TRIGGER'::text) OR has_any_column_privilege(oid,
> 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
>

The part you copied is from the EXPLAIN ANALYZE output. The WHERE clause I
posted earlier (see the view definition above) unfortunately does not
contain the relname.


>
> I can say with certainty that an index on pg_catalog.pg_class.relname is
> going to speed this up. Postgres doesn't allow modifying system catalogs,
> but the `REINDEX SYSTEM ;` command should rebuild the system
> indexes and pg_catalog.pg_class.relname should be included in them (I
> tested on 9.6).
>
> Do try that once. If you still see sequential scans, check what indexes
> are present on pg_catalog.pg_class.
>

I just fired a 'REINDEX SYSTEM ;' but the output of EXPLAIN ANALYZE
is unchanged and the query duration did not change.

Best Regards,
Ulf


>
>
> >
> > Besides pg_class_oid_index none of the referenced columns is indexed. I
> tried to add an index on relowner but didn't succeed because the column is
> used in the function call pg_has_role and the query is still forced to do a
> sequence scan.
> >
> > Regards,
> > Ulf
> >
> > 2017-06-28 3:31 GMT+02:00 Pritam Baral >:
> >
> > On Wednesday 28 June 2017 05:27 AM, Ulf Lohbrügge wrote:
> > 

Re: [PERFORM] Performance of information_schema with many schemata and tables

2017-06-28 Thread Ulf Lohbrügge
Nope, I didn't try that yet. But I don't have the impression that
reindexing the indexes in information_schema will help. The table
information_schema.tables consists of the following indexes:

"pg_class_oid_index" UNIQUE, btree (oid)
"pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
"pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)

The costly sequence scan in question on pg_class happens with the following
WHERE clause:

WHERE (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'f'::"char"])) AND
NOT pg_is_other_temp_schema(nc.oid) AND (pg_has_role(c.relowner,
'USAGE'::text) OR has_table_privilege(c.oid, 'SELECT, INSERT, UPDATE,
DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR
has_any_column_privilege(c.oid,
'SELECT, INSERT, UPDATE, REFERENCES'::text));

Besides pg_class_oid_index, none of the referenced columns is indexed. I
tried to add an index on relowner but didn't succeed: the column is
used inside the pg_has_role() call, so the query is still forced to do a
sequential scan.
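
For comparison, an existence check that goes straight at the catalogs can use
pg_class_relname_nsp_index (a workaround sketch for ad-hoc checks, not a change
to what Flyway sends; 'foo' and 'bar' as in the example earlier in the thread):

SELECT c.relname
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'foo'
  AND c.relname = 'bar'
  AND c.relkind IN ('r', 'v', 'f');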

Regards,
Ulf

2017-06-28 3:31 GMT+02:00 Pritam Baral :

> On Wednesday 28 June 2017 05:27 AM, Ulf Lohbrügge wrote:
> > Hi all,
> >
> > we use schemata to separate our customers in a multi-tenant setup
> (9.5.7, Debian stable). Each tenant is managed in his own schema with all
> the tables that only he can access. All tables in all schemata are the same
> in terms of their DDL: Every tenant uses e.g. his own table 'address'. We
> currently manage around 1200 schemata (i.e. tenants) on one cluster. Every
> schema consists currently of ~200 tables - so we end up with ~24 tables
> plus constraints, indexes, sequences et al.
> >
> > Our current approach is quite nice in terms of data privacy because
> every tenant is isolated from all other tenants. A tenant uses his own user
> that gives him only access to the corresponding schema. Performance is
> great for us - we didn't expect Postgres to scale so well!
> >
> > But performance is pretty bad when we query things in the
> information_schema:
> >
> > SELECT
> >   *
> > FROM information_schema.tables
> > WHERE table_schema = 'foo'
> > AND table_name = 'bar';``
> >
> > Above query results in a large sequence scan with a filter that removes
> 1305161 rows:
> >
> >
>
> > QUERY PLAN
> > ----------------------------------------------------------------------------------------------------------
> >  Nested Loop Left Join  (cost=0.70..101170.18 rows=3 width=265) (actual
> time=383.505..383.505 rows=0 loops=1)
> >->  Nested Loop  (cost=0.00..101144.65 rows=3 width=141) (actual
> time=383.504..383.504 rows=0 loops=1)
> >  Join Filter: (nc.oid = c.relnamespace)
> >  ->  Seq Scan on pg_class c  (cost=0.00..101023.01 rows=867
> width=77) (actual time=383.502..383.502 rows=0 loops=1)
> >Filter: ((relkind = ANY ('{r,v,f}'::"char"[])) AND
> (((relname)::information_schema.sql_identifier)::text = 'bar'::text) AND
> (pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT,
> INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR
> has_any_column_privilege(oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
> >Rows Removed by Filter: 1305161
> >  ->  Materialize  (cost=0.00..56.62 rows=5 width=68) (never
> executed)
> >->  Seq Scan on pg_namespace nc  (cost=0.00..56.60 rows=5
> width=68) (never executed)
> >  Filter: ((NOT pg_is_other_temp_schema(oid)) AND
> (((nspname)::information_schema.sql_identifier)::text = 'foo'::text))
> >->  Nested Loop  (cost=0.70..8.43 rows=1 width=132) (never executed)
> >  ->  Index Scan using pg_type_oid_index on pg_type t
> (cost=0.42..8.12 rows=1 width=72) (never executed)
> >Index Cond: (c.reloftype = oid)
> >  ->  Index Scan using pg_namespace_oid_index on pg_namespace nt
> (cost=0.28..0.30 rows=1 width=68) (never executed)
> >Index Cond: (oid = t.typnamespace)
> >  Planning time: 0.624 ms
> >  Execution time: 383.784 ms
> > (16 rows)
> >
> > We noticed the degraded performance first when using the psql cli.
> Pressing tab after beginning a WHERE clause results in a query against the
> information_schema which is pretty slow and ends in "lag" when trying to
> enter queries.
> >
> > We also use Flyway (https://flywaydb.org/) to handle our database
> migrations. Unfortunately Flyway is querying the information_schema to
> check if specific tables exist (I guess this is one of the reasons
> information_schema exists) and therefore vastly slows down the migration of
> our tenants. Our last migration run on all tenants (schemata) almost 

Re: [PERFORM] Performance of information_schema with many schemata and tables

2017-06-27 Thread Pritam Baral
On Wednesday 28 June 2017 05:27 AM, Ulf Lohbrügge wrote:
> Hi all,
>
> we use schemata to separate our customers in a multi-tenant setup (9.5.7, 
> Debian stable). Each tenant is managed in his own schema with all the tables 
> that only he can access. All tables in all schemata are the same in terms of 
> their DDL: Every tenant uses e.g. his own table 'address'. We currently 
> manage around 1200 schemata (i.e. tenants) on one cluster. Every schema 
> consists currently of ~200 tables - so we end up with ~24 tables plus 
> constraints, indexes, sequences et al.
>
> Our current approach is quite nice in terms of data privacy because every 
> tenant is isolated from all other tenants. A tenant uses his own user that 
> gives him only access to the corresponding schema. Performance is great for 
> us - we didn't expect Postgres to scale so well!
>
> But performance is pretty bad when we query things in the information_schema:
>
> SELECT
>   *
> FROM information_schema.tables
> WHERE table_schema = 'foo'
> AND table_name = 'bar';``
>
> Above query results in a large sequence scan with a filter that removes 
> 1305161 rows:
>
>   
>   
> QUERY PLAN
> ---
>  Nested Loop Left Join  (cost=0.70..101170.18 rows=3 width=265) (actual 
> time=383.505..383.505 rows=0 loops=1)
>->  Nested Loop  (cost=0.00..101144.65 rows=3 width=141) (actual 
> time=383.504..383.504 rows=0 loops=1)
>  Join Filter: (nc.oid = c.relnamespace)
>  ->  Seq Scan on pg_class c  (cost=0.00..101023.01 rows=867 width=77) 
> (actual time=383.502..383.502 rows=0 loops=1)
>Filter: ((relkind = ANY ('{r,v,f}'::"char"[])) AND 
> (((relname)::information_schema.sql_identifier)::text = 'bar'::text) AND 
> (pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT, 
> INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR 
> has_any_column_privilege(oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
>Rows Removed by Filter: 1305161
>  ->  Materialize  (cost=0.00..56.62 rows=5 width=68) (never executed)
>->  Seq Scan on pg_namespace nc  (cost=0.00..56.60 rows=5 
> width=68) (never executed)
>  Filter: ((NOT pg_is_other_temp_schema(oid)) AND 
> (((nspname)::information_schema.sql_identifier)::text = 'foo'::text))
>->  Nested Loop  (cost=0.70..8.43 rows=1 width=132) (never executed)
>  ->  Index Scan using pg_type_oid_index on pg_type t  
> (cost=0.42..8.12 rows=1 width=72) (never executed)
>Index Cond: (c.reloftype = oid)
>  ->  Index Scan using pg_namespace_oid_index on pg_namespace nt  
> (cost=0.28..0.30 rows=1 width=68) (never executed)
>Index Cond: (oid = t.typnamespace)
>  Planning time: 0.624 ms
>  Execution time: 383.784 ms
> (16 rows)
>
> We noticed the degraded performance first when using the psql cli. Pressing 
> tab after beginning a WHERE clause results in a query against the 
> information_schema which is pretty slow and ends in "lag" when trying to 
> enter queries.
>
> We also use Flyway (https://flywaydb.org/) to handle our database migrations. 
> Unfortunately Flyway is querying the information_schema to check if specific 
> tables exist (I guess this is one of the reasons information_schema exists) 
> and therefore vastly slows down the migration of our tenants. Our last 
> migration run on all tenants (schemata) almost took 2h because the above 
> query is executed multiple times per tenant. The migration run consisted of 
> multiple sql files to be executed and triggered more than 10 queries on 
> information_schema per tenant.
>
> I don't think that Flyway is to blame because querying the information_schema 
> should be a fast operation (and was fast for us when we had less schemata). I 
> tried to speedup querying pg_class by adding indexes (after enabling 
> allow_system_table_mods) but didn't succeed. The function call 'pg_has_role' 
> is probably not easy to optimize.
>
> Postgres is really doing a great job to handle those many schemata and tables 
> but doesn't scale well when querying information_schema. I actually don't 
> want to change my current multi-tenant setup (one schema per tenant) as it is 
> working great but the slow information_schema is killing our deployments.
>
> Are there any other options besides switching from 
> one-schema-per-tenant-approach? Any help is greatly appreciated!

Have you tried a `REINDEX SYSTEM `?

>
> 

[PERFORM] Performance of information_schema with many schemata and tables

2017-06-27 Thread Ulf Lohbrügge
Hi all,

we use schemata to separate our customers in a multi-tenant setup (9.5.7,
Debian stable). Each tenant is managed in his own schema with all the
tables that only he can access. All tables in all schemata are the same in
terms of their DDL: Every tenant uses e.g. his own table 'address'. We
currently manage around 1200 schemata (i.e. tenants) on one cluster. Every
schema currently consists of ~200 tables - so we end up with ~240,000 tables
plus constraints, indexes, sequences et al.

Our current approach is quite nice in terms of data privacy because every
tenant is isolated from all other tenants. A tenant uses his own user that
gives him only access to the corresponding schema. Performance is great for
us - we didn't expect Postgres to scale so well!

But performance is pretty bad when we query things in the
information_schema:

SELECT
  *
FROM information_schema.tables
WHERE table_schema = 'foo'
AND table_name = 'bar';

Above query results in a large sequence scan with a filter that removes
1305161 rows:



QUERY PLAN
---
 Nested Loop Left Join  (cost=0.70..101170.18 rows=3 width=265) (actual
time=383.505..383.505 rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..101144.65 rows=3 width=141) (actual
time=383.504..383.504 rows=0 loops=1)
 Join Filter: (nc.oid = c.relnamespace)
 ->  Seq Scan on pg_class c  (cost=0.00..101023.01 rows=867
width=77) (actual time=383.502..383.502 rows=0 loops=1)
   Filter: ((relkind = ANY ('{r,v,f}'::"char"[])) AND
(((relname)::information_schema.sql_identifier)::text = 'bar'::text) AND
(pg_has_role(relowner, 'USAGE'::text) OR has_table_privilege(oid, 'SELECT,
INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER'::text) OR
has_any_column_privilege(oid, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))
   Rows Removed by Filter: 1305161
 ->  Materialize  (cost=0.00..56.62 rows=5 width=68) (never
executed)
   ->  Seq Scan on pg_namespace nc  (cost=0.00..56.60 rows=5
width=68) (never executed)
 Filter: ((NOT pg_is_other_temp_schema(oid)) AND
(((nspname)::information_schema.sql_identifier)::text = 'foo'::text))
   ->  Nested Loop  (cost=0.70..8.43 rows=1 width=132) (never executed)
 ->  Index Scan using pg_type_oid_index on pg_type t
 (cost=0.42..8.12 rows=1 width=72) (never executed)
   Index Cond: (c.reloftype = oid)
 ->  Index Scan using pg_namespace_oid_index on pg_namespace nt
 (cost=0.28..0.30 rows=1 width=68) (never executed)
   Index Cond: (oid = t.typnamespace)
 Planning time: 0.624 ms
 Execution time: 383.784 ms
(16 rows)

We noticed the degraded performance first when using the psql cli. Pressing
tab after beginning a WHERE clause results in a query against the
information_schema which is pretty slow and ends in "lag" when trying to
enter queries.

We also use Flyway (https://flywaydb.org/) to handle our database
migrations. Unfortunately Flyway is querying the information_schema to
check if specific tables exist (I guess this is one of the reasons
information_schema exists) and therefore vastly slows down the migration of
our tenants. Our last migration run on all tenants (schemata) almost took
2h because the above query is executed multiple times per tenant. The
migration run consisted of multiple sql files to be executed and triggered
more than 10 queries on information_schema per tenant.

I don't think that Flyway is to blame because querying the
information_schema should be a fast operation (and was fast for us when we
had less schemata). I tried to speedup querying pg_class by adding indexes
(after enabling allow_system_table_mods) but didn't succeed. The function
call 'pg_has_role' is probably not easy to optimize.

Postgres is really doing a great job to handle those many schemata and
tables but doesn't scale well when querying information_schema. I actually
don't want to change my current multi-tenant setup (one schema per tenant)
as it is working great but the slow information_schema is killing our
deployments.
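
Where the check is under your own control (not inside Flyway), a catalog-level
probe sidesteps the view entirely; to_regclass() is available from 9.4 on:

SELECT to_regclass('foo.bar') IS NOT NULL AS table_exists;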

Are there any other options besides switching from
one-schema-per-tenant-approach? Any help is greatly appreciated!

Regards,
Ulf


Re: [PERFORM] Performance issue in PostgreSQL server...

2017-04-27 Thread Justin Pryzby
On Mon, Mar 06, 2017 at 12:17:22PM +, Dinesh Chandra 12108 wrote:
> Below is the output of Query SELECT * FROM pg_stats WHERE tablename='point' 
> AND attname='domain_class_id' ;
> 
> 
> schemaname        | evidence
> tablename         | point
> attname           | domain_class_id
> inherited         | f
> null_frac         | 0
> avg_width         | 8
> n_distinct        | 10
> most_common_vals  | {7,9,2,11,43,3,1,10,4,17}
> most_common_freqs | {0.9322,0.0451333,0.0145,0.0039,0.0018,0.00146667,0.0005,0.0003,6.7e-05,6.7e-05}
> histogram_bounds  |
> correlation       | 0.889078

On Fri, Mar 03, 2017 at 12:44:07PM +, Dinesh Chandra 12108 wrote:
>->  Index Scan using point_domain_class_id_index on point p  
> (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.1 
> 59 rows=1607491 loops=1)

On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is well
> clustered on domain_class_id.  In which case, why isn't it just faster?

I missed your response until now, and can't see that anybody else responded,
but I suspect the issue is that the *table* is highly correlated WRT this
column, but the index may not be, probably due to duplicated index keys.
postgres only stores statistics on expression indexes, and falls back to
the correlation of the table column for simple indexes.

If you're still fighting this, would you send result of:

SELECT domain_class_id, count(1) FROM point GROUP BY 1 ORDER BY 2 DESC LIMIT 22;
or,
SELECT count(1) FROM point GROUP BY domain_class_id ORDER BY 1 DESC LIMIT 22;

if there's much repetition in the index keys, then PG's planner thinks an index
scan has low random_page_cost, and effective_cache_size has little effect on
large tables, and it never uses bitmap scan, which blows up if the index is
fragmented and has duplicate keys.  The table reads end up costing something
like 1454751*random_page_cost nonsequential reads and fseek() calls when it
thinks it'll cost only 1454751*16*seq_page_cost.

Is the query much faster if you first reindex point_domain_class_id_index ?

This has come up before, see:
> https://www.postgresql.org/message-id/flat/520D6610.8040907%40emulex.com#520d6610.8040...@emulex.com
> https://www.postgresql.org/message-id/flat/20160524173914.GA11880%40telsasoft.com#20160524173914.ga11...@telsasoft.com
> https://www.postgresql.org/message-id/flat/n6cmpug13b9rk1srebjvhphg0lm8dou1kn%404ax.com#n6cmpug13b9rk1srebjvhphg0lm8dou...@4ax.com

Justin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance issue after upgrading from 9.4 to 9.6

2017-03-27 Thread Merlin Moncure
On Mon, Mar 6, 2017 at 7:20 AM, Piotr Gasidło  wrote:
> We are having some performance issues after we upgraded to newest
> version of PostgreSQL, before it everything was fast and smooth.
>
> Upgrade was done by pg_upgrade from 9.4 directly do 9.6.1. Now we
> upgraded to 9.6.2 with no improvement.
>
> Some information about our setup: Freebsd, Solaris (SmartOS), simple
> master-slave using streaming replication.
>
> Problem:
> Very high system CPU when master is streaming replication data, CPU
> goes up to 77%. Only one process is generating this load, it's a
> postgresql startup process. When I attached a truss to this process I
> saw a lot o read calls with almost the same number of errors (EAGAIN).
>
> root@d8:~ # truss -c -p 38091
> ^Csyscall seconds   calls  errors
> semop   0.001611782 198   0
> write   0.74404   2   0
> read2.281535100   17266   12375
> openat  0.000683532  48   0
> lseek   0.177612479   20443   0
> close   0.000395549  48   0
>   - --- ---
> 2.461912846   38005   12375
>
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(444,0x0,SEEK_END) = 32571392 (0x1f1)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(444,0x0,SEEK_END) = 32571392 (0x1f1)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(7,0x0,SEEK_END) = 164487168 (0x9cde000)
> lseek(778,0x0,SEEK_END) = 57344 (0xe000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
> lseek(894,0x4b7e000,SEEK_SET) = 79159296 (0x4b7e000)
> read(894," ~\0\08\a\M--m\0\0\^A\0\M^T\"...,8192) = 8192 (0x2000)
> lseek(3,0xfa6000,SEEK_SET) = 16408576 (0xfa6000)
> read(3,"\M^S\M-P\^E\0\^A\0\0\0\0`\M-z"...,8192) = 8192 (0x2000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
> lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
> lseek(894,0x449c000,SEEK_SET) = 71942144 (0x449c000)
> read(894,"\^_~\0\0\M-H\M-H\M-B\M-b\0\0\^E"...,8192) = 8192 (0x2000)
> lseek(818,0x0,SEEK_END) = 57344 (0xe000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
> lseek(442,0x0,SEEK_END) = 10174464 (0x9b4000)
> lseek(442,0x4c4000,SEEK_SET) = 4997120 (0x4c4000)
> read(442,"\^_~\0\0\M-P\M-+\M-1\M-b\0\0\0\0"...,8192) = 8192 (0x2000)
> read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
>
> Descriptor 6 is a pipe
>
> Read call try to read one byte over and over, I looked up to source
> code and I think this file is responsible for this behavior
> src/backend/storage/ipc/latch.c. There was no such file in 9.4.

Is a git bisect out of the question?

merlin
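
If it came to that, a rough outline of the bisect (the tag names are
assumptions based on the PostgreSQL git repository's release tags; each step
needs a rebuild, reinstall, and a re-run of the standby workload):

git bisect start
git bisect bad REL9_6_1      # release showing the high system CPU
git bisect good REL9_4_0     # release known to behave well
# build and install the checked-out tree, reproduce the test, then mark it:
git bisect good              # or: git bisect bad
# repeat until git reports the first bad commit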


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Performance issue after upgrading from 9.4 to 9.6

2017-03-06 Thread Piotr Gasidło
We are having some performance issues after we upgraded to the newest
version of PostgreSQL; before the upgrade everything was fast and smooth.

The upgrade was done with pg_upgrade from 9.4 directly to 9.6.1. We have since
upgraded to 9.6.2 with no improvement.

Some information about our setup: Freebsd, Solaris (SmartOS), simple
master-slave using streaming replication.

Problem:
Very high system CPU while the master is streaming replication data; CPU
goes up to 77%. Only one process is generating this load: the
postgresql startup process. When I attached truss to this process I
saw a lot of read calls with almost the same number of errors (EAGAIN).

root@d8:~ # truss -c -p 38091
^Csyscall seconds   calls  errors
semop   0.001611782 198   0
write   0.74404   2   0
read2.281535100   17266   12375
openat  0.000683532  48   0
lseek   0.177612479   20443   0
close   0.000395549  48   0
  - --- ---
2.461912846   38005   12375

read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(444,0x0,SEEK_END) = 32571392 (0x1f1)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(444,0x0,SEEK_END) = 32571392 (0x1f1)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(7,0x0,SEEK_END) = 164487168 (0x9cde000)
lseek(778,0x0,SEEK_END) = 57344 (0xe000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
lseek(894,0x4b7e000,SEEK_SET) = 79159296 (0x4b7e000)
read(894," ~\0\08\a\M--m\0\0\^A\0\M^T\"...,8192) = 8192 (0x2000)
lseek(3,0xfa6000,SEEK_SET) = 16408576 (0xfa6000)
read(3,"\M^S\M-P\^E\0\^A\0\0\0\0`\M-z"...,8192) = 8192 (0x2000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
lseek(894,0x0,SEEK_END) = 139296768 (0x84d8000)
lseek(894,0x449c000,SEEK_SET) = 71942144 (0x449c000)
read(894,"\^_~\0\0\M-H\M-H\M-B\M-b\0\0\^E"...,8192) = 8192 (0x2000)
lseek(818,0x0,SEEK_END) = 57344 (0xe000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'
lseek(442,0x0,SEEK_END) = 10174464 (0x9b4000)
lseek(442,0x4c4000,SEEK_SET) = 4997120 (0x4c4000)
read(442,"\^_~\0\0\M-P\M-+\M-1\M-b\0\0\0\0"...,8192) = 8192 (0x2000)
read(6,0x7fffa0c7,1) ERR#35 'Resource temporarily unavailable'

Descriptor 6 is a pipe

The read call tries to read one byte over and over. I looked at the source
code and I think this file is responsible for this behavior:
src/backend/storage/ipc/latch.c. There was no such file in 9.4.


-- 
Piotr Gasidło


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance issue in PostgreSQL server...

2017-03-06 Thread Dinesh Chandra 12108
Dear Justin,

Below is the output of Query SELECT * FROM pg_stats WHERE tablename='point' AND 
attname='domain_class_id' ;


schemaname        | evidence
tablename         | point
attname           | domain_class_id
inherited         | f
null_frac         | 0
avg_width         | 8
n_distinct        | 10
most_common_vals  | {7,9,2,11,43,3,1,10,4,17}
most_common_freqs | {0.9322,0.0451333,0.0145,0.0039,0.0018,0.00146667,0.0005,0.0003,6.7e-05,6.7e-05}
histogram_bounds  |
correlation       | 0.889078


Regards,
Dinesh Chandra
|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.


-Original Message-
From: Justin Pryzby [mailto:pry...@telsasoft.com]
Sent: 06 March, 2017 10:54 AM
To: Dinesh Chandra 12108 <dinesh.chan...@cyient.com>
Cc: Nur Agus <nuragus.li...@gmail.com>; Jeff Janes <jeff.ja...@gmail.com>; 
pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issue in PostgreSQL server...

On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 
> <dinesh.chan...@cyient.com> wrote:
> > The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> > feature_id FROM evidence.point p INNER JOIN
> > evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
> > WHERE p.domain_class_id IN (11) AND (p.modification_time >
> > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10
> > 00:06:56.056 IST') ORDER BY feature_id
> ...
>
> >  ->  Index Scan using point_domain_class_id_index on 
> > point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual 
> > time=27.265..142101.1 59 rows=1607491 loops=1)
> >Index Cond: (domain_class_id = 11)
>
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is
> well clustered on domain_class_id.  In which case, why isn't it just faster?

Could you send:

SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

.. or if that's too verbose or you don't want to share the histogram or MCV
list:

SELECT correlation FROM pg_stats WHERE tablename='point' AND 
attname='domain_class_id' ;

Justin



DISCLAIMER:

This email message is for the sole use of the intended recipient(s) and may 
contain confidential and privileged information. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply email and destroy all copies of 
the original message. Check all attachments for viruses before opening them. 
All views or opinions presented in this e-mail are those of the author and may 
not reflect the opinion of Cyient or those of our affiliates.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance issue in PostgreSQL server...

2017-03-05 Thread Justin Pryzby
On Sun, Mar 05, 2017 at 08:23:08PM -0800, Jeff Janes wrote:
> On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 
>  wrote:
> > The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> > feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence
> > oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
> > (p.modification_time > '2015-05-10 00:06:56.056 IST' OR
> > oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id
> ...
> 
> >  ->  Index Scan using point_domain_class_id_index on 
> > point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual 
> > time=27.265..142101.1 59 rows=1607491 loops=1)
> >Index Cond: (domain_class_id = 11)
> 
> Why wouldn't this be using a bitmap scan rather than a regular index scan?
> It seems like it should prefer the bitmap scan, unless the table is well
> clustered on domain_class_id.  In which case, why isn't it just faster?

Could you send:

SELECT * FROM pg_stats WHERE tablename='point' AND attname='domain_class_id' ;

.. or if that's too verbose or you don't want to share the histogram or MCV
list:

SELECT correlation FROM pg_stats WHERE tablename='point' AND 
attname='domain_class_id' ;

Justin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance issue in PostgreSQL server...

2017-03-05 Thread Jeff Janes
On Fri, Mar 3, 2017 at 4:44 AM, Dinesh Chandra 12108 <
dinesh.chan...@cyient.com> wrote:

> Dear Nur,
>
>
>
> The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT
> feature_id FROM evidence.point p INNER JOIN evidence.observation_evidence
> oe ON p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND
> (p.modification_time > '2015-05-10 00:06:56.056 IST' OR
> oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id
>
>
>
>
...


>  ->  Index Scan using point_domain_class_id_index on
> point p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual
> time=27.265..142101.1
>
> 59 rows=1607491 loops=1)
>
>Index Cond: (domain_class_id = 11)
>

Why wouldn't this be using a bitmap scan rather than a regular index scan?
It seems like it should prefer the bitmap scan, unless the table is well
clustered on domain_class_id.  In which case, why isn't it just faster?

You could try repeating the explain analyze after setting enable_indexscan
= off to see what that gives.  If it gives a seq scan, then repeat with
enable_seqscan also turned off.  Or if it gives the bitmap scan, then
repeat with enable_bitmapscan turned off.
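
A sketch of that experiment, using the query from the thread:

SET enable_indexscan = off;
EXPLAIN ANALYZE
SELECT DISTINCT feature_id
FROM evidence.point p
INNER JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
WHERE p.domain_class_id IN (11)
  AND (p.modification_time > '2015-05-10 00:06:56.056 IST'
       OR oe.modification_time > '2015-05-10 00:06:56.056 IST')
ORDER BY feature_id;

-- depending on which plan appears next, also try:
SET enable_seqscan = off;     -- if it switched to a seq scan
SET enable_bitmapscan = off;  -- if it switched to a bitmap scan
-- re-run the EXPLAIN ANALYZE after each change; RESET ALL undoes the settings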

How many rows are in point, and how big is it?

The best bet for making this better might be to have an index on
(domain_class_id, modification_time) and hope for an index only scan.
Except that you are on 9.1, so first you would have to upgrade.  Which
would allow you to use BUFFERS in the explain analyze, as well as
track_io_timings, both of which would also be pretty nice to see.  Using
9.1 is like having one hand tied behind your back.
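
That suggested index would be something like the following (the index name is
illustrative; the index-only-scan payoff needs 9.2 or later):

CREATE INDEX point_dcid_mtime_idx
    ON evidence.point (domain_class_id, modification_time);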

Also, any idea why this execution of this query is 15 times faster than the
execution you found in the log file?  Was the top output you showed in the
first email captured while the really slow query was running, or was
that from a different period?

Cheers,

Jeff


Re: [PERFORM] Performance issue in PostgreSQL server...

2017-03-03 Thread Tom Lane
Dinesh Chandra 12108  writes:
> The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id 
> FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON 
> p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND 
> (p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time 
> > '2015-05-10 00:06:56.056 IST') ORDER BY feature_id

I think that's a fundamentally slow query and you're not going to be able
to make it better without rethinking your requirements and/or data
representation.  As written, that requires the server to form the entire
join of p to oe on feature_id, with the only filter before the join being
the evidently-none-too-selective domain_class_id condition.  Only after
joining can it apply the OR condition.  So this is inherently processing a
lot of rows.

If the OR arms were individually pretty selective you could rewrite this
into a UNION of two joins, a la the discussion at
https://www.postgresql.org/message-id/flat/7f70bd5a-5d16-e05c-f0b4-2fdfc8873...@bluetreble.com
but given the dates involved I'm betting that won't help very much.
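
For reference, that UNION shape would look roughly like this (same tables and
predicates as the original query; UNION keeps the DISTINCT semantics):

SELECT feature_id
  FROM evidence.point p
  JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
 WHERE p.domain_class_id IN (11)
   AND p.modification_time > '2015-05-10 00:06:56.056 IST'
UNION
SELECT feature_id
  FROM evidence.point p
  JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id
 WHERE p.domain_class_id IN (11)
   AND oe.modification_time > '2015-05-10 00:06:56.056 IST'
 ORDER BY feature_id;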

Or maybe you could try

select feature_id from p where domain_class_id IN (11) AND p.modification_time 
> '2015-05-10 00:06:56.056 IST'
intersect
select feature_id from oe where oe.modification_time > '2015-05-10 00:06:56.056 
IST'
order by feature_id

although I'm not entirely certain that that has exactly the same
semantics (-ENOCAFFEINE), and it might still be none too quick.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance issue in PostgreSQL server...

2017-03-03 Thread Dinesh Chandra 12108
Dear Nur,

The below is the output for psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id 
FROM evidence.point p INNER JOIN evidence.observation_evidence oe ON 
p.feature_id = oe.evd_feature_id WHERE p.domain_class_id IN (11) AND 
(p.modification_time > '2015-05-10 00:06:56.056 IST' OR oe.modification_time > 
'2015-05-10 00:06:56.056 IST') ORDER BY feature_id



QUERY PLAN
------------------------------------------------------------------------------------------------------------
Unique  (cost=1679730.32..1679837.46 rows=21428 width=8) (actual 
time=154753.528..155657.818 rows=1607489 loops=1)
   ->  Sort  (cost=1679730.32..1679783.89 rows=21428 width=8) (actual 
time=154753.514..155087.734 rows=4053270 loops=1)
 Sort Key: p.feature_id
 Sort Method: quicksort  Memory: 288302kB
 ->  Hash Join  (cost=1501657.09..1678188.87 rows=21428 width=8) 
(actual time=144146.620..152050.311 rows=4053270 loops=1)
   Hash Cond: (oe.evd_feature_id = p.feature_id)
   Join Filter: ((p.modification_time > '2015-05-10 
03:36:56.056+05:30'::timestamp with time zone) OR (oe.modification_time > 
'2015-05-10 03:36:5
6.056+05:30'::timestamp with time zone))
   ->  Seq Scan on observation_evidence oe  (cost=0.00..121733.18 
rows=5447718 width=16) (actual time=0.007..1534.905 rows=5434406 loops=1)
   ->  Hash  (cost=1483472.70..1483472.70 rows=1454751 width=16) 
(actual time=144144.653..144144.653 rows=1607491 loops=1)
 Buckets: 262144  Batches: 1  Memory Usage: 75352kB
 ->  Index Scan using point_domain_class_id_index on point 
p  (cost=0.00..1483472.70 rows=1454751 width=16) (actual time=27.265..142101.1
59 rows=1607491 loops=1)
   Index Cond: (domain_class_id = 11)
Total runtime: 155787.379 ms
(13 rows)


Regards,
Dinesh Chandra
|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.
--
Mobile: +91-9953975849 | Ext 1078 
|dinesh.chan...@cyient.com
Plot No. 7, NSEZ, Phase-II ,Noida-Dadri Road, Noida - 201 305,India.

From: Nur Agus [mailto:nuragus.li...@gmail.com]
Sent: 03 March, 2017 5:54 PM
To: Dinesh Chandra 12108 <dinesh.chan...@cyient.com>
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance issue in PostgreSQL server...

Hello Dinesh,

You can try the EXPLAIN tool

psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p INNER 
JOIN evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE 
p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 
IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY 
feature_id

Then paste here the result.

Thanks

On Fri, Mar 3, 2017 at 5:29 PM, Dinesh Chandra 12108 
<dinesh.chan...@cyient.com> wrote:
Dear Experts,

I need your suggestions to resolve the performance issue reported on our 
PostgreSQL9.1 production database having 1.5 TB Size. I have observed that, 
some select queries with order by clause are taking lot of time in execution 
and forcing applications to give slow response.

The configuration of database server is :

Architecture: x86_64
CPU op-mode(s):32-bit, 64-bit
CPU’s : 8
Core(s) per socket:4
Socket(s): 2
Model name:Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz

RAM : 32 GB
SWAP :8 Gb

Kernel parameter:

kernel.shmmax = 32212254720
kernel.shmall = 1073741824


Values of PostgreSQL.conf parameters are :

shared_buffers = 10GB
temp_buffers = 32MB
work_mem = 512MB
maintenance_work_mem = 2048MB
max_files_per_process = 2000
checkpoint_segments = 200
max_wal_senders = 5
wal_buffers = -1  # min 32kB, -1 sets based on 
shared_buffers


Queries taking lot of time are:
==


2017-03-02 00:46:50 IST LOG:  duration: 2492951.927 ms  execute : 
SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN 
evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE 
p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 
IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY 
feature_id


2017-03-02 01:05:16 IST LOG:  duration: 516250.512 ms  execute : 
SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN 
evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE 
p.domain_class_id IN (3) AND (p.modification_time > '2015-05-10 01:22:59.059 
IST' OR oe.modification_time > '2015-05-10 01:22:59.059 IST') ORDER BY 
feature_id


Top command output:

top - 15:13:15 up 66 days,  3:45,  8 users,  load average: 1.8

Re: [PERFORM] Performance issue in PostgreSQL server...

2017-03-03 Thread Nur Agus
Hello Dinesh,

You can try the EXPLAIN tool

psql=> EXPLAIN ANALYZE SELECT DISTINCT feature_id FROM evidence.point p
INNER JOIN evidence.observation_evidence oe ON p.feature_id =
oe.evd_feature_id WHERE p.domain_class_id IN (11) AND (p.modification_time
> '2015-05-10 00:06:56.056 IST' OR oe.modification_time > '2015-05-10
00:06:56.056 IST') ORDER BY feature_id

Then paste here the result.

Thanks

On Fri, Mar 3, 2017 at 5:29 PM, Dinesh Chandra 12108 <
dinesh.chan...@cyient.com> wrote:

> Dear Experts,
>
>
>
> I need your suggestions to resolve the performance issue reported on our
> *PostgreSQL9.1* production database having 1.5 TB *Size*. I have observed
> that, some select queries with order by clause are taking lot of time in
> execution and forcing applications to give slow response.
>
>
>
> The configuration of database server is :
>
>
>
> Architecture: x86_64
>
> CPU op-mode(s):32-bit, 64-bit
>
> CPU’s : 8
>
> Core(s) per socket:4
>
> Socket(s): 2
>
> Model name:Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz
>
>
>
> *RAM* : 32 GB
>
> *SWAP* :8 Gb
>
>
>
> *Kernel parameter*:
>
>
>
> kernel.shmmax = 32212254720
>
> kernel.shmall = 1073741824
>
>
>
>
>
> Values of PostgreSQL.conf parameters are :
>
>
>
> shared_buffers = 10GB
>
> temp_buffers = 32MB
>
> work_mem = 512MB
>
> maintenance_work_mem = 2048MB
>
> max_files_per_process = 2000
>
> checkpoint_segments = 200
>
> max_wal_senders = 5
>
> wal_buffers = -1  # min 32kB, -1 sets based on
> shared_buffers
>
>
>
>
>
> *Queries taking lot of time are:*
>
> ==
>
>
>
>
>
> 2017-03-02 00:46:50 IST LOG:  duration: 2492951.927 ms  execute :
> SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN
> evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE
> p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10
> 00:06:56.056 IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST')
> ORDER BY feature_id
>
>
>
>
>
> 2017-03-02 01:05:16 IST LOG:  duration: 516250.512 ms  execute :
> SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN
> evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE
> p.domain_class_id IN (3) AND (p.modification_time > '2015-05-10
> 01:22:59.059 IST' OR oe.modification_time > '2015-05-10 01:22:59.059 IST')
> ORDER BY feature_id
>
>
>
>
>
> *Top command output*:
>
>
>
> top - 15:13:15 up 66 days,  3:45,  8 users,  load average: 1.84, 1.59, 1.57
>
> Tasks: 830 total,   1 running, 828 sleeping,   0 stopped,   1 zombie
>
> Cpu(s):  3.4%us,  0.7%sy,  0.0%ni, 81.7%id, 14.2%wa,  0.0%hi,  0.0%si,
> 0.0%st
>
> *Mem:*  32830016k total, *32142596k* used,   *687420k* free,77460k
> buffers
>
> Swap:  8190972k total,   204196k used,  7986776k free, 27981268k cached
>
>
>
>   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
>
> 30639 postgres  20   0 10.5g 4.7g 4.7g S 13.5 14.9  10:20.95 postgres
>
> 18185 postgres  20   0 10.5g 603m 596m S  4.9  1.9   2:51.16 postgres
>
> 16543 postgres  20   0 10.5g 2.8g 2.8g S  4.3  8.8   1:34.04 postgres
>
> 14710 postgres  20   0 10.5g 2.9g 2.9g S  3.9  9.2   1:20.84 postgres
>
> 1214 root  20   0 15668 1848  896 S  1.0  0.0 130:46.43 top
>
> 13462 postgres  20   0 10.5g 1.4g 1.3g S  1.0  4.3   0:25.56 postgres
>
> 20081 root  20   0 15668 1880  936 R  1.0  0.0   0:00.12 top
>
> 13478 postgres  20   0 10.5g 2.1g 2.1g S  0.7  6.9   0:56.43 postgres
>
> 41107 root  20   0  416m  10m 4892 S  0.7  0.0 305:25.71 pgadmin3
>
> 2680 root  20   0 000 S  0.3  0.0 103:38.54 nfsiod
>
> 3558 root  20   0 13688 1100  992 S  0.3  0.0  45:00.36 gam_server
>
> 15576 root  20   0 000 S  0.3  0.0   0:01.16 flush-253:1
>
> 18430 postgres  20   0 10.5g  18m  13m S  0.3  0.1   0:00.64 postgres
>
> 20083 postgres  20   0  105m 1852 1416 S  0.3  0.0   0:00.01 bash
>
> 24188 postgres  20   0  102m 1856  832 S  0.3  0.0   0:23.39 sshd
>
> 28250 postgres  20   0  156m 1292  528 S  0.3  0.0   0:46.86 postgres
>
> 1 root  20   0 19356 1188  996 S  0.0  0.0   0:05.00 init
>
>
>
> *Regards,*
>
> *Dinesh Chandra*
>
> *|Database administrator (Oracle/PostgreSQL)| *
>
>
>
> --
>
> DISCLAIMER:
>
> This email message is for the sole use of the intended recipient(s) and
> may contain confidential and privileged information. Any unauthorized
> review, use, disclosure or distribution is prohibited. If you are not the
> intended recipient, please contact the sender by reply email and destroy
> all copies of the original message. Check all attachments for viruses
> before opening them. All views or opinions presented in this e-mail are
> those of the author and may not reflect the opinion of Cyient or those of
> our affiliates.
>


[PERFORM] Performance issue in PostgreSQL server...

2017-03-03 Thread Dinesh Chandra 12108
Dear Experts,

I need your suggestions to resolve a performance issue reported on our 
PostgreSQL 9.1 production database, which is 1.5 TB in size. I have observed that 
some SELECT queries with an ORDER BY clause are taking a lot of time to execute 
and are forcing applications to give slow responses.

The configuration of database server is :

Architecture: x86_64
CPU op-mode(s):32-bit, 64-bit
CPU's : 8
Core(s) per socket:4
Socket(s): 2
Model name:Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz

RAM : 32 GB
SWAP :8 Gb

Kernel parameter:

kernel.shmmax = 32212254720
kernel.shmall = 1073741824


Values of PostgreSQL.conf parameters are :

shared_buffers = 10GB
temp_buffers = 32MB
work_mem = 512MB
maintenance_work_mem = 2048MB
max_files_per_process = 2000
checkpoint_segments = 200
max_wal_senders = 5
wal_buffers = -1  # min 32kB, -1 sets based on 
shared_buffers


Queries taking lot of time are:
==


2017-03-02 00:46:50 IST LOG:  duration: 2492951.927 ms  execute : 
SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN 
evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE 
p.domain_class_id IN (11) AND (p.modification_time > '2015-05-10 00:06:56.056 
IST' OR oe.modification_time > '2015-05-10 00:06:56.056 IST') ORDER BY 
feature_id


2017-03-02 01:05:16 IST LOG:  duration: 516250.512 ms  execute : 
SELECT DISTINCT feature_id FROM evidence.point p INNER JOIN 
evidence.observation_evidence oe ON p.feature_id = oe.evd_feature_id WHERE 
p.domain_class_id IN (3) AND (p.modification_time > '2015-05-10 01:22:59.059 
IST' OR oe.modification_time > '2015-05-10 01:22:59.059 IST') ORDER BY 
feature_id


Top command output:

top - 15:13:15 up 66 days,  3:45,  8 users,  load average: 1.84, 1.59, 1.57
Tasks: 830 total,   1 running, 828 sleeping,   0 stopped,   1 zombie
Cpu(s):  3.4%us,  0.7%sy,  0.0%ni, 81.7%id, 14.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  32830016k total, 32142596k used,   687420k free,77460k buffers
Swap:  8190972k total,   204196k used,  7986776k free, 27981268k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
30639 postgres  20   0 10.5g 4.7g 4.7g S 13.5 14.9  10:20.95 postgres
18185 postgres  20   0 10.5g 603m 596m S  4.9  1.9   2:51.16 postgres
16543 postgres  20   0 10.5g 2.8g 2.8g S  4.3  8.8   1:34.04 postgres
14710 postgres  20   0 10.5g 2.9g 2.9g S  3.9  9.2   1:20.84 postgres
1214 root  20   0 15668 1848  896 S  1.0  0.0 130:46.43 top
13462 postgres  20   0 10.5g 1.4g 1.3g S  1.0  4.3   0:25.56 postgres
20081 root  20   0 15668 1880  936 R  1.0  0.0   0:00.12 top
13478 postgres  20   0 10.5g 2.1g 2.1g S  0.7  6.9   0:56.43 postgres
41107 root  20   0  416m  10m 4892 S  0.7  0.0 305:25.71 pgadmin3
2680 root  20   0 000 S  0.3  0.0 103:38.54 nfsiod
3558 root  20   0 13688 1100  992 S  0.3  0.0  45:00.36 gam_server
15576 root  20   0 000 S  0.3  0.0   0:01.16 flush-253:1
18430 postgres  20   0 10.5g  18m  13m S  0.3  0.1   0:00.64 postgres
20083 postgres  20   0  105m 1852 1416 S  0.3  0.0   0:00.01 bash
24188 postgres  20   0  102m 1856  832 S  0.3  0.0   0:23.39 sshd
28250 postgres  20   0  156m 1292  528 S  0.3  0.0   0:46.86 postgres
1 root  20   0 19356 1188  996 S  0.0  0.0   0:05.00 init

Regards,
Dinesh Chandra
|Database administrator (Oracle/PostgreSQL)|




DISCLAIMER:

This email message is for the sole use of the intended recipient(s) and may 
contain confidential and privileged information. Any unauthorized review, use, 
disclosure or distribution is prohibited. If you are not the intended 
recipient, please contact the sender by reply email and destroy all copies of 
the original message. Check all attachments for viruses before opening them. 
All views or opinions presented in this e-mail are those of the author and may 
not reflect the opinion of Cyient or those of our affiliates.


Re: [PERFORM] performance contradiction

2017-02-15 Thread Feike Steenbergen
On 23 January 2017 at 17:55, Gabriel Dodan  wrote:
>
> BUT if I run a trivial select on both servers, on a similar table, the
select
> perform much much better on second server!

You're comparing two very different systems, it seems, so you might be
looking at a difference in the performance of EXPLAIN itself: just getting
timing information on your system may be the most expensive part [1]. You
could disable the timing explicitly:

EXPLAIN (ANALYZE ON, TIMING OFF) 

And there is something that stands out: the row widths differ between the two
plans (33 vs 622). So it seems there is also some difference in the data; we
could validate the actual numbers:

SELECT sum(pg_column_size(symbols_tests))/count(*) FROM symbols_tests;

regards,

Feike

[1]
https://www.postgresql.org/docs/current/static/using-explain.html#USING-EXPLAIN-CAVEATS


[PERFORM] performance contradiction

2017-01-23 Thread Gabriel Dodan
Hi All,

I have two servers. On the first one I have postgresql version 9.6 . On the
second one I have version 9.3 . I ran pgbench on both servers.

First server results:
scaling factor: 10
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 4639
latency average = 2.156 ms
tps = 463.818971 (including connections establishing)
tps = 464.017489 (excluding connections establishing)

Second server results:
scaling factor: 10
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 3771
tps = 377.084162 (including connections establishing)
tps = 377.134546 (excluding connections establishing)
So the first server performs much better.

BUT if I run a trivial select on both servers, on a similar table, the
select performs much, much better on the second server!

First server explain results:

Limit  (cost=0.00..0.83 rows=30 width=33) (actual time=0.152..0.794 rows=30
loops=1)
  Buffers: shared hit=1
  ->  Seq Scan on symbols_tests  (cost=0.00..1.57 rows=57 width=33) (actual
time=0.040..0.261 rows=30 loops=1)
Buffers: shared hit=1
Planning time: 0.282 ms
Execution time: 1.062 ms
Second server explain results:

Limit  (cost=0.00..0.83 rows=30 width=622) (actual time=0.006..0.010
rows=30 loops=1)
  Buffers: shared hit=1
  ->  Seq Scan on symbols_tests  (cost=0.00..1.57 rows=57 width=622)
(actual time=0.006..0.007 rows=30 loops=1)
Buffers: shared hit=1
Total runtime: 0.020 ms

Both servers have SSD. First server is a VPS, the second server is a
dedicated server.

Any idea why this contradiction? If you need more details regarding server
resources (CPU, memory, etc.), please let me know.

Regards

-- 
Stock Monitor - Stock Analysis Made EASY 


Re: [PERFORM] Performance issue with castings args of the function

2017-01-02 Thread Андрей Хозов
Thanks all for the explanation!

On Mon, Jan 2, 2017 at 9:36 PM, Tom Lane  wrote:

Андрей Хозов  writes:
> > create table t1 (id serial, str char(32));
>
> > create function f1(line text) returns void as $$
> > begin
> >   perform * from t1 where str = line;
> > end;
> > $$ language plpgsql;
>
> This query is specifying a text comparison (text = text operator).
> Since the table column isn't text, a char-to-text conversion must
> happen at each line.
>
> > create function f2(line char) returns void as $$
> > begin
> >   perform * from t1 where str = line;
> > end;
> > $$ language plpgsql;
>
> This query is specifying a char(n) comparison (char = char operator).
> No type conversion step needed, so it's faster.
>
> regards, tom lane
>



-- 
​Andrey Khozov


Re: [PERFORM] Performance issue with castings args of the function

2017-01-02 Thread Tom Lane
Андрей Хозов  writes:
> create table t1 (id serial, str char(32));

> create function f1(line text) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;

This query is specifying a text comparison (text = text operator).
Since the table column isn't text, a char-to-text conversion must
happen at each line.

> create function f2(line char) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;

This query is specifying a char(n) comparison (char = char operator).
No type conversion step needed, so it's faster.
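
So one workaround for the example above is to make the comparison char = char
by casting the parameter inside the function (a sketch, not benchmarked here):

create function f1_cast(line text) returns void as $$  -- hypothetical variant of f1
begin
  -- with the explicit cast the comparison is char = char, so the table
  -- column no longer has to be converted to text for every row
  perform * from t1 where str = line::char(32);
end;
$$ language plpgsql;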

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance issue with castings args of the function

2017-01-02 Thread Pavel Stehule
Hi

2017-01-02 15:34 GMT+01:00 Андрей Хозов :

> Hello there!
>
> I have an performance issue with functions and args type.
>
> Table and data:
> create table t1 (id serial, str char(32));
> insert into t1 (str) select md5(s::text) from generate_series(1, 100)
> as s;
>
> And simple functions:
> create function f1(line text) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;
>
> create function f2(line char) returns void as $$
> begin
>   perform * from t1 where str = line;
> end;
> $$ language plpgsql;
>
> ​Query:
> test=> explain analyze select f2('2b00042f7481c7b056c4b410d28f33cf');
>QUERY PLAN
>
> 
> 
>  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=189.008..189.010
> rows=1 loops=1)
>  Planning time: 0.039 ms
>  Execution time: 189.039 ms
> (3 rows)
>
> Time: 189,524 ms
> test=> explain analyze select f1('2b00042f7481c7b056c4b410d28f33cf');
>QUERY PLAN
>
> 
> 
>  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=513.734..513.735
> rows=1 loops=1)
>  Planning time: 0.024 ms
>  Execution time: 513.757 ms
> (3 rows)
>
> Time: 514,125 ms
> test=> explain analyze select f1('2b00042f7481c7b056c4b410d2
> 8f33cf'::char(32));
>QUERY PLAN
>
> 
> 
>  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=513.507..513.509
> rows=1 loops=1)
>  Planning time: 0.074 ms
>  Execution time: 513.535 ms
> (3 rows)
>

This EXPLAIN output shows nothing useful here - you need the plan of the nested
statement executed inside the function.

Look at auto_explain:
https://www.postgresql.org/docs/current/static/auto-explain.html
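
For example, a quick session-level setup could look like this (a sketch;
auto_explain is a contrib module and has to be available on the server):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = on;
SET auto_explain.log_nested_statements = on;
-- now call the function; the plan of the statement inside it goes to the server log
SELECT f1('2b00042f7481c7b056c4b410d28f33cf');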

Maybe the index was not used due to the different types.

Regards

Pavel


> Time: 514,104 ms
> test=>
> ​
> ​Seems that casting param from text to char(32) needs to be done only once
> and​ f1 and f2 must be identical on performance. But function f2 with text
> param significantly slower, even with casting arg while pass it to function.
>
> Tested postgresql versions 9.5.5 and 9.6.1 on Ubuntu 16.04. It's normal
> behavior or it's can be fixed?
>
> --
> ​Andrey Khozov
>


[PERFORM] Performance issue with castings args of the function

2017-01-02 Thread Андрей Хозов
Hello there!

I have a performance issue with functions and argument types.

Table and data:
create table t1 (id serial, str char(32));
insert into t1 (str) select md5(s::text) from generate_series(1, 100)
as s;

And simple functions:
create function f1(line text) returns void as $$
begin
  perform * from t1 where str = line;
end;
$$ language plpgsql;

create function f2(line char) returns void as $$
begin
  perform * from t1 where str = line;
end;
$$ language plpgsql;

​Query:
test=> explain analyze select f2('2b00042f7481c7b056c4b410d28f33cf');
   QUERY PLAN



 Result  (cost=0.00..0.26 rows=1 width=0) (actual time=189.008..189.010
rows=1 loops=1)
 Planning time: 0.039 ms
 Execution time: 189.039 ms
(3 rows)

Time: 189,524 ms
test=> explain analyze select f1('2b00042f7481c7b056c4b410d28f33cf');
   QUERY PLAN



 Result  (cost=0.00..0.26 rows=1 width=0) (actual time=513.734..513.735
rows=1 loops=1)
 Planning time: 0.024 ms
 Execution time: 513.757 ms
(3 rows)

Time: 514,125 ms
test=> explain analyze select f1('2b00042f7481c7b056c4b410d28f33
cf'::char(32));
   QUERY PLAN



 Result  (cost=0.00..0.26 rows=1 width=0) (actual time=513.507..513.509
rows=1 loops=1)
 Planning time: 0.074 ms
 Execution time: 513.535 ms
(3 rows)

Time: 514,104 ms
test=>

It seems that the cast of the param from text to char(32) should only need to
be done once, so f1 and f2 should perform identically in theory. But the
function f1, with the text param, is significantly slower, even when casting
the arg while passing it to the function.

Tested PostgreSQL versions 9.5.5 and 9.6.1 on Ubuntu 16.04. Is this normal
behavior, or can it be fixed?

-- 
​Andrey Khozov


Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents

2016-12-09 Thread Merlin Moncure
On Wed, Nov 30, 2016 at 6:26 AM, Marc-Olaf Jaschke
 wrote:
> Hi,
>
> i have a performance issue with bitmap index scans on huge amounts of big 
> jsonb documents.
>
>
> = Background =
>
> - table with big jsonb documents
> - gin index on these documents
> - queries using index conditions with low selectivity
>
>
> = Example =
>
> select version();
>> PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 
>> 20120313 (Red Hat 4.4.7-17), 64-bit
>
> show work_mem;
>> 1GB
>
> -- setup test data
> create table bitmap_scan_test as
> select
> i,
> (select jsonb_agg(jsonb_build_object('x', i % 2, 'filler', md5(j::text))) 
> from generate_series(0, 100) j) big_jsonb
> from
> generate_series(0, 10) i;
>
> create index on bitmap_scan_test using gin (big_jsonb);
>
> analyze bitmap_scan_test;
>
>
> --  query with bitmap scan
> explain analyze
> select
> count(*)
> from
> bitmap_scan_test
> where
> big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';
>
> Aggregate  (cost=272.74..272.75 rows=1 width=8) (actual time=622.272..622.272 
> rows=1 loops=1)
>   ->  Bitmap Heap Scan on bitmap_scan_test  (cost=120.78..272.49 rows=100 
> width=0) (actual time=16.496..617.431 rows=5 loops=1)
> Recheck Cond: (big_jsonb @> '[{"x": 1, "filler": 
> "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
> Heap Blocks: exact=637
> ->  Bitmap Index Scan on bitmap_scan_test_big_jsonb_idx  
> (cost=0.00..120.75 rows=100 width=0) (actual time=16.371..16.371 rows=5 
> loops=1)
>   Index Cond: (big_jsonb @> '[{"x": 1, "filler": 
> "cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
> Planning time: 0.106 ms
> Execution time: 622.334 ms
>
>
> perf top -p... shows heavy usage of pglz_decompress:
>
> Overhead  Shared Object Symbol
>   51,06%  postgres  [.] pglz_decompress
>7,33%  libc-2.12.so  [.] memcpy

Another thing to possibly look at is configuring the column not to
compress; over half the time is spent decompressing the data.  See:
ALTER [ COLUMN ] column_name SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

Naturally this is a huge tradeoff so do some careful analysis before
making the change.
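
For the test table from this thread that would be something like the sketch
below (it only affects newly stored values; existing rows keep their current
representation until they are rewritten):

ALTER TABLE bitmap_scan_test
    ALTER COLUMN big_jsonb SET STORAGE EXTERNAL;  -- out-of-line storage, but uncompressed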

merlin


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents

2016-12-05 Thread Marc-Olaf Jaschke
Thanks for the explanation!

Best Regards,
Marc-Olaf

Marc-Olaf Jaschke · Softwareentwickler
shopping24 GmbH

Werner-Otto-Straße 1-7 · 22179 Hamburg
Telefon: +49 (0) 40 6461 5830 · Fax: +49 (0) 40 64 61 7879
marc-olaf.jasc...@s24.com · www.s24.com
AG Hamburg HRB 63371
vertreten durch Dr. Björn Schäfers und Martin Mildner

2016-12-05 3:28 GMT+01:00 Jeff Janes :

>
> > big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';
>
>
>> I wonder why bitmap heap scan adds such a big amount of time on top of
>> the plain bitmap index scan.
>> It seems to me, that the recheck is active although all blocks are exact
>> [1] and that pg is loading the jsonb for the recheck.
>>
>> Is this an expected behavior?
>>
>
>
> Yes, this is expected.  The gin index is lossy.  It knows that all the
> elements are present (except when it doesn't--large elements might get
> hashed down and suffer hash collisions), but it doesn't know what the
> recursive structure between them is, and has to do a recheck.
>
> For example, if you change your example where clause to:
>
> big_jsonb @> '[{"filler": 1, "x": "cfcd208495d565ef66e7dff9f98764da"}]';
>
> You will see that the index still returns 50,000 rows, but now all of them
> get rejected upon the recheck.
>
> You could try changing the type of index to jsonb_path_ops.  In your given
> example, it won't make a difference, because you are actually counting half
> the table and so half the table needs to be rechecked.  But in my example,
> jsonb_path_ops successfully rejects all the rows at the index stage.
>
> Cheers,
>
> Jeff
>


Re: [PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents

2016-12-04 Thread Jeff Janes
> big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';


> I wonder why bitmap heap scan adds such a big amount of time on top of the
> plain bitmap index scan.
> It seems to me, that the recheck is active although all blocks are exact
> [1] and that pg is loading the jsonb for the recheck.
>
> Is this an expected behavior?
>


Yes, this is expected.  The gin index is lossy.  It knows that all the
elements are present (except when it doesn't--large elements might get
hashed down and suffer hash collisions), but it doesn't know what the
recursive structure between them is, and has to do a recheck.

For example, if you change your example where clause to:

big_jsonb @> '[{"filler": 1, "x": "cfcd208495d565ef66e7dff9f98764da"}]';

You will see that the index still returns 50,000 rows, but now all of them
get rejected upon the recheck.

You could try changing the type of index to jsonb_path_ops.  In your given
example, it won't make a difference, because you are actually counting half
the table and so half the table needs to be rechecked.  But in my example,
jsonb_path_ops successfully rejects all the rows at the index stage.
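
A sketch of that alternative, using the table from the original post (note
that jsonb_path_ops only supports the @> operator, so it is smaller and more
selective but less general than the default jsonb_ops):

CREATE INDEX ON bitmap_scan_test USING gin (big_jsonb jsonb_path_ops);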

Cheers,

Jeff


[PERFORM] performance issue with bitmap index scans on huge amounts of big jsonb documents

2016-11-30 Thread Marc-Olaf Jaschke
Hi,

i have a performance issue with bitmap index scans on huge amounts of big jsonb 
documents.


= Background = 

- table with big jsonb documents
- gin index on these documents
- queries using index conditions with low selectivity


= Example = 

select version(); 
> PostgreSQL 9.6.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 
> (Red Hat 4.4.7-17), 64-bit

show work_mem;
> 1GB

-- setup test data
create table bitmap_scan_test as
select
i,
(select jsonb_agg(jsonb_build_object('x', i % 2, 'filler', md5(j::text))) from 
generate_series(0, 100) j) big_jsonb
from
generate_series(0, 10) i;

create index on bitmap_scan_test using gin (big_jsonb);

analyze bitmap_scan_test;


--  query with bitmap scan
explain analyze
select
count(*)
from
bitmap_scan_test
where
big_jsonb @> '[{"x": 1, "filler": "cfcd208495d565ef66e7dff9f98764da"}]';

Aggregate  (cost=272.74..272.75 rows=1 width=8) (actual time=622.272..622.272 
rows=1 loops=1)
  ->  Bitmap Heap Scan on bitmap_scan_test  (cost=120.78..272.49 rows=100 
width=0) (actual time=16.496..617.431 rows=5 loops=1)
Recheck Cond: (big_jsonb @> '[{"x": 1, "filler": 
"cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
Heap Blocks: exact=637
->  Bitmap Index Scan on bitmap_scan_test_big_jsonb_idx  
(cost=0.00..120.75 rows=100 width=0) (actual time=16.371..16.371 rows=5 
loops=1)
  Index Cond: (big_jsonb @> '[{"x": 1, "filler": 
"cfcd208495d565ef66e7dff9f98764da"}]'::jsonb)
Planning time: 0.106 ms
Execution time: 622.334 ms


perf top -p... shows heavy usage of pglz_decompress:

Overhead  Shared Object Symbol
  51,06%  postgres  [.] pglz_decompress
   7,33%  libc-2.12.so  [.] memcpy
...

= End of example =  


I wonder why bitmap heap scan adds such a big amount of time on top of the 
plain bitmap index scan. 
It seems to me, that the recheck is active although all blocks are exact [1] 
and that pg is loading the jsonb for the recheck.

Is this an expected behavior?


Regards,
Marc-Olaf


[1] 
(http://dba.stackexchange.com/questions/106264/recheck-cond-line-in-query-plans-with-a-bitmap-index-scan)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance decrease after upgrade to 9.6.1

2016-11-15 Thread Gabriela Serventi
Hi Tom!

Thanks for the answer.

This is just one of the benchmarks that we ran; we tested with fewer clients and 
much more time, but you're right about the scale factor, we didn't realize 
that.

We are going to test using your recommendations.

Thanks!



From: Tom Lane <t...@sss.pgh.pa.us>
Sent: Tuesday, November 15, 2016 19:35:03
To: Gabriela Serventi
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Performance decrease after upgrade to 9.6.1

Gabriela Serventi <gabrielaserve...@hotmail.com> writes:
> $ pgbench -l -c 100 -T 30 pgbench
> starting vacuum...end.
> transaction type: 
> scaling factor: 1
> query mode: simple
> number of clients: 100
> number of threads: 1
> duration: 30 s
> number of transactions actually processed: 27428
> latency average = 110.104 ms
> tps = 908.234296 (including connections establishing)
> tps = 908.278187 (excluding connections establishing)

That's not a tremendously exciting benchmark case, for a number of
reasons:

* 100 sessions in a scale-factor-1 database are all going to be fighting
over updating the single row in the pgbench_branches table.

* 100 sessions driven by a single pgbench thread are probably going to be
bottlenecked by that thread, not by the server.

* 100 sessions on a machine with only 2 cores is going to be all about
process-swap contention anyhow.


My first thought about why the difference from 8.4 to 9.6 is that pgbench
has grown a lot more measurement apparatus since then (for example, the
transaction latency numbers, which weren't there at all in 8.4).  You
might try testing 9.6 server with 8.4 pgbench and vice versa to tease out
how much of this is actually on pgbench changes not the server.  But in
the end, what you're measuring here is mostly contention, and you'd need
to alter the test parameters to make it not so.  The "Good Practices"
section at the bottom of the pgbench reference page has some tips about
that.

regards, tom lane


Re: [PERFORM] Performance decrease after upgrade to 9.6.1

2016-11-15 Thread Tom Lane
Gabriela Serventi  writes:
> $ pgbench -l -c 100 -T 30 pgbench
> starting vacuum...end.
> transaction type: 
> scaling factor: 1
> query mode: simple
> number of clients: 100
> number of threads: 1
> duration: 30 s
> number of transactions actually processed: 27428
> latency average = 110.104 ms
> tps = 908.234296 (including connections establishing)
> tps = 908.278187 (excluding connections establishing)

That's not a tremendously exciting benchmark case, for a number of
reasons:

* 100 sessions in a scale-factor-1 database are all going to be fighting
over updating the single row in the pgbench_branches table.

* 100 sessions driven by a single pgbench thread are probably going to be
bottlenecked by that thread, not by the server.

* 100 sessions on a machine with only 2 cores is going to be all about
process-swap contention anyhow.


My first thought about why the difference from 8.4 to 9.6 is that pgbench
has grown a lot more measurement apparatus since then (for example, the
transaction latency numbers, which weren't there at all in 8.4).  You
might try testing 9.6 server with 8.4 pgbench and vice versa to tease out
how much of this is actually on pgbench changes not the server.  But in
the end, what you're measuring here is mostly contention, and you'd need
to alter the test parameters to make it not so.  The "Good Practices"
section at the bottom of the pgbench reference page has some tips about
that.
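
Purely as an illustration of those tips (the numbers are examples, not a
recommendation tuned for your machine):

pgbench -i -s 100 pgbench         # bigger scale factor, so clients don't all fight over one branches row
pgbench -c 8 -j 4 -T 300 pgbench  # fewer clients, several pgbench threads, longer run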

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Performance decrease after upgrade to 9.6.1

2016-11-15 Thread Gabriela Serventi
Hello!
We have a server with 8.4.1 that we want to migrate to 9.6.1
Before doing anything, we ran pgbench several times.
The results were always similar to the following:

$ pgbench -l -c 100 -T 30 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1
query mode: simple
number of clients: 100
duration: 30 s
number of transactions actually processed: 36049
tps = 1193.682690 (including connections establishing)
tps = 1198.838960 (excluding connections establishing)

Then we followed the procedure in 
https://www.postgresql.org/docs/9.6/static/pgupgrade.html to upgrade the server 
using pg_upgrade.
To install the new version, we downloaded and compiled the sources, with the 
same options that we used with the previous version (configure 
--prefix=/var/lib/pgsql).
We upgraded only one server, so we didn't run the steps for replication.

After this, we ran the script analyze_new_cluster.sh, that was created by 
pg_upgrade, to generate statistics.

At this point, we ran pgbench again, several times, to make the comparison.
The results were always similar to the following:

$ pgbench -l -c 100 -T 30 pgbench
starting vacuum...end.
transaction type: 
scaling factor: 1
query mode: simple
number of clients: 100
number of threads: 1
duration: 30 s
number of transactions actually processed: 27428
latency average = 110.104 ms
tps = 908.234296 (including connections establishing)
tps = 908.278187 (excluding connections establishing)

We ran the statistics again, this time with vacuumdb --all --analyze, no change 
at all.

In the postgresql.conf of the new version (9.6.1), we use this values:
max_connections = 100
superuser_reserved_connections = 3
shared_buffers = 512MB
work_mem = 5MB
maintenance_work_mem = 128MB
effective_cache_size = 1500MB
max_wal_size = 2GB
min_wal_size = 1GB
wal_level = replica

In the postgresql.conf of the old version (8.4.1), we use this values:
max_connections = 100
shared_buffers = 512MB
(The other values are set by default)

We also tried with the default values in the new installation, with no change 
in the times.

The hardware doesn't change, its a Intel(R) Pentium(R) CPU G3220 @ 3.00GHz with 
2 cores, 2GB of RAM, 500GB SCSI hard disk. The operating system is Enterprise 
Linux Enterprise Linux Server release 5.8, 64 bits.

Any suggestion about what could be the problem?
Thanks!
Gabriela



Re: [PERFORM] Performance of a nested loop, whose inner loop uses an index scan.

2016-10-19 Thread negora

  
  
Hi Matheus:

Thanks for your prompt answer. It's for a web application. This part of the
application allows exporting the answers to a CSV file, so pagination isn't
possible here. The user can choose among several filters; the group of the
courses is one of them, and she can combine as many filters as she wants. So
the query that I presented in my previous message was one of the "broadest"
examples, but it's the one that I'm interested in.

Really, I'm more interested in the relative time than in the absolute time,
because I could create the file asynchronously, in the background, so that the
user downloaded it at a later time. That's not the problem. My doubt is whether
2.8 seconds is the best that I can do. Is it an acceptable time?

Thank you! ;)


On 19/10/16 13:15, Matheus de Oliveira wrote:
> On Wed, Oct 19, 2016 at 8:54 AM, negora wrote:
>
>> Nested Loop  (cost=245.92..383723.28 rows=7109606 width=38) (actual
>> time=1.091..2616.553 rows=8906075 loops=1)
>
> I wonder about the use-case for this query, because it returns more than 8M
> rows, so 2.6 seconds doesn't sound like much for so many rows. Is it for an
> end user application? Isn't there any kind of pagination?
>
> --
> Matheus de Oliveira




Re: [PERFORM] Performance of a nested loop, whose inner loop uses an index scan.

2016-10-19 Thread Matheus de Oliveira
On Wed, Oct 19, 2016 at 8:54 AM, negora  wrote:

> Nested Loop  (cost=245.92..383723.28 rows=7109606 width=38) (actual
> time=1.091..2616.553 rows=8906075 loops=1)
>

I wonder about the use-case for this query, because it returns more than 8M
rows, so 2.6 seconds doesn't sound like much for so many rows. Is it for an
end user application? Isn't there any kind of pagination?


-- 
Matheus de Oliveira


[PERFORM] Performance of a nested loop, whose inner loop uses an index scan.

2016-10-19 Thread negora
Hello:

I've a question about the performance of a query plan that uses a nested
loop, and whose inner loop uses an index scan. Would you be so kind as to
help me, please?

I'm using PostgreSQL 9.5.4 on Ubuntu 14.04 64-bit (kernel 4.8.2). I've 3
tables, which are "answers", "test_completions" and "courses". The first
one contains around ~30 million rows, whereas the others only have a few
thousand each. The query that I'm performing is very simple,
although it retrieves lots of rows:

-
SELECT answers.*
FROM answers
JOIN test_completions ON test_completions.test_completion_id =
answers.test_completion_id
JOIN courses ON courses.course_id = test_completions.course_id
WHERE courses.group_id = 2;
-


This yields the following plan:

-
Nested Loop  (cost=245.92..383723.28 rows=7109606 width=38) (actual time=1.091..2616.553 rows=8906075 loops=1)
  ->  Hash Join  (cost=245.36..539.81 rows=3081 width=8) (actual time=1.077..6.087 rows=3123 loops=1)
        Hash Cond: (test_completions.course_id = courses.course_id)
        ->  Seq Scan on test_completions  (cost=0.00..214.65 rows=13065 width=16) (actual time=0.005..1.051 rows=13065 loops=1)
        ->  Hash  (cost=204.11..204.11 rows=3300 width=8) (actual time=1.063..1.063 rows=3300 loops=1)
              Buckets: 4096  Batches: 1  Memory Usage: 161kB
              ->  Bitmap Heap Scan on courses  (cost=45.86..204.11 rows=3300 width=8) (actual time=0.186..0.777 rows=3300 loops=1)
                    Recheck Cond: (group_id = 2)
                    Heap Blocks: exact=117
                    ->  Bitmap Index Scan on fki_courses_group_id_fkey  (cost=0.00..45.03 rows=3300 width=0) (actual time=0.172..0.172 rows=3300 loops=1)
                          Index Cond: (group_id = 2)
  ### HERE ###
  ->  Index Scan using fki_answers_test_completion_id_fkey on answers  (cost=0.56..96.90 rows=2747 width=38) (actual time=0.007..0.558 rows=2852 loops=3123)
  ### HERE ###
        Index Cond: (test_completion_id = test_completions.test_completion_id)
Planning time: 0.523 ms
Execution time: 2805.530 ms
-

My doubt is about the inner loop of the nested loop, the one that I've
delimited with ### HERE ###. This loop is the part that, obviously, consumes
the most time, because it's run 3,123 times and requires lots of accesses to
multiple database pages. But is there anything that I can do to reduce even
more the time spent in this part? Apart from:

* Clustering the "answers" table.
* Upgrading PostgreSQL to version 9.6, to take advantage of the
index scans in parallel.
* Upgrading the hardware.

Thank you!



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] [PERFORMANCE] Performance index and table

2016-07-22 Thread Tom Lane
Oscar Camuendo  writes:
> I'm working on Postgresql 9.5.3 and executed a query which takes 5 or 7 
> seconds and it should not take more than 0.30 milliseconds, the query is:

Have you ANALYZEd your tables lately?  Some of these estimated row counts
seem awfully far off for no very good reason.
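
Something as simple as this might already change the estimates (table name
taken from the query you posted):

ANALYZE VERBOSE parties;
-- then re-run the EXPLAIN ANALYZE and compare the estimated row counts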

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] [PERFORMANCE] Performance index and table

2016-07-22 Thread Oscar Camuendo
I'm working on PostgreSQL 9.5.3 and executed a query which takes 5 or 7 seconds 
when it should not take more than 0.30 milliseconds. The query is:


---QUERY--


with recursive t(level,parent_id,id) as (
select 0,parent_id,id from parties where parent_id = 105
union
select t.level + 1,c.parent_id,c.id from parties c join t on c.parent_id = t.id
)
select distinct id from t order by id;

--


The parties table has 245512 rows and one index named 
"index_parties_on_parent_id" , so I added an EXPLAIN ANALYZE VERBOSE to get 
more details and it was the result:


RESULT--


Sort  (cost=21237260.78..21237261.28 rows=200 width=4) (actual time=6850.338..6850.343 rows=88 loops=1)
  Output: t.id
  Sort Key: t.id
  Sort Method: quicksort  Memory: 29kB
  CTE t
    ->  Recursive Union  (cost=0.43..20562814.38 rows=29974967 width=12) (actual time=0.072..6850.180 rows=88 loops=1)
          ->  Index Scan using index_parties_on_parent_id on public.parties  (cost=0.43..3091.24 rows=807 width=8) (actual time=0.064..0.154 rows=23 loops=1)
                Output: 0, parties.parent_id, parties.id
                Index Cond: (parties.parent_id = 105)
          ->  Hash Join  (cost=777279.14..1996022.38 rows=2997416 width=12) (actual time=2245.623..2283.290 rows=22 loops=3)
                Output: (t_1.level + 1), c.parent_id, c.id
                Hash Cond: (t_1.id = c.parent_id)
                ->  WorkTable Scan on t t_1  (cost=0.00..161.40 rows=8070 width=8) (actual time=0.002..0.009 rows=29 loops=3)
                      Output: t_1.level, t_1.id
                ->  Hash  (cost=606642.73..606642.73 rows=10400673 width=8) (actual time=2206.149..2206.149 rows=1742 loops=3)
                      Output: c.parent_id, c.id
                      Buckets: 2097152  Batches: 16  Memory Usage: 16388kB
                      ->  Seq Scan on public.parties c  (cost=0.00..606642.73 rows=10400673 width=8) (actual time=71.070..2190.318 rows=244249 loops=3)
                            Output: c.parent_id, c.id
  ->  HashAggregate  (cost=674436.76..674438.76 rows=200 width=4) (actual time=6850.291..6850.305 rows=88 loops=1)
        Output: t.id
        Group Key: t.id
        ->  CTE Scan on t  (cost=0.00..599499.34 rows=29974967 width=4) (actual time=0.075..6850.236 rows=88 loops=1)
              Output: t.id
Planning time: 0.815 ms
Execution time: 7026.026 ms

--

So I could see that index_parties_on_parent_id showed 10400673 rows, and 
checking the index_parties_on_parent_id index I get this information: num_rows = 
10400673 and index_size = 310 MB.

Could anybody explain to me why there is such a difference between the parties 
table = 245512 rows and the index_parties_on_parent_id index = 10400673 rows? 
And how could I improve this index and its response time?



Re: [PERFORM] Performance problems with 9.2.15

2016-07-22 Thread Johan Fredriksson
fre 2016-07-22 klockan 19:08 +1200 skrev Mark Kirkwood:
> On 22/07/16 13:07, Johan Fredriksson wrote:
> > And by the way, I have also tried to upgrade to Postgresql 9.4.8 (the 
> > latest version in postgresl.org's own repository) without improvment.
> >
> 
> Not sure what repo you are using, but 9.5.3 and 9.6 Beta are the 
> *actual* latest versions. Now I'm not sure they will actually help your 
> particular query, but are probably worth a try out!

The one I found on https://www.postgresql.org/download/linux/redhat/

That page points out
http://yum.postgresql.org/9.4/redhat/rhel-6-x86_64/pgdg-redhat94-9.4-1.noarch.rpm
 as the latest. Perhaps the download page needs to be updated?


/ Eskil




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance problems with 9.2.15

2016-07-22 Thread Mark Kirkwood

On 22/07/16 13:07, Johan Fredriksson wrote:

And by the way, I have also tried to upgrade to Postgresql 9.4.8 (the latest 
version in postgresl.org's own repository) without improvment.



Not sure what repo you are using, but 9.5.3 and 9.6 Beta are the 
*actual* latest versions. Now I'm not sure they will actually help your 
particular query, but are probably worth a try out!


regards

Mark




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance problems with 9.2.15

2016-07-21 Thread Johan Fredriksson
And by the way, I have also tried to upgrade to Postgresql 9.4.8 (the latest 
version in postgresl.org's own repository) without improvment.

/ Eskil

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance problems with 9.2.15

2016-07-21 Thread Johan Fredriksson
I can add that setting enable_nestloop = 0 cuts the runtime for this query down 
to about 4 seconds.
Disabling nested loops globally does, however, impact the performance of a lot 
of other queries.
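
One way to limit the damage (a sketch; SET LOCAL only lasts until the end of
the transaction) is to disable nested loops just around the problem query:

BEGIN;
SET LOCAL enable_nestloop = off;
-- run the slow query here
COMMIT;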

/ Eskil 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance problems with 9.2.15

2016-07-21 Thread Claudio Freire
On Thu, Jul 21, 2016 at 3:29 PM, David G. Johnston
 wrote:
> On Thu, Jul 21, 2016 at 2:24 PM, Claudio Freire 
> wrote:
>>
>> That cross join doesn't look right. It has no join condition.
>
>
> That is the definition of a "CROSS JOIN"...
>
> David J.

Well, maybe it shouldn't be.

A cross join I mean.

I see the query and a cross join there doesn't make much sense.

There's no filtering of the output rows on the where clause either
AFAICT, and it's producing a lot of intermediate rows that don't seem
to be necessary. That was my point.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance problems with 9.2.15

2016-07-21 Thread David G. Johnston
On Thu, Jul 21, 2016 at 2:24 PM, Claudio Freire 
wrote:

> That cross join doesn't look right. It has no join condition.


That is the definition of a "CROSS JOIN"...

David J.


Re: [PERFORM] Performance problems with 9.2.15

2016-07-21 Thread Claudio Freire
On Thu, Jul 21, 2016 at 11:48 AM, Johan Fredriksson  wrote:
> EXPLAIN ANALYZE VERBOSE SELECT DISTINCT main.* FROM Users main CROSS
> JOIN ACL ACL_3 JOIN Principals Principals_1  ON ( Principals_1.id =
> main.id ) JOIN CachedGroupMembers CachedGroupMembers_2  ON
> ( CachedGroupMembers_2.MemberId = Principals_1.id ) JOIN
> CachedGroupMembers CachedGroupMembers_4  ON
> ( CachedGroupMembers_4.MemberId = Principals_1.id )  WHERE
> ((ACL_3.ObjectType = 'RT::Queue' AND ACL_3.ObjectId   = 85) OR
> (ACL_3.ObjectType = 'RT::System' AND ACL_3.ObjectId   = 1)) AND
> (ACL_3.PrincipalId = CachedGroupMembers_4.GroupId) AND
> (ACL_3.PrincipalType = 'Group') AND (ACL_3.RightName = 'OwnTicket') AND
> (CachedGroupMembers_2.Disabled = '0') AND (CachedGroupMembers_2.GroupId
> = '4') AND (CachedGroupMembers_4.Disabled = '0') AND
> (Principals_1.Disabled = '0') AND (Principals_1.PrincipalType = 'User')
> AND (Principals_1.id != '1')  ORDER BY main.Name ASC;


That cross join doesn't look right. It has no join condition.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance problems with 9.2.15

2016-07-21 Thread Johan Fredriksson
> > > The rowcount estimates from 9.2 seem greatly different from the 8.4 plan.
> > > Did you remember to ANALYZE all the tables after migrating?  Maybe there
> > > were some table-specific statistics targets that you forgot to transfer
> > > over?  In any case, the 9.2 plan looks like garbage-in-garbage-out to
> > > me :-( ... without estimates at least a little closer to reality, the
> > > planner is unlikely to do anything very sane.
> > > 
> > > (BTW, I wonder why you are moving only to 9.2 and not something more
> > > recent.)
> > 
> > You put me on the right track with your conclusion that the estimates
> > were off the chart. The quick-and-dirty fix "DELETE FROM pg_statistic;"
> > solved this problem. This database now have to build up sane estimates
> > from scratch.
> 
> Actually it took a VACUUM FULL; and DELETE FROM pg_statistic; followed
> by ANALYZE on all tables to get it right.

It worked last time, but this time it does not work. I have deleted all
data in the table pg_statistic and run ANALYZE on all tables, but the
planner still makes crappy optimizations. How can I adjust the estimates
to make the planner work better?

Last time it was in testing, this time it is in production, so urgent
help is needed, please!

This query now takes 90 seconds and it should not take more than 4-5
seconds.

EXPLAIN ANALYZE VERBOSE SELECT DISTINCT main.* FROM Users main CROSS
JOIN ACL ACL_3 JOIN Principals Principals_1  ON ( Principals_1.id =
main.id ) JOIN CachedGroupMembers CachedGroupMembers_2  ON
( CachedGroupMembers_2.MemberId = Principals_1.id ) JOIN
CachedGroupMembers CachedGroupMembers_4  ON
( CachedGroupMembers_4.MemberId = Principals_1.id )  WHERE
((ACL_3.ObjectType = 'RT::Queue' AND ACL_3.ObjectId   = 85) OR
(ACL_3.ObjectType = 'RT::System' AND ACL_3.ObjectId   = 1)) AND
(ACL_3.PrincipalId = CachedGroupMembers_4.GroupId) AND
(ACL_3.PrincipalType = 'Group') AND (ACL_3.RightName = 'OwnTicket') AND
(CachedGroupMembers_2.Disabled = '0') AND (CachedGroupMembers_2.GroupId
= '4') AND (CachedGroupMembers_4.Disabled = '0') AND
(Principals_1.Disabled = '0') AND (Principals_1.PrincipalType = 'User')
AND (Principals_1.id != '1')  ORDER BY main.Name ASC;


   
                                                  QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Unique  (cost=8907.68..8907.76 rows=1 width=336) (actual time=92075.721..92076.336 rows=176 loops=1)
   Output: main.id, main.name, main.password, main.authtoken, main.comments, main.signature, main.emailaddress, main.freeformcontactinfo, main.organization, main.realname, main.nickname, main.lang, main.gecos, main.homephone, main.workphone, main.mobilephone, main.pagerphone, main.address1, main.address2, main.city, main.state, main.zip, main.country, main.timezone, main.creator, main.created, main.lastupdatedby, main.lastupdated, main.smimecertificate
   ->  Sort  (cost=8907.68..8907.69 rows=1 width=336) (actual time=92075.720..92075.748 rows=607 loops=1)
         Output: main.id, main.name, main.password, main.authtoken, main.comments, main.signature, main.emailaddress, main.freeformcontactinfo, main.organization, main.realname, main.nickname, main.lang, main.gecos, main.homephone, main.workphone, main.mobilephone, main.pagerphone, main.address1, main.address2, main.city, main.state, main.zip, main.country, main.timezone, main.creator, main.created, main.lastupdatedby, main.lastupdated, main.smimecertificate
         Sort Key: main.name, main.id, main.password, main.authtoken, main.comments, main.signature, main.emailaddress, main.freeformcontactinfo, main.organization, main.realname, main.nickname, main.lang, main.gecos, main.homephone, main.workphone, main.mobilephone, main.pagerphone, main.address1, main.address2, main.city, main.state, main.zip, main.country, main.timezone, main.creator, main.created, main.lastupdatedby, main.lastupdated, main.smimecertificate
         Sort Method: quicksort  Memory: 243kB
         ->  Nested Loop  (cost=20.37..8907.67 rows=1 width=336) (actual

Re: [PERFORM] Performance of LIKE/NOT LIKE when used in single query

2016-06-09 Thread Jeff Janes
On Tue, Jun 7, 2016 at 9:57 PM, Ed Felstein  wrote:
> Hello,
> First time poster here.  Bear with me.
> Using PostgreSQL 9.5
> I have a situation where I have a LIKE and a NOT LIKE in the same query to
> identify strings in a varchar field.  Since I am using wildcards, I have
> created a GIN index on the field in question, which makes LIKE '%%'
> searches run very fast.  The problem is the NOT LIKE phrases, which (as
> would be expected) force a sequential scan.  Being that we're talking about
> millions of records, this is not desirable.
> Here's the question...
> Is there a way, using a single query, to emulate the process of running the
> LIKE part first, then running the NOT LIKE just on those results?

Just do it.  In my hands, the planner is smart enough to figure it out
for itself.

explain analyze select * from stuff where synonym like '%BAT%' and
synonym not like '%col not like%' ;

   QUERY PLAN
-
 Bitmap Heap Scan on stuff  (cost=16.10..63.08 rows=13 width=14)
(actual time=9.465..10.642 rows=23 loops=1)
   Recheck Cond: (synonym ~~ '%BAT%'::text)
   Rows Removed by Index Recheck: 76
   Filter: (synonym !~~ '%col not like%'::text)
   Heap Blocks: exact=57
   ->  Bitmap Index Scan on integrity_synonym_synonym_idx
(cost=0.00..16.10 rows=13 width=0) (actual time=8.847..8.847 rows=99
loops=1)
 Index Cond: (synonym ~~ '%BAT%'::text)
 Planning time: 18.261 ms
 Execution time: 10.932 ms


So it is using the index for the positive match, and filtering those
results for the negative match, just as you wanted.
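
For completeness: the kind of GIN index that makes the positive '%...%' match
fast here is typically a trigram one. A sketch, since the original poster's
exact index definition isn't shown in the thread:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX ON stuff USING gin (synonym gin_trgm_ops);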

Cheers,

Jeff


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


  1   2   3   4   5   6   7   8   9   10   >