Re: Disabling/Enabling index before bulk loading

2018-07-10 Thread Ravi Krishna
> 
> I am very suspicious of why you need 16 indexes. Are you sure all those 
> indexes are actually being utilized?
> Try executing the attached query, You may find find some are really not 
> needed.

This is a DATAMART application and the indexes are to satisfy a large number of 
queries possible from tableau. Keep in mind
this table has 200+ cols. 
That said, it is possible that a few of them may be unnecessary.  I will run 
your script and check that out.

thanks for the script.



Re: Disabling/Enabling index before bulk loading

2018-07-10 Thread Melvin Davidson
> We recently did a test on COPY and found that on large tables (47 million
rows , 20GB of raw data) the
> difference in COPY with 16 indexes...

*I am very suspicious of why you need 16 indexes. Are you sure all those
indexes are actually being utilized?*

*Try executing the attached query, You may find find some are really not
needed.*

-- 
*Melvin Davidson*
*Maj. Database & Exploration Specialist*
*Universe Exploration Command – UXC*
Employment by invitation only!
SELECT n.nspname as schema,
   i.relname as table,
   i.indexrelname as index,
   i.idx_scan,
   i.idx_tup_read,
   i.idx_tup_fetch,
   pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' || 
quote_ident(i.relname))) AS table_size, 
   pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' || 
quote_ident(i.indexrelname))) AS index_size,
   pg_get_indexdef(idx.indexrelid) as idx_definition
  FROM pg_stat_all_indexes i
  JOIN pg_class c ON (c.oid = i.relid)
  JOIN pg_namespace n ON (n.oid = c.relnamespace)
  JOIN pg_index idx ON (idx.indexrelid =  i.indexrelid )
 WHERE i.idx_scan = 0
   AND n.nspname <> 'pg_catalog'
   AND NOT idx.indisprimary
   AND NOT idx.indisunique
 ORDER BY 1, 2, 3;




Re: Disabling/Enabling index before bulk loading

2018-07-10 Thread Tim Cross


Ravi Krishna  writes:

> We recently did a test on COPY and found that on large tables (47 million 
> rows , 20GB of raw data) the 
> difference in COPY with 16 indexes and COPY without any index is 1:14. That 
> is, COPY is 14 times slower 
> when data is ingested with all indexes as opposed to COPY first without index 
> and then create all index.
>

This is very system dependent. On our system, when we tested a similar
approach, we found that the time saved through dropping the indexes
before copy was lost when rebuilding the indexes afterwards. In fact, it
ended up being slightly slower.

I suspect a lot depends on the number and types of indexes you
have. Your example had a lot more indexes than ours. We were loading 22
batches with 2.5M records per batch. While copy was significantly faster
than transaction based inserts (minutes vs hours), the differences between
indexes and no indexes was measured in minutes. We only had 3 or 4
indexes.   

> I googled for earlier posting on this and it looks like this has been asked 
> before too.  
>
> This is what I am thinking to do:
>
> 1 - Extract index definition and save it as a SQL somewhere, either a file or 
> a table.
> 2 - Drop all indexes.
> 3 - Ingest data via COPY
> 4 - Recreate all indexes saved in (1).
>
> Is there a generic sql or script or tool to accomplish (1).
>

We are loading data via Javascript using pg and pg-copy-streams modules. It is
pretty straight forward to drop the indexes and recreate them afterwards
via sql, so we didn't look for a tool as such.

As data is only inserted into this table and only by this process, we
also turned off autovacuum for this table, performing vacuum and analyze
manually after load. 

Tim

-- 
Tim Cross



Re: Disabling/Enabling index before bulk loading

2018-07-10 Thread Nikolay Samokhvalov
On Tue, Jul 10, 2018 at 12:26 PM Ravi Krishna  wrote:

>
>
>
> https://fle.github.io/temporarily-disable-all-indexes-of-a-postgresql-table.html
>
>
>
>
> This does not work in RDS.  In order to update system catalog tables
> (pg_index), one needs privileges which is
> denied in RDS. In RDS terminology, the user must belong to role rdsadmin
> and that role is reserved only for AWS
> internal users.
>

Right, but pl/pgsql works fine there, so you can backup all indexes
and then restore them using `do $$ ... $$ language plpgsql;`, as Thomas
Kellerer already suggested.

One note here: there is some old confusion between concepts of unique
indexes and unique constraints in Postgres (see
https://www.postgresql.org/message-id/flat/CANNMO%2BKHkkDg-FBi0_78ADmfLiT9kODmz%2B8m6fR6f5kPL-n_ZQ%40mail.gmail.com#cfb3a9eaed8649d7d24ad7944ccb37cf),
so there is a chance that after such backup/restore procedure you'll get
logically correct but phisically different state.


Re: Disabling/Enabling index before bulk loading

2018-07-10 Thread Ravi Krishna
> 
> 
> https://fle.github.io/temporarily-disable-all-indexes-of-a-postgresql-table.html
>  
> 
>  


This does not work in RDS.  In order to update system catalog tables 
(pg_index), one needs privileges which is
denied in RDS. In RDS terminology, the user must belong to role rdsadmin and 
that role is reserved only for AWS 
internal users.

Re: Disabling/Enabling index before bulk loading

2018-07-10 Thread Vick Khera
On Tue, Jul 10, 2018 at 1:13 PM, Ravi Krishna  wrote:

> >
> > Did you include the time to CREATE INDEX after the COPY or is the 1:14
> only for the COPY stage?
>
> Yes.
>
> Time taken to load 47 mil rows with all 16 indexes intact:  14+ hrs
> Time taken to load the same after dropping index and then loading and
> finally creating 16 indexes: 1 hr 40 min
>


https://fle.github.io/temporarily-disable-all-indexes-of-a-postgresql-table.html


Re: Disabling/Enabling index before bulk loading

2018-07-10 Thread Ravi Krishna
> 
> Did you include the time to CREATE INDEX after the COPY or is the 1:14 only 
> for the COPY stage?

Yes.

Time taken to load 47 mil rows with all 16 indexes intact:  14+ hrs
Time taken to load the same after dropping index and then loading and finally 
creating 16 indexes: 1 hr 40 min

Frankly I am surprised by this staggering difference.





Re: Disabling/Enabling index before bulk loading

2018-07-10 Thread Adrian Klaver

On 07/10/2018 07:08 AM, Ravi Krishna wrote:


We recently did a test on COPY and found that on large tables (47 million rows 
, 20GB of raw data) the
difference in COPY with 16 indexes and COPY without any index is 1:14. That is, 
COPY is 14 times slower
when data is ingested with all indexes as opposed to COPY first without index 
and then create all index.


Did you include the time to CREATE INDEX after the COPY or is the 1:14 
only for the COPY stage?




I googled for earlier posting on this and it looks like this has been asked 
before too.

This is what I am thinking to do:

1 - Extract index definition and save it as a SQL somewhere, either a file or a 
table.
2 - Drop all indexes.
3 - Ingest data via COPY
4 - Recreate all indexes saved in (1).

Is there a generic sql or script or tool to accomplish (1).

thanks




--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Disabling/Enabling index before bulk loading

2018-07-10 Thread Thomas Kellerer
Ravi Krishna schrieb am 10.07.2018 um 16:08:
> 
> We recently did a test on COPY and found that on large tables (47 million 
> rows , 20GB of raw data) the 
> difference in COPY with 16 indexes and COPY without any index is 1:14. That 
> is, COPY is 14 times slower 
> when data is ingested with all indexes as opposed to COPY first without index 
> and then create all index.
> 
> I googled for earlier posting on this and it looks like this has been asked 
> before too.  
> 
> This is what I am thinking to do:
> 
> 1 - Extract index definition and save it as a SQL somewhere, either a file or 
> a table.
> 2 - Drop all indexes.
> 3 - Ingest data via COPY
> 4 - Recreate all indexes saved in (1).
> 
> Is there a generic sql or script or tool to accomplish (1).

You can do this with a script like this:

-- First store the definitions of all the indexes in a table:

create table index_backup 
as
select *
from pg_indexes
where schemaname = 'public'
  and tablename = 'the_table';

-- now drop all the indexes:
do
$$
declare
  l_rec record;
begin
  for l_rec in select schemaname, indexname from index_backup
  loop
execute format('drop index %I.%I', l_rec.schemaname, l_rec.indexname);
  end loop;
end;
$$

-- now insert the data

...

-- and restore all indexes

do
$$
declare
  l_rec record;
begin
  for l_rec in select indexdef from index_backup
  loop
 execute l_rec.indexdef;
  end loop;
end;
$$