# Re: [PERFORM] ETL optimization

Hi,

At 21:38 23/06/2005, Bricklen Anderson wrote:

> Situation:
>
> I'm trying to optimize an ETL process with many upserts (~100k aggregated
> rows, no duplicates allowed). The source table (t2) holds around 14 million
> rows, and I'm grabbing them 100,000 rows at a time from t2, resulting in
> about 100,000 distinct rows in the destination table (t1).

> What I've tried:
>
> i. A FOR ... IN EXECUTE loop over my result set (aggregated results,
>    ~100k rows): try an UPDATE first, check ROW_COUNT, and if it is 0,
>    do an INSERT.
> ...
> Run time: approx. 25 mins.

> ii. In a function (pseudo code; the table name is dynamic):
>
> ```sql
> ...
> up_stm :=
>     'UPDATE '||t1||' SET x=t2.x
>      FROM   (select sum(x) as x,a,b,c
>              from t2
>              group by a,b,c) as t2
>      WHERE '||t1||'.a=t2.a AND '||t1||'.b=t2.b AND '||t1||'.c=t2.c';
>
> EXECUTE up_stm;
>
> ins_stm :=
>     'INSERT INTO '||t1||' (x,a,b,c) select x,a,b,c
>      FROM (select sum(x) as x,a,b,c from t2 group by a,b,c) as t2
>      WHERE NOT EXISTS
>        (select true from '||t1||'
>         where '||t1||'.a=t2.a
>           and '||t1||'.b=t2.b
>           and '||t1||'.c=t2.c
>         limit 1)';
>
> EXECUTE ins_stm;
> ...
> ```
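Written out as a complete function over a fixed destination table, the update-then-insert approach quoted above would look roughly like this. The table name t1 and columns x, a, b, c are taken from the quoted pseudo code; the original uses EXECUTE because its table name is dynamic, which this sketch omits:

```sql
-- Sketch of approach ii for a fixed table t1(x,a,b,c), aggregating from t2.
CREATE OR REPLACE FUNCTION merge_t2_into_t1() RETURNS void AS $$
BEGIN
    -- First, update the rows that already exist in t1...
    UPDATE t1 SET x = agg.x
    FROM (SELECT sum(x) AS x, a, b, c FROM t2 GROUP BY a, b, c) AS agg
    WHERE t1.a = agg.a AND t1.b = agg.b AND t1.c = agg.c;

    -- ...then insert the aggregated rows that were not matched.
    INSERT INTO t1 (x, a, b, c)
    SELECT x, a, b, c
    FROM (SELECT sum(x) AS x, a, b, c FROM t2 GROUP BY a, b, c) AS agg
    WHERE NOT EXISTS
          (SELECT 1 FROM t1
           WHERE t1.a = agg.a AND t1.b = agg.b AND t1.c = agg.c);
END;
$$ LANGUAGE plpgsql;
```

Note that without a lock or a unique constraint on (a, b, c), a concurrent writer could still insert a duplicate between the UPDATE and the INSERT.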
I have a similar situation, and here is the solution I use (though I haven't really tested it in many different situations):

- have a BEFORE INSERT trigger on the destination table which does:

  ```sql
  UPDATE ... SET whatever_value = NEW.whatever_value, ...
  WHERE whatever_key = NEW.whatever_key AND ...;
  IF FOUND THEN
      RETURN NULL;
  ELSE
      RETURN NEW;
  END IF;
  ```

- use COPY to load the rows.

For optimal performance, a different trigger function is created for each table, which allows the query plan of the UPDATE to be cached.
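Put together, the trigger-plus-COPY approach could be sketched as follows. The table t1 with key columns (a, b, c) and value column x is an assumption for illustration, matching the column names used earlier in the thread:

```sql
-- Sketch only: assumes destination table t1(a, b, c, x) with logical
-- key (a, b, c). One such function is created per destination table so
-- the UPDATE's plan can be cached.
CREATE OR REPLACE FUNCTION t1_upsert() RETURNS trigger AS $$
BEGIN
    -- Try to update a matching row first.
    UPDATE t1 SET x = NEW.x
    WHERE a = NEW.a AND b = NEW.b AND c = NEW.c;
    IF FOUND THEN
        RETURN NULL;    -- a row was updated: suppress this INSERT
    END IF;
    RETURN NEW;         -- no match: let the INSERT proceed
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t1_upsert_trg
    BEFORE INSERT ON t1
    FOR EACH ROW EXECUTE PROCEDURE t1_upsert();

-- Bulk-load; duplicate keys become updates via the trigger:
-- COPY t1 (a, b, c, x) FROM STDIN;
```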
Let us know how that works out for you and if you find a better solution!

Jacques.