Hello,
I'd like to tune Postgres for a large data import (using COPY FROM).
Here are a few steps already taken:
1) use 3 different disks for:
-1: source data
-2: index tablespaces
-3: data tablespaces
2) define all foreign keys as initially deferred
Hi,
I'm using a workaround for this kind of issue:
consider:
select A from
(select B from T1 where C
union
select B from T2 where C
union
select B from T3 where C
) foo
where D
in your
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
But INT2, INT4, INT8 and SERIAL are treated as a single datatype.
Am I right?
Thanks,
Marc
Hello,
I've split my data into daily tables to keep them at an acceptable size.
Now I have quite complex queries which may become very long if I need to query a
large number of daily tables.
I've just made a first test which resulted in a query 15 KB big and
containing 63 UNIONs.
The Query
.
hth,
Marc Mamin
From: [EMAIL PROTECTED] on behalf of Oliver Crosby
Sent: Wed 7/20/2005 3:50 AM
To: PFC
Cc: Sven Willenberger; Dawid Kuroczko; Kevin Grittner; [EMAIL PROTECTED]; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Looking for tips
Sorry for the lack of specifics... We have
Hello Peter,
If you are dealing with timed data or similar, you may consider
partitioning your table(s).
In order to deal with large data volumes, I've built a logical partitioning
system,
where the target partition is defined by the date of my data (the date
is part of the filenames that I import...).
You may try to change the planner's opinion using subqueries. Something
like:
select * from
eventactivity,
(select * from
keyword_incidents,
eventmain,
eventgeo
where
configuration below)
- has anybody built a similar workflow?
- could this be a feature request to extend the capabilities of COPY
FROM?
Thanks for your time and attention,
Marc Mamin
Hello,
I have a simple id/value table, and a function that returns the id of a
given value, inserting the latter if not yet present. The probability
that a value already exists in the reference table is very high.
Different db users may have their own reference table with different
content,
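A minimal sketch of such a get-or-insert function, assuming a hypothetical table `ref(id serial, value text unique)`; all names are placeholders, not from the original post:

```sql
-- Hypothetical sketch: return the id for a value, inserting it when missing.
-- Assumes: CREATE TABLE ref (id serial PRIMARY KEY, value text UNIQUE);
CREATE OR REPLACE FUNCTION get_or_create_id(a_value text)
RETURNS int AS $$
DECLARE
  v_id int;
BEGIN
  -- fast path: the value usually exists already
  SELECT id INTO v_id FROM ref WHERE value = a_value;
  IF FOUND THEN
    RETURN v_id;
  END IF;
  BEGIN
    INSERT INTO ref(value) VALUES (a_value) RETURNING id INTO v_id;
  EXCEPTION WHEN unique_violation THEN
    -- a concurrent session inserted the same value first
    SELECT id INTO v_id FROM ref WHERE value = a_value;
  END;
  RETURN v_id;
END;
$$ LANGUAGE plpgsql;
```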
cpus) and the benchmark server; one of the targets of this
benchmark is to verify the scalability of our application.
And you have no reason to be envious, as the server doesn't belong to us :-)
Thanks for your comments,
Marc Mamin
Postgres version: 8.2.1
Server Specifications
customers, also with data quite equally split
across 3 schemas.
I will post our configuration(s) later on.
Thanks again for all your valuable input.
Marc Mamin
---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Hello,
I had a similar issue and - after testing - decided to merge the tables
B and C into a single table.
In my case the resulting table contains a large proportion of NULLs,
which limits the size increase...
You'll have to do some testing with your data to evaluate the
performance gain.
Hope to
Hi,
Maybe you can try this syntax. I'm not sure, but it may perform better:
delete from company_alias USING comprm
where company_alias.company_id = comprm.id
Cheers,
Marc
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your
Hello,
To improve performance, I would like to try moving the temp_tablespaces
locations outside of our RAID system.
Is this good practice?
Thanks,
Marc Mamin
example is the best method though.
Marc Mamin
SELECT
l1.id AS id1,
l2.id AS id2
FROM
location l1,
location l2
WHERE l1.objectid = 1
AND (l2.start BETWEEN l1.start AND l1.end
OR
l1.start BETWEEN l2.start AND l2.end
)
..
WHERE l2.start BETWEEN l1.start AND l1.end
..
UNION
..
WHERE l1.start BETWEEN l2.start AND l2.end
..
The first clause being equivalent to
AND l1.start <= l2.start
AND l1.end >= l2.start
I don't know how you have to deal with the limit conditions...
Marc Mamin
Hello Matthew,
Another idea:
Are your objects limited to some smaller ranges of your whole interval?
If yes, you may possibly reduce the ranges to search for by using an
additional table with the min(start)/max(end) of each object...
Marc Mamin
Hello,
I didn't try it, but the following should be slightly faster:
COUNT( CASE WHEN field = x AND field <> y THEN true END )
instead of
SUM( CASE WHEN field = x AND field <> y THEN 1 ELSE 0 END )
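On PostgreSQL 9.4 and later the same conditional count can be written with the aggregate FILTER clause; `field`, `x` and `y` are the placeholders from the snippet above:

```sql
-- equivalent formulation with FILTER (PostgreSQL 9.4+):
-- count(*) only counts rows where the predicate holds
SELECT count(*) FILTER (WHERE field = x AND field <> y)
FROM t;
```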
HTH,
Marc Mamin
From: pgsql-performance-ow...@postgresql.org
is naive, but why can't Postgres use multiple
threads for large sort operations?
best regards,
Marc Mamin
No, CONCURRENTLY is there to improve table availability during index creation, but it
degrades performance.
best regards,
Marc Mamin
-Original Message-
From: Alex Hunsaker [mailto:bada...@gmail.com]
Sent: Thursday, 11 November 2010 19:55
To: Marc Mamin
Cc: pgsql-performance
formula on the fly.
best regards,
Marc Mamin
the given transaction.
regards,
Marc Mamin
-Original Message-
From: pgsql-performance-ow...@postgresql.org on behalf of Shrirang Chitnis
Sent: Wed 12/8/2010 8:05
To: Bryce Nesbitt; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] hashed subplan 5000x slower than two
Another point: would a conditional index help?
on articles (context_key) where indexed
regards,
-Original Message-
From: pgsql-performance-ow...@postgresql.org on behalf of Marc Mamin
Sent: Wed 12/8/2010 9:06
To: Shrirang Chitnis; Bryce Nesbitt; pgsql-performance
Hello,
UNION will remove all duplicates, which additionally requires the
result to be sorted.
Anyway, for performance issues, you should always start the investigation
with EXPLAIN ANALYZE.
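When duplicates cannot occur (or don't matter), UNION ALL avoids that extra deduplication step. A minimal illustration, with t1/t2 as placeholder tables:

```sql
-- UNION removes duplicates, which needs a sort or hash step:
SELECT a FROM t1 UNION     SELECT a FROM t2;

-- UNION ALL just concatenates the results; prefer it when duplicates
-- are impossible or harmless:
SELECT a FROM t1 UNION ALL SELECT a FROM t2;
```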
regards,
Marc Mamin
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow
)
JOIN mediawiki.pagecontent ss on (r.rev_id=ss.old_id)
WHERE (ss.textvector @@ (to_tsquery('fotbal')))
HTH,
Marc Mamin
);
exception
when unique_violation then
update t set dat = a_dat where id = a_id and dat <> a_dat;
return 0;
end;
elsif not test then
update t set dat = a_dat where id = a_id;
return 0;
end if;
return 1;
best regards,
Marc Mamin
-Original Message-
version.
CREATE OR REPLACE FUNCTION input_value_un (in_inputs numeric[], in_input_nr
numeric)
RETURNS numeric AS
$BODY$
SELECT u[1][2]
FROM unnest($1, SLICE =1) u
WHERE u[1][1]=in_input_nr
LIMIT 1;
$BODY$
LANGUAGE sql IMMUTABLE;
best regards,
Marc Mamin
), (CHF,7.5)}
b {(USD,10.0)}
regards,
Marc Mamin
On 12/14/2011 11:21 AM, Marc Mamin wrote:
Hello,
For such cases (see below), it would be nice to have an unnest
function that only affects the first array dimension.
Something like
unnest(ARRAY[[1,2],[2,3]], SLICE=1
(t_value,t_record,output_id) where t_imp.id is
not null.
regards,
Marc Mamin
-Original Message-
From: pgsql-performance-ow...@postgresql.org on behalf of Jochen Erwied
Sent: Sat 1/7/2012 12:57
To: anto...@inaps.org
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM
that model, but am first looking for a quick way to restore
performance on our production servers.
best regards,
Marc Mamin
);
without stats: http://explain.depesz.com/s/qPg
with stats: http://explain.depesz.com/s/88q
aserr_20120125_tvi: GIN Index on my_func(.,.,.,.,.,.)
best regards,
Marc Mamin
-Original Message-
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
ow...@postgresql.org
repeated the test with an additional search term at the last
position, but without significant change:
(result from the 6th test below)
without analyze: http://explain.depesz.com/s/6At
with analyze: http://explain.depesz.com/s/r3B
best regards,
Marc Mamin
Here are all my results, always one
From: Robert Haas [mailto:robertmh...@gmail.com]
Sent: Wed 2/29/2012 7:32
On Mon, Feb 6, 2012 at 6:05 AM, Marc Mamin m.ma...@intershop.de wrote:
without analyze: http://explain.depesz.com/s/6At
with analyze: http://explain.depesz.com/s/r3B
...
The problem seems
SELECT ARRAY(SELECT * FROM generate_series(1,500));
explain analyze SELECT _array[1] FROM t1;
Total runtime: 0.125 ms
explain analyze SELECT _array[1] FROM t2;
Total runtime: 8.649 ms
best regards,
Marc Mamin
-Original Message-
From: Pavel Stehule [mailto:pavel.steh...@gmail.com]
2012/6/26 Marc Mamin m.ma...@intershop.de:
On 22/06/12 09:02, Maxim Boguk wrote:
Maybe I'm completely wrong, but I always assumed that the access
speed to an array element in PostgreSQL should be close
but one in
the background
best regards,
Marc Mamin
I wonder what is the fastest way to accomplish this kind of task in
PostgreSQL. I am interested in
the fastest solutions ever possible.
I need the fastest cleaning strategy for such a case, working on both
PostgreSQL 8 and 9.
I see
,
Marc Mamin
-Original Message-
From: pgsql-performance-ow...@postgresql.org on behalf of Ioannis
Anagnostopoulos
Sent: Sat 7/21/2012 1:56 AM
To: Tom Lane
Cc: Claudio Freire; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] A very long running query
On 21/07/2012 00:10, Tom
%4 =3;
(can be built in parallel using separate threads)
Then you will have to ensure that all your WHERE clauses also contain the index
condition:
WHERE session_id = 27 AND session_id % 4 = 27 % 4
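The scheme above might look like this; the table and column names are assumptions for illustration:

```sql
-- four partial indexes, each covering a quarter of the sessions;
-- they can be built in parallel from separate connections
CREATE INDEX logs_s0 ON logs (session_id) WHERE session_id % 4 = 0;
CREATE INDEX logs_s1 ON logs (session_id) WHERE session_id % 4 = 1;
CREATE INDEX logs_s2 ON logs (session_id) WHERE session_id % 4 = 2;
CREATE INDEX logs_s3 ON logs (session_id) WHERE session_id % 4 = 3;

-- queries must repeat the index predicate so the planner can match it:
SELECT * FROM logs WHERE session_id = 27 AND session_id % 4 = 27 % 4;
```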
regards,
Marc Mamin
= T3_2.id
ORDER BY T1.mycolumn2,T1.id
regards,
Marc Mamin
From: pgsql-performance-ow...@postgresql.org
[pgsql-performance-ow...@postgresql.org] on behalf of "Brian
Fehrle" [bri...@consistentstate.com]
Sent: Monday, 15 July 2013 18:12
To: pgsql
://www.postgresql.org/message-id/c4dac901169b624f933534a26ed7df310861b...@jenmail01.ad.intershop.net
regards,
Marc Mamin
.
I hope this issue is being tracked in preparation for 9.3.
Regards, Stefan
2013/7/19 Marc Mamin m.ma...@intershop.de:
SELECT * FROM FullTextSearch WHERE content_tsv_gin @@
plainto_tsquery('english', 'good');
It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB
zipped
regards,
Marc Mamin
Here is an example to highlight a possible performance loss:
create temp table ref ( i int, r int);
create temp table val ( i int, v int);
insert into ref select s,s%2 from generate_series(1,1)s;
insert into val select s,s%2 from generate_series(1,1)s;
create or replace
to do with immutability and can
only apply to a few cases,
e.g.: it's fine for select x+1 ... group by x,
but not for select x^2 ... group by x
Marc Mamin
Hello,
Does anything speak against adding a WITH FREEZE option to CREATE TABLE AS,
similar to the new COPY FREEZE feature?
best regards,
Marc Mamin
will probably start again soon.
Would it make sense to use a tool like e4defrag
(http://www.linux.org/threads/online-defragmentation.4121/)
in order to defrag the free space ?
And how safe is it to use such a tool against a running postgres instance?
many thanks,
Marc Mamin
may be an issue when all your indexes are similar, e.g. all on single int4
columns.
In other cases the writes will not all take place concurrently.
To reduce I/O due to swapping, you can consider increasing maintenance_work_mem on
the connections/sessions
that build the indexes.
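For example (the value is an assumption, to be adapted to the available RAM and the number of concurrent sessions):

```sql
-- in each session that builds an index, before the CREATE INDEX:
SET maintenance_work_mem = '1GB';
CREATE INDEX my_idx ON my_table (my_col);  -- hypothetical names
```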
regards,
Marc Mamin
[Craig]
If you haven't looked at clustering algorithms yet, you might want to do so.
Your problem is a special case of clustering, where you have a large number
of small clusters. A good place to start is the overview on Wikipedia:
http://en.wikipedia.org/wiki/Cluster_analysis
According to this
:
It marks all null values in a bitmap within the row header, so you just need
about one bit per null
instead of 4 bytes for zeros, and hence get rid of your I/O issue.
regards,
Marc Mamin
From: pgsql-performance-ow...@postgresql.org
[pgsql-performance-ow
AlexK987 alex.cue@gmail.com writes:
I've created a GIN index on an INT[] column, but it slows down the selects.
Here is my table:
create table talent(person_id INT NOT NULL,
skills INT[] NOT NULL);
insert into talent(person_id, skills)
select generate_series, array[0, 1] ||
]
-- (with changed operator)
You might vary your query according to an additional table that keeps the
occurrence count of all skills.
Not really pretty though.
regards,
Marc Mamin
://www.postgresql.org/docs/9.3/static/trigger-definition.html )
regards,
Marc Mamin
:
http://explain.depesz.com/
regards,
Marc Mamin
===
Nested Loop (cost=33666.96..37971.39 rows=1 width=894) (actual
time=443.556..966558.767 rows=45360 loops=1)
Join Filter: (tp_exec.touchpoint_execution_id
,
Marc Mamin
- no other processes are likely to be interfering; nothing other than
PostgreSQL runs on this machine (except for normal OS processes and New Relic
server monitoring service); concurrent activity in PostgreSQL is low-level and
unrelated, and this effect is observed systematically
seen such
an issue.
regards,
Marc Mamin
' as product_id_1,
site_id,
product_id,
view_count,
purchase_count
FROM SALL
WHERE product_id != '82503';
regards,
Marc Mamin
explain select
e1.product_id,
e2.site_id,
e2.product_id,
count(nullif(e2.type='viewed', false)) view_count,
count(nullif(e2.type='purchased', false
some insight in what's going on:
http://www.postgresql.org/docs/9.4/static/auto-explain.html
Regards,
Marc Mamin
r($2,
> 1)), greatest(array_upper($1, 1),array_upper($2, 1)), 1) AS i
> ) sub
>GROUP BY i
>ORDER BY i
>);
> $$ LANGUAGE sql STRICT IMMUTABLE;
it seems that both the GROUP BY and ORDER BY are superfluous and add some
cycles.
regards,
Marc Mamin
;
regards,
Marc Mamin
regards, tom lane
ealing with large tables.
Here is a good starting link for this topic:
http://stackoverflow.com/questions/12604744/does-the-order-of-columns-in-a-postgres-table-impact-performance
regards,
Marc Mamin
pending lists: is there a concurrency problem, or can both tasks
cleanly work in parallel ?
best regards,
Marc mamin
further than not cleaning the pending list?
As I understand it, this list will be merged into the index automatically when
it gets full, independently of the vacuum setting.
Can it be an index bloat issue?
And last but not least, can I reduce the problem through configuration?
regards,
Marc
query can still be optimized:
=>
select count(*)
from claims
where exists (select *
from unnest("ICD9_DGNS_CD") x_
where x_ like '427%'
)
regards,
Marc Mamin
> So I figured I'd create a Function to encapsulate the concept:
>
> CREATE OR REPLACE