Re: [PERFORM] insert performance for win32
> > One thing I did notice that in a 250k insert transaction the insert time > > grows with #recs inserted. Time to insert first 50k recs is about 27 > > sec and last 50 k recs is 77 sec. I also confimed that size of table is > > not playing a role here. > > > > Marc, can you do select timeofday() every 50k recs from linux? Also a > > gprof trace from linux would be helpful. > > > > Here's the timeofday ... i'll do the gprof as soon as I can. > Every 5 rows... > Were those all in a single transaction? Merlin ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] insert performance for win32
On Tuesday 06 September 2005 19:11, Merlin Moncure wrote: > > This makes me wonder if we are looking in the wrong place. Maybe the > > problem is coming from psql? More results to follow. > > problem is not coming from psql. > > One thing I did notice that in a 250k insert transaction the insert time > grows with #recs inserted. Time to insert first 50k recs is about 27 > sec and last 50 k recs is 77 sec. I also confimed that size of table is > not playing a role here. > > Marc, can you do select timeofday() every 50k recs from linux? Also a > gprof trace from linux would be helpful. > Here's the timeofday ... i'll do the gprof as soon as I can. Every 5 rows... Wed Sep 07 13:58:13.860378 2005 CEST Wed Sep 07 13:58:20.926983 2005 CEST Wed Sep 07 13:58:27.928385 2005 CEST Wed Sep 07 13:58:35.472813 2005 CEST Wed Sep 07 13:58:42.825709 2005 CEST Wed Sep 07 13:58:50.789486 2005 CEST Wed Sep 07 13:58:57.553869 2005 CEST Wed Sep 07 13:59:04.298136 2005 CEST Wed Sep 07 13:59:11.066059 2005 CEST Wed Sep 07 13:59:19.368694 2005 CEST > Merlin > > ---(end of broadcast)--- > TIP 9: In versions below 8.0, the planner will ignore your desire to >choose an index scan if your joining column's datatypes do not >match ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] insert performance for win32
> On Tuesday 06 September 2005 19:11, Merlin Moncure wrote: > Here's the timeofday ... i'll do the gprof as soon as I can. > Every 5 rows... > > Wed Sep 07 13:58:13.860378 2005 CEST > Wed Sep 07 13:58:20.926983 2005 CEST > Wed Sep 07 13:58:27.928385 2005 CEST > Wed Sep 07 13:58:35.472813 2005 CEST > Wed Sep 07 13:58:42.825709 2005 CEST > Wed Sep 07 13:58:50.789486 2005 CEST > Wed Sep 07 13:58:57.553869 2005 CEST > Wed Sep 07 13:59:04.298136 2005 CEST > Wed Sep 07 13:59:11.066059 2005 CEST > Wed Sep 07 13:59:19.368694 2005 CEST ok, I've been in crunching profile profile graphs, and so far have been only been able to draw following conclusions. For bulk, 'in-transaction' insert: 1. win32 is slower than linux. win32 time for each insert grows with # inserts in xact, linux does not (or grows much slower). Win32 starts out about 3x slower and grows to 10x slower after 250k inserts. 2. ran a 50k profile vs. 250k profile. Nothing jumps out as being slower or faster: most time is spent in yyparse on either side. From this my preliminary conclusion is that there is something going on in the win32 api which is not showing in the profile. 3. The mingw gprof cumulative seconds does not show measurable growth in cpu time/insert in 50k/250k profile. I'm now talking suggestions about where to look for performance problems :(. Merlin ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [pgsql-hackers-win32] [PERFORM] insert performance for win32
"Merlin Moncure" <[EMAIL PROTECTED]> writes: > ok, I've been in crunching profile profile graphs, and so far have been > only been able to draw following conclusions. > For bulk, 'in-transaction' insert: > 1. win32 is slower than linux. win32 time for each insert grows with # > inserts in xact, linux does not (or grows much slower). Win32 starts > out about 3x slower and grows to 10x slower after 250k inserts. Just to be clear: what you were testing was BEGIN; INSERT ... VALUES (...); repeat insert many times COMMIT; with each statement issued as a separate PQexec() operation, correct? Was this set up as a psql script, or specialized C code? (If a psql script, I wonder whether it's psql that's chewing the time.) > 2. ran a 50k profile vs. 250k profile. Nothing jumps out as being > slower or faster: most time is spent in yyparse on either side. From > this my preliminary conclusion is that there is something going on in > the win32 api which is not showing in the profile. Hmm. Client/server data transport maybe? It would be interesting to try inserting the same data in other ways: * COPY from client * COPY from disk file * INSERT/SELECT from another table and see whether you see a similar slowdown. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[PERFORM] Query take 101 minutes, help, please
Hello, I'm a newbie in postgresql, I've installed it on a Windows XP machine ( I can't use linux, it's a company machine ), I'm courious why this query takes so long SELECT "Rut Cliente" FROM "Internet_Abril" WHERE "Rut Cliente" NOT IN ((SELECT "Rut Cliente" FROM "Internet_Enero") UNION (SELECT "Rut Cliente" FROM "Internet_Febrero") UNION (SELECT "Rut Cliente" FROM "Internet_Marzo")); it takes about 100 minutes to complete the query. All tables has index created ( Rut Cliente is a VarChar ), and tables has 50.000 records each. The explain for the query tells the following "QUERY PLAN Seq Scan on "Internet_Abril" (cost=19406.67..62126112.70 rows=24731 width=13) Filter: (NOT (subplan)) SubPlan -> Materialize (cost=19406.67..21576.07 rows=136740 width=13) -> Unique (cost=17784.23..18467.93 rows=136740 width=13) -> Sort (cost=17784.23..18126.08 rows=136740 width=13) Sort Key: "Rut Cliente" -> Append (cost=0.00..3741.80 rows=136740 width=13) -> Subquery Scan "*SELECT* 1" (cost=0.00..1233.38 rows=45069 width=13) -> Seq Scan on "Internet_Enero" (cost=0.00..782.69 rows=45069 width=13) -> Subquery Scan "*SELECT* 2" (cost=0.00..1104.06 rows=40353 width=13) -> Seq Scan on "Internet_Febrero" (cost=0.00..700.53 rows=40353 width=13) -> Subquery Scan "*SELECT* 3" (cost=0.00..1404.36 rows=51318 width=13) -> Seq Scan on "Internet_Marzo" (cost=0.00..891.18 rows=51318 width=13) Any help will be apreciated, It's for my thesis saludos Christian ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Query take 101 minutes, help, please
On Wed, Sep 07, 2005 at 12:22:27PM -0400, Christian Compagnon wrote: > I'm a newbie in postgresql, I've installed it on a Windows XP machine > ( I can't use linux, it's a company machine ), I'm courious why this > query takes so long It sounds like you've set work_mem too low; increasing it might help. Also try rewriting your query to SELECT "Rut Cliente" FROM "Internet_Abril" WHERE "Rut Cliente" NOT IN ( SELECT "Rut Cliente" FROM "Internet_Enero" ) AND "Rut Cliente" NOT IN ( SELECT "Rut Cliente" FROM "Internet_Febrero" ) AND "Rut Cliente" NOT IN ( SELECT "Rut Cliente" FROM "Internet_Marzo" ) (I'm not sure how optimized UNION inside an IN/NOT IN is.) /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Query take 101 minutes, help, please
PG is creating the union of January, February and March tables first and that doesn't have an index on it. If you're going to do many queries using the union of those three tables, you might want to place their contents into one table and create an index on it. Otherwise, try something like this: SELECT "Rut Cliente" FROM "Internet_Abril" WHERE "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM "Internet_Enero") AND "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM "Internet_Febrero") AND "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM "Internet_Marzo"); You could also compare the performance of that to this and choose the one that works the best: SELECT "Rut Cliente" FROM "Internet_Abril" a LEFT JOIN "Internet_Enero" e ON a."Rut Cliente" = e."Rut Cliente" LEFT JOIN "Internet_Febrero" f ON a."Rut Cliente" = f."Rut Cliente" LEFT JOIN "Internet_Marzo" m ON a."Rut Cliente" = m."Rut Cliente" WHERE e."Rut Cliente" IS NULL AND f."Rut Cliente" IS NULL and m."Rut Cliente" IS NULL; MeeteshOn 9/7/05, Christian Compagnon <[EMAIL PROTECTED]> wrote: Hello,I'm a newbie in postgresql, I've installed it on a Windows XP machine( I can't use linux, it's a company machine ), I'm courious why thisquery takes so longSELECT "Rut Cliente"FROM "Internet_Abril" WHERE "Rut Cliente" NOT IN ((SELECT "Rut Cliente" FROM"Internet_Enero") UNION (SELECT "Rut Cliente" FROM"Internet_Febrero") UNION (SELECT "Rut Cliente" FROM "Internet_Marzo"));it takes about 100 minutes to complete the query.All tables has index created ( Rut Cliente is a VarChar ), and tableshas 50.000 records each.The explain for the query tells the following "QUERY PLAN Seq Scan on "Internet_Abril" (cost=19406.67..62126112.70 rows=24731 width=13) Filter: (NOT (subplan)) SubPlan -> Materialize (cost=19406.67..21576.07 rows=136740 width=13) -> Unique (cost=17784.23..18467.93 rows=136740 width=13)-> Sort (cost=17784.23..18126.08 rows=136740 width=13) SortKey: "Rut Cliente" -> Append (cost=0.00..3741.80 rows=136740 width=13) -> Subquery Scan "*SELECT* 1" (cost=0.00..1233.38rows=45069 width=13) -> Seq Scan on "Internet_Enero" (cost=0.00..782.69rows=45069 width=13) -> Subquery Scan "*SELECT* 2" (cost=0.00..1104.06rows=40353 width=13) -> Seq Scan on "Internet_Febrero" (cost=0.00..700.53rows=40353 width=13) -> Subquery Scan "*SELECT* 3" (cost=0.00..1404.36rows=51318 width=13)-> Seq Scan on "Internet_Marzo" (cost=0.00..891.18rows=51318 width=13)Any help will be apreciated, It's for my thesissaludosChristian---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Query take 101 minutes, help, please
On Wed, 7 Sep 2005, Meetesh Karia wrote: > PG is creating the union of January, February and March tables first and > that doesn't have an index on it. If you're going to do many queries using > the union of those three tables, you might want to place their contents into > one table and create an index on it. > > Otherwise, try something like this: > > SELECT "Rut Cliente" > FROM "Internet_Abril" > WHERE "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM > "Internet_Enero") > AND "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM > "Internet_Febrero") > AND "Rut Cliente" NOT IN (SELECT "Rut Cliente" FROM > "Internet_Marzo"); You may also wish to try: SELECT "Rut Cliente" FROM "Internet_Abril" WHERE NOT EXISTS (SELECT 1 FROM "Internet_Enero" WHERE "Internet_Enero"."Rut Cliente"="Internet_Abril"."Rut Cliente") AND NOT EXISTS (SELECT 1 FROM "Internet_Febrero" WHERE "Internet_Febrero"."Rut Cliente"="Internet_Abril"."Rut Cliente") AND NOT EXISTS (SELECT 1 FROM "Internet_Marzo" WHERE "Internet_Marzo"."Rut Cliente"="Internet_Abril"."Rut Cliente") which will probably scan the indexes on the January, February and March indexes once for each row in the April table. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Query take 101 minutes, help, please
"Steinar H. Gunderson" <[EMAIL PROTECTED]> writes: > (I'm not sure how optimized UNION inside an IN/NOT IN is.) NOT IN is pretty nonoptimal, period. It'd help a lot to boost work_mem to the point where the planner figures it can use a hashtable (look for EXPLAIN to say "hashed subplan" rather than just "subplan"). Of course, if there's enough stuff in the UNION that that drives you into swapping, it's gonna be painful anyway. Using UNION ALL instead of UNION might save a few cycles too. If you're willing to rewrite the query wholesale, you could try the old trick of a LEFT JOIN where you discard rows for which there's a match, ie, the righthand join value isn't NULL. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Poor performance of delete by primary key
On Wed, Sep 07, 2005 at 11:07:04AM +0800, Christopher Kings-Lynne wrote: > >Unfortunately there's no very simple way to determine which FK is the > >problem. (In 8.1 it'll be possible to do that with EXPLAIN ANALYZE, > >but in existing releases EXPLAIN doesn't break out the time spent in > >each trigger ...) You have to just eyeball the schema :-(. > > phpPgAdmin has a handy info feature where you can see all tables that > refer to the current one. You can always go and steal that query to > find them... You can also use pg_user_foreighn_key* from http://pgfoundry.org/projects/newsysviews/. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.comwork: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly