Re: [PERFORM] Problems with an update-from statement and pg-8.1.4

2006-12-06 Thread Ted Allen

Stephan Szabo wrote:

On Wed, 6 Dec 2006, Rafael Martinez wrote:

  

We are having some problems with an UPDATE ... FROM sql-statement and
pg-8.1.4. It takes ages to finish. The problem is the Seq Scan of the
table 'mail', this table is over 6GB without indexes, and when we send
thousands of this type of statement, the server has a very high iowait
percent.

How can we get rid of this Seq Scan?

I send the output of an explain and table definitions:
-

mailstats=# EXPLAIN update mail SET spamscore = '-5.026'  FROM mail m,
mail_received mr where mr.mail_id = m.mail_id AND mr.queue_id =
'1GrxLs-0004N9-I1' and mr.mailhost = '129.240.10.47';



I don't think this statement does what you expect. You're ending up with
two copies of mail in the above: one as "mail" and one as "m". You probably
want to remove the "mail m" in FROM and use mail rather than m in the
WHERE clause.

Worse yet, I think you're setting "spamscore" for EVERY row in mail to
'-5.026'.  The above solution should fix it, though.
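
For reference, a corrected form of the statement along those lines might look
like this (untested sketch, using only the tables and columns from the
original post):

UPDATE mail
   SET spamscore = '-5.026'
  FROM mail_received mr                  -- join mail_received directly to the target table
 WHERE mr.mail_id = mail.mail_id
   AND mr.queue_id = '1GrxLs-0004N9-I1'
   AND mr.mailhost = '129.240.10.47';    -- so only the matching mail row is updated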


-- Ted





Re: [PERFORM] Very slow queries

2007-01-30 Thread Ted Allen

What indexes do those tables have?  Any?

Sidar López Cruz wrote:

Check this:

query: Delete From ceroriesgo.salarios Where numero_patrono Not In 
(Select numero_patrono From ceroriesgo.patronos)


Seq Scan on salarios  (cost=51021.78..298803854359.95 rows=14240077 width=6)
  Filter: (NOT (subplan))
  SubPlan
    ->  Materialize  (cost=51021.78..69422.58 rows=1032980 width=25)
          ->  Seq Scan on patronos  (cost=0.00..41917.80 rows=1032980 width=25)



This query took a day to finish. How can I improve the performance of my
PostgreSQL?









--

*Edward Allen*
Software Engineer
Black Duck Software, Inc.

[EMAIL PROTECTED] 
T +1.781.891.5100 x133
F +1.781.891.5145
http://www.blackducksoftware.com




Re: [PERFORM] Very slow queries

2007-01-31 Thread Ted Allen

How many rows were deleted the last time you ran the query?

Chad's query looks good, but here is another variation that may help.


Delete From ceroriesgo.salarios
 Where numero_patrono In (Select ceroriesgo.salarios.numero_patrono
                            From ceroriesgo.salarios
                            Left Join ceroriesgo.patronos Using (numero_patrono)
                           Where ceroriesgo.patronos.numero_patrono Is Null)
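
Also, going back to my earlier question about indexes: if numero_patrono is
not already indexed on both tables, something along these lines may be worth
trying first (a sketch only; the index names here are made up):

-- hypothetical index names; index the join column on each side
Create Index salarios_numero_patrono_idx On ceroriesgo.salarios (numero_patrono);
Create Index patronos_numero_patrono_idx On ceroriesgo.patronos (numero_patrono);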


Hope that Helps,
Ted

Sidar López Cruz wrote:





From: "Chad Wagner" <[EMAIL PROTECTED]>
To: "Sidar López Cruz" <[EMAIL PROTECTED]>
CC: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very slow queries
Date: Tue, 30 Jan 2007 17:37:17 -0500

On 1/30/07, Sidar López Cruz <[EMAIL PROTECTED]> wrote:


query: Delete From ceroriesgo.salarios Where numero_patrono Not In
(Select numero_patrono From ceroriesgo.patronos)

Seq Scan on salarios  (cost=51021.78..298803854359.95 rows=14240077 width=6)
  Filter: (NOT (subplan))
  SubPlan
    ->  Materialize  (cost=51021.78..69422.58 rows=1032980 width=25)
          ->  Seq Scan on patronos  (cost=0.00..41917.80 rows=1032980 width=25)



How many rows exist in salarios, but not in patronos?  How many rows are
there in salarios?


Rows:
Patronos: 1032980
Salarios:  28480200



What does the explain look like for:

delete
from ceroriesgo.salarios s
where not exists (select 1
                    from ceroriesgo.patronos
                   where numero_patrono = s.numero_patrono);

Also, is this not a case for a foreign key with a cascade delete?


No, this is not a cascade delete case because I need to delete from
salarios, not from patronos.




http://www.postgresql.org/docs/8.2/static/ddl-constraints.html


--
Chad
http://www.postgresqlforums.com/













Re: [PERFORM] Subselect query enhancement

2007-02-01 Thread Ted Allen
I've found that doing joins seems to produce better results on the big-table
queries I use.  This is not always the case, though.


How about this option:

SELECT DISTINCT ip_info.*
  FROM ip_info
  RIGHT JOIN network_events USING (ip)
  RIGHT JOIN host_events USING (ip)
 WHERE (network_events.name = 'blah' OR host_events.name = 'blah')
   AND ip_info.ip IS NOT NULL;


That gets rid of the sub-queries you're using, which look pretty costly.
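
Another variation worth benchmarking is an EXISTS test against each event
table; it avoids both the IN lists and the DISTINCT (untested sketch, using
the same tables and columns as your post below):

SELECT *
  FROM ip_info i
 WHERE EXISTS (SELECT 1 FROM network_events n   -- semi-join: stop at the first match
                WHERE n.ip = i.ip AND n.name = 'blah')
    OR EXISTS (SELECT 1 FROM host_events h
                WHERE h.ip = i.ip AND h.name = 'blah');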

Michael Artz wrote:

I need help determining the best all-around query for the
following situation.  I have a primary table that holds IP information
and two other tables that hold event data for a specific IP, with
a one-to-many mapping between them, i.e.:

CREATE TABLE ip_info (
   ip IP4,
   --other data
);

CREATE TABLE network_events (
   ip IP4 NOT NULL REFERENCES ip_info(ip),
   name VARCHAR,
   port INTEGER,
   --other data
);

CREATE TABLE host_events (
   ip IP4 NOT NULL REFERENCES ip_info(ip),
   name VARCHAR,
   port INTEGER,
   --other data
);

There is quite a bit of commonality between the network_events and
host_events schemas, but they do not currently share an ancestor.
ip_info has about 13 million rows, the network_events table has about
30 million rows, and the host_events table has about 7 million rows.
All of these tables are indexed.

The query that I would like to execute selects all the rows of
ip_info that have either network or host events that meet some
criteria, e.g. name='blah'.  I have thought of 3 different ways to
execute this.

First, 2 'ip IN (SELECT ...)' statements joined by an OR:

SELECT * FROM ip_info
 WHERE ip IN (SELECT ip FROM network_events WHERE name='blah')
   OR ip IN (SELECT ip FROM host_events WHERE name='blah');

Next, one 'ip IN (SELECT ... UNION SELECT ...)' statement:

SELECT * FROM ip_info
 WHERE ip IN (SELECT ip FROM network_events WHERE name='blah'
 UNION
 SELECT ip FROM host_events WHERE name='blah');

Or, finally, the UNION statement with DISTINCTs:

SELECT * FROM ip_info
 WHERE ip IN (SELECT DISTINCT ip FROM network_events WHERE name='blah'
 UNION
 SELECT DISTINCT ip FROM host_events WHERE name='blah');


From what I have read, the UNION statement does an implicit DISTINCT, but I
thought that doing it on each of the tables would result in slightly faster
execution.  Can you think of any other ways to implement the previous query?

I have explained/analyzed all the queries but, unfortunately, they are
on an isolated computer.  The gist is that, for relatively
low-incidence values of name, the UNION performs better, but for
queries on a common name, the dual-subselect query performs better.

The explains look something like:
Dual-subselect:
Seq Scan on ip_info
  Filter: ... AND ((hashed subplan) OR (hashed subplan))
  SubPlan
    ->  Result
          ->  Append
                ->  various scans on host_events
    ->  Result
          ->  Append
                ->  various scans on network_events

UNION SELECT DISTINCT:

Nested Loop
  ->  Unique
        ->  Sort
              ->  Append
                    ->  Unique
                          ->  Sort
                                ->  Result
                                      ->  Append
                                            ->  various scans on host_events
                    ->  Unique
                          ->  Sort
                                ->  Result
                                      ->  Append
                                            ->  various scans on network_events

If it would help to have more information, I could retype some of the
numbers in the explain.

Any ideas?

Thanks,
-Mike









[PERFORM] Troubles dumping a very large table.

2008-12-24 Thread Ted Allen
(NOTE: I tried sending this email from my excite account and it appears 
to have been blocked for whatever reason.  But if the message does get 
double posted, sorry for the inconvenience.)


Hey all,

Merry Christmas Eve, Happy Holidays, and all that good stuff.  At my
work, I'm trying to upgrade my system from 8.1 to 8.3, and I'm dumping
a few large static tables ahead of time to limit the amount of downtime
during the upgrade.  The trouble is, when I dump the largest table,
which is 1.1 TB with indexes, I keep getting the following error at the
same point in the dump.


pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  invalid string enlargement request size 1
pg_dump: The command was: COPY public.large_table (id, data) TO stdout;

As you can see, the table is two columns: one column is an integer, and
the other is bytea.  Each cell in the data column can be as large as
600MB (we had bigger rows as well, but we thought they were the source of
the trouble and moved them elsewhere to be dealt with separately).
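
For reference, rows can be ranked by the size of that bytea column with
something like the following (untested sketch), which is roughly how we
spotted the oversized ones:

SELECT id, octet_length(data) AS data_bytes  -- size of each bytea value, in bytes
  FROM public.large_table
 ORDER BY octet_length(data) DESC            -- biggest rows first
 LIMIT 20;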


We are dumping the table using this command.

/var/lib/pgsql-8.3.5/bin/pg_dump -O -x -t large_table mydb | gzip -c > large_table.pgsql.gz


Originally we tried dumping the table with
'/var/lib/pgsql-8.3.5/bin/pg_dump -O -x -t large_table -F c mydb > large_table.dump',
but that was too CPU intensive and slowed down other db processes too much.
It failed using that command as well, but I believe that was because we did
not have enough postgres temp hard drive space.  We have since symlinked
the postgres temp space to a much bigger file system.


The specs of the db server are as follows:

Processors: 4x Opteron 2.4 GHz cores
Memory: 16 GB
Disks: 42x 15K SCSI 146 GB disks.


Also, the large table has been vacuumed recently.  Lastly, we are dumping the
table over NFS to a very large SATA array.


Thanks again and Happy Holidays,
Ted




Re: [PERFORM] Troubles dumping a very large table.

2008-12-26 Thread Ted Allen
600mb measured by get_octet_length on data.  If there is a better way to 
measure the row/cell size, please let me know because we thought it was the 
>1Gb problem too.  We thought we were being conservative by getting rid of the 
larger rows but I guess we need to get rid of even more.

Thanks,
Ted

From: Tom Lane [...@sss.pgh.pa.us]
Sent: Wednesday, December 24, 2008 12:49 PM
To: Ted Allen
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Troubles dumping a very large table.

Ted Allen  writes:
> during the upgrade.  The trouble is, when I dump the largest table,
> which is 1.1 Tb with indexes, I keep getting the following error at the
> same point in the dump.

> pg_dump: SQL command failed
> pg_dump: Error message from server: ERROR:  invalid string enlargement
> request size 1
> pg_dump: The command was: COPY public.large_table (id, data) TO stdout;

> As you can see, the table is two columns, one column is an integer, and
> the other is bytea.   Each cell in the data column can be as large as
> 600mb (we had bigger rows as well but we thought they were the source of
> the trouble and moved them elsewhere to be dealt with separately.)

600mb measured how?  I have a feeling the problem is that the value
exceeds 1Gb when converted to text form...

regards, tom lane



Re: [PERFORM] Troubles dumping a very large table.

2008-12-28 Thread Ted Allen
I was hoping to use pg_dump and not have to do a manual dump, but if the
latest solution (moving rows >300MB elsewhere and dealing with them
later) does not work, I'll try that.
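
If it does come to a manual dump, I assume the binary COPY route Merlin
mentions below would look roughly like this (untested sketch):

-- run through psql and redirect to a file, e.g.:
--   psql mydb -c "COPY public.large_table (id, data) TO STDOUT WITH BINARY" > large_table.bin
COPY public.large_table (id, data) TO STDOUT WITH BINARY;

-- and load it on the 8.3 side with:
COPY public.large_table (id, data) FROM STDIN WITH BINARY;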


Thanks everyone.

Merlin Moncure wrote:

On Fri, Dec 26, 2008 at 12:38 PM, Tom Lane  wrote:
  

Ted Allen  writes:


600mb measured by get_octet_length on data.  If there is a better way to measure 
the row/cell size, please let me know because we thought it was the >1Gb 
problem too.  We thought we were being conservative by getting rid of the larger 
rows but I guess we need to get rid of even more.
  

Yeah, the average expansion of bytea data in COPY format is about 3X :-(
So you need to get the max row length down to around 300mb.  I'm curious
how you got the data in to start with --- were the values assembled on
the server side?



Wouldn't binary-style COPY be more forgiving in this regard?  (If so,
the OP might have better luck running COPY BINARY.)

This also goes for libpq traffic: large (>1MB) bytea values definitely want
to be passed using the binary switch in the protocol.

merlin
  





[PERFORM] More Troubles Dumping a Very Large Table

2009-01-23 Thread Ted Allen

Hey All,

I previously posted about the troubles I was having dumping a >1TB (size
with indexes) table.  The rows in the table could be very large.  Using
perl's DBD::Pg we were somehow able to add these very large rows
without running into the >1GB row bug.  With everyone's help I
determined I needed to move the largest rows elsewhere.  This
seemed to solve that problem, but a new problem has cropped up.


When I ran pg_dump again, it ran successfully without error.  Although
there were no errors, pg_dump dumped less than half of the rows that
actually exist in the table.  When examining the dump file (I did not
dump in -F c format), the COPY statement created by the dump is
terminated correctly (with a \.) and there are indeed 300+ million rows
in the file, as opposed to the 700+ million I was expecting.  I don't
believe I specified anything that would have caused pg_dump to dump a
truncated version of the table.  The last row successfully dumped
contains only normal ASCII characters and is not particularly big in
size.  The row immediately after the last row successfully dumped
contains an installer file (.bin) stored in a bytea cell.  It is about
138 MB in size.


I've also been having troubles recreating this situation on a smaller DB.

We are dumping the table using this command.

/var/lib/pgsql-8.3.5/bin/pg_dump -O -x -t large_table mydb | gzip -c > large_table.pgsql.gz
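
For comparison, the same COPY can be run by hand and its output rows counted
against the table itself (untested sketch; in text COPY format each row is a
single output line, so wc -l gives a row count):

-- e.g.: psql mydb -c "COPY public.large_table (id, data) TO STDOUT" | wc -l
COPY public.large_table (id, data) TO STDOUT;

-- expected number of rows, straight from the table
SELECT count(*) FROM public.large_table;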



The specs of the db server are as follows:

Processors: 4x Opteron 2.4 GHz cores
Memory: 16 GB
Disks: 42x 15K SCSI 146 GB disks.




Thanks again for your help,
Ted
