RE: [pgsql-es-ayuda] Bajo rendimiento en postgresql cuando se lanza un delete

Francisco Manuel Quintana Trujillo Fri, 31 Jul 2009 12:37:13 -0700

Hola,

La estructura de la base de datos pertenece a un proyecto GIS desarrollado por 
www.52North.org por lo tanto puedo realizar modificaciones hasta cierto punto.


Realicé los siguientes cambios:

En la tabla "observation" eliminé temporalmente las reglas que estaban creadas 
-- Rule: "offering_delete_actualization ON observation"
-- Rule: "offering_insert_actualization ON observation" No afecta al delete
-- Rule: "offering_update_actualization ON observation" No afecta al delete

Además, modifiqué la consulta 
delete from observation
where observation_id in (select observation_id from observation EXCEPT select 
observation_id from quality);

ni que decir tiene que la mejora ha sido bestial. Resultado del explain

http://explain.depesz.com/s/llp


El problema que estoy encontrando ahora es que el micro no está trabajando ni 
al 10% y en la carpeta pgsql_tmp se han creado unos 70 ficheros que no paran de 
crecer.
 
Saludos, Oliver

Siento no haber enviado las respuestas a la lista, error mío.



-----Mensaje original-----
De: Fernando Hevia [mailto:fhe...@ip-tel.com.ar] 
Enviado el: viernes, 31 de julio de 2009 16:50
Para: Francisco Manuel Quintana Trujillo
CC: pgsql-es-ayuda@postgresql.org
Asunto: RE: [pgsql-es-ayuda] Bajo rendimiento en postgresql cuando se lanza un 
delete

Por favor, responde siempre con copia a la lista. 
Los demás podrán seguir y participar en la conversación.

Lo que voy observando es que parece que no tienes un índice sobre
quality.observation_id. Crealo.


Luego sobre tu estructura de datos en si... ¿los id de tus datos son textos
de 100 caracteres? Esa tabla debe rondar los 25-30 GB sin contar los
índices.
Sería mejor identificar los procedures, features of interes, phenomenoms y
offerings con un ID numérico y que sus respectivas tablas almacenen el valor
de 100 caracteres mientras que en observations sólo referencias el
identificador numérico. Se reducirá el tamaño de la tabla.

La tabla observation tiene 8 indices, prácticamente todos sus campos están
indexados. Me pregunto si los necesitas a todos.
Ten en cuenta que cada índice impacta en la performance de escritura.

También el índice observation_time_stamp_key es gigante. No es que
necesariamente esté mal ya que implementas una unique constraint, pero
reevalúa si es realmente necesario. Cambiando los campos referenciados de
varchar a integer o bigint esto debería mejorar considerablemente.

Para tu hardware y estructura de datos no me sorprende lo que demora el
delete.

Saludos,
Fernando.



> -----Mensaje original-----
> De: Francisco Manuel Quintana Trujillo 
> [mailto:fquint...@itccanarias.org] 
> Enviado el: Viernes, 31 de Julio de 2009 04:32
> Para: Fernando Hevia
> Asunto: RE: [pgsql-es-ayuda] Bajo rendimiento en postgresql 
> cuando se lanza un delete
> 
> 
> Hola,
> 
> Primero que nada agradecer a todos vuestra colaboración.
> 
> Creación de la tabla Observation
> 
> CREATE TABLE observation
> (
>   time_stamp timestamp with time zone NOT NULL,
>   procedure_id character varying(100) NOT NULL,
>   feature_of_interest_id character varying(100) NOT NULL,
>   phenomenon_id character varying(100) NOT NULL,
>   offering_id character varying(100) NOT NULL,
>   text_value text,
>   numeric_value numeric,
>   mime_type character varying(100),
>   observation_id serial NOT NULL,
>   CONSTRAINT observation_pkey PRIMARY KEY (observation_id),
>   CONSTRAINT observation_feature_of_interest_id_fkey FOREIGN KEY
(feature_of_interest_id)
>       REFERENCES feature_of_interest (feature_of_interest_id) 
>       MATCH SIMPLE ON UPDATE CASCADE ON DELETE NO ACTION,
>   CONSTRAINT observation_offering_id_fkey FOREIGN KEY (offering_id)
>       REFERENCES offering (offering_id) MATCH SIMPLE
>       ON UPDATE CASCADE ON DELETE NO ACTION,
>   CONSTRAINT observation_phenomenon_id_fkey FOREIGN KEY (phenomenon_id)
>       REFERENCES phenomenon (phenomenon_id) MATCH SIMPLE
>       ON UPDATE CASCADE ON DELETE NO ACTION,
>   CONSTRAINT observation_procedure_id_fkey FOREIGN KEY (procedure_id)
>       REFERENCES "procedure" (procedure_id) MATCH SIMPLE
>       ON UPDATE CASCADE ON DELETE NO ACTION,
>   CONSTRAINT observation_time_stamp_key UNIQUE (time_stamp, 
> procedure_id, feature_of_interest_id, phenomenon_id, offering_id)
> )
> 
> -- Index: foiobstable
> 
> -- DROP INDEX foiobstable;
> 
> CREATE INDEX foiobstable
>   ON observation
>   USING btree
>   (feature_of_interest_id);
> 
> -- Index: numericvalueobstable
> 
> -- DROP INDEX numericvalueobstable;
> 
> CREATE INDEX numericvalueobstable
>   ON observation
>   USING btree
>   (numeric_value);
> 
> -- Index: offobstable
> 
> -- DROP INDEX offobstable;
> 
> CREATE INDEX offobstable
>   ON observation
>   USING btree
>   (offering_id);
> 
> -- Index: phenobstable
> 
> -- DROP INDEX phenobstable;
> 
> CREATE INDEX phenobstable
>   ON observation
>   USING btree
>   (phenomenon_id);
> 
> -- Index: procobstable
> 
> -- DROP INDEX procobstable;
> 
> CREATE INDEX procobstable
>   ON observation
>   USING btree
>   (procedure_id);
> 
> -- Index: textvalueobstable
> 
> -- DROP INDEX textvalueobstable;
> 
> CREATE INDEX textvalueobstable
>   ON observation
>   USING btree
>   (text_value);
> 
> 
> -- Rule: "offering_delete_actualization ON observation"
> 
> -- DROP RULE offering_delete_actualization ON observation;
> 
> CREATE OR REPLACE RULE offering_delete_actualization AS
>     ON DELETE TO observation DO ( UPDATE offering SET 
> min_time = ( SELECT min(observation.time_stamp) AS min
>            FROM observation
>           WHERE old.offering_id::text = offering.offering_id::text)
>   WHERE old.offering_id::text = offering.offering_id::text 
> AND (old.time_stamp = offering.min_time OR offering.min_time 
> IS NULL);  UPDATE offering SET max_time = ( SELECT 
> max(observation.time_stamp) AS max
>            FROM observation
>           WHERE old.offering_id::text = offering.offering_id::text)
>   WHERE old.offering_id::text = offering.offering_id::text 
> AND (old.time_stamp = offering.max_time OR offering.max_time 
> IS NULL); );
> 
> -- Rule: "offering_insert_actualization ON observation"
> 
> -- DROP RULE offering_insert_actualization ON observation;
> 
> CREATE OR REPLACE RULE offering_insert_actualization AS
>     ON INSERT TO observation DO ( UPDATE offering SET 
> min_time = new.time_stamp
>   WHERE new.offering_id::text = offering.offering_id::text 
> AND (new.time_stamp < offering.min_time OR offering.min_time 
> IS NULL);  UPDATE offering SET max_time = new.time_stamp
>   WHERE new.offering_id::text = offering.offering_id::text 
> AND (new.time_stamp > offering.max_time OR offering.max_time 
> IS NULL); );
> 
> -- Rule: "offering_update_actualization ON observation"
> 
> -- DROP RULE offering_update_actualization ON observation;
> 
> CREATE OR REPLACE RULE offering_update_actualization AS
>     ON UPDATE TO observation DO ( UPDATE offering SET 
> min_time = new.time_stamp
>   WHERE new.offering_id::text = offering.offering_id::text 
> AND (new.time_stamp < offering.min_time OR offering.min_time 
> IS NULL);  UPDATE offering SET max_time = new.time_stamp
>   WHERE new.offering_id::text = offering.offering_id::text 
> AND (new.time_stamp > offering.max_time OR offering.max_time 
> IS NULL); );
> 
> Creación de la tabla quality
> 
> 
> CREATE TABLE quality
> (
>   quality_id serial NOT NULL,
>   quality_name character varying(100),
>   quality_unit character varying(100) NOT NULL,
>   quality_value text NOT NULL,
>   quality_type character varying(100) NOT NULL,
>   observation_id integer NOT NULL,
>   CONSTRAINT quality_pkey PRIMARY KEY (quality_id),
>   CONSTRAINT quality_quality_type_check CHECK 
> (quality_type::text = ANY (ARRAY['quantity'::character 
> varying, 'category'::character varying, 'text'::character 
> varying]::text[]))
> )
> 
> Nota: Disponía de una clave foránea en la tabla "quality" del 
> campo "observation_id" que hacía referencia a la tabla 
> "observation". La eliminé para optimizar la consulta.
> 
> Supongo que las claves foráneas que afectan al rendimiento de 
> la query son las de otras tablas que apunten al campo 
> "observation_id" de la tabla "observation". ¿si o no?
> 
> El resto de las tablas no tienen claves foráneas que apunten 
> a la tabla "observation".
> 
> No dispongo de un servidor Linux por el momento, llegará 
> porque está pedido,  y como hay prisas, pues con lo que tengo 
> voy trabajando.
> 
> ¿Has probado reescribir la consulta de otra forma, por ej. 
> usando un join o un EXISTS en vez de un subselect?
> 
> En su día intenté las siguientes consultas DELETE FROM observation 
> where observation_id in (select observation_id from quality, 
> observation   where quality.observation_id = 
> observation.observation_id);
> 
> DELETE FROM observation
>   WHERE observation_id EXISTS (SELECT observation_id FROM 
> quality WHERE quality_name = '-1.3'); Error sintáctico ¿Se 
> puede utilizar el exists?
> 
> Notes
> PostgreSQL lets you reference columns of other tables in the 
> WHERE condition by specifying the other tables in the USING 
> clause. For example, to delete all films produced by a given 
> producer, one can do:
> 
> DELETE FROM films USING producers
>   WHERE producer_id = producers.id AND producers.name = 
> 'foo';What is essentially happening here is a join between 
> films and producers, with all successfully joined films rows 
> being marked for deletion. This syntax is not standard. A 
> more standard way to do it is:
> 
> DELETE FROM films
>   WHERE producer_id IN (SELECT id FROM producers WHERE name = 
> 'foo');In some cases the join style is easier to write or 
> faster to execute than the sub-select style. 
> 
> 
> Saludos, Oliver
> 
> 
> 
> -----Mensaje original-----
> De: Fernando Hevia [mailto:fhe...@ip-tel.com.ar] Enviado el: 
> jueves, 30 de julio de 2009 15:32
> Para: Francisco Manuel Quintana Trujillo; 
> pgsql-es-ayuda@postgresql.org
> Asunto: RE: [pgsql-es-ayuda] Bajo rendimiento en postgresql 
> cuando se lanza un delete
> 
> 
> > -----Mensaje original-----
> > De: Francisco Manuel Quintana Trujillo
> > 
> > Hola a todos,
> > 
> > Hace unas semanas instalé postgresql 8.3.7 en un Windows xp sp3.
> > Especificaciones de la máquina, 2 Gb de Ram, 2 discos duros 
> > sata de 150 Gb cada uno, procesador Pentium 4 dual core a 3.2Ghz.
> > 
> > Un disco duro se utiliza para el sistema operativo y las 
> > aplicaciones, incluido el postgresql y el otro disco se 
> > utiliza para la  base de datos la cual ocupa 105 Gb entre 
> > índices y datos. Lo más destacado es que existen 2 tablas que 
> > contienen 130 millones de registros cada una.
> > 
> 
> Mis recomendaciones de más importante a menos importante:
> 
> 1. Tratá de tener 4 discos en RAID 10. (mejor si puedes 
> instalar más discos)
> 2. Si no es posible comprar más discos, configura tus 2 
> discos en RAID 1
> 3. Llevá la base a Linux o a Windows 2003 Server
> 
> > La verdad es que todo funciona de maravillas si no tenemos en 
> > cuenta la fragmentación que sufre el disco en las inserciones 
> > pero que se resuelve con un simple defrag. El caso es que a 
> > la hora de realizar un select los tiempos de respuesta son 
> > más que aceptables pero no así cuando ejecuto un delete de este tipo
> > 
> 
> Se me ocurre que la fragmentación no debiera ser demasiado 
> problemática. 
> Quizá afecte algo en lecturas secuenciales sobre grandes cantidades de
> datos.
> 
> > delete from observation where observation_id not in (select 
> > distinct(observation_id) from quality) esto significa en 
> > tiempos de cpu 72 horas y sin solución por el momento.
> > 
> 
> El delete debiera ser mejorable pero tendrías que mostrarnos 
> el resultado
> del explain y estructuras de observation y quality como para 
> poder darte
> alguna sugerencia.
> ¿No tendrás clave foráneas? Asegurate de tener índices sobre 
> las columnas
> referenciada siempre! Sino cada registro a eliminar forzará un scan
> secuencial sobre la tabla referenciada a fines de verificar 
> se cumpla la
> constraint.
> 
> > Mis preguntas son:
> > 
> > ¿Es normal?,
> >
> 
> Digamos que normal no. Pero depende de la actividad del 
> sistema, de cuantos
> registros estás eliminando, cuantos índices se ven afectados, 
> si hay claves
> foráneas sobre la tabla afectada, etc.
> 
> > ¿puede ser un problema de bloqueos? ¿cómo puedo averiguar si 
> > la consulta no progresa?
> >
> 
> Estás borrando registros. Definitivamente puede ser un 
> problema de bloqueos
> si hay otros usuarios trabajando sobre los mismos.
> En postgres puedes ver los lockeos con esta consulta: select * from
> pg_locks;
> En Windows usa perfmon.exe para monitorear la actividad del sistema y
> tendrás una idea de si está trabajando o no. Para esta consulta en
> particular presta atención a la carga sobre los discos.
> 
> > ¿Qué  otra solución se puede dar a la fragmentación de disco? 
> > ¿se puede forzar al postgresql a reservar espacio en disco?
> >
> 
> Deja este tema para lo último. Dudo que sea decisivo.
> 
> > He leído las optimizaciones  que se pueden realizar:
> > 
> > Separar las distintas bases de datos en discos duros 
> > independientes así como sus índices, discos duros en raid, 
> > realizar cluster de las tablas, por el momento no son 
> > posibles. Además realizo vacuum cada 2 millones de inserciones.
> > 
> 
> Un Raid 1 no puede "no ser posible". Haz el esfuercito y no 
> lo lamentarás.
> 
> > Agradeciendo de antemano cualquier ayuda
> > 
> > Saludos, Oliver
> > 
> 
> Saludos,
> Fernando.
> 

--
TIP 7: no olvides aumentar la configuración del "free space map"

RE: [pgsql-es-ayuda] Bajo rendimiento en postgresql cuando se lanza un delete

Responder a