Buenos días, Tengo un problema de rendimiento con un servidor DELL nuevo (server B) que os comento a continuación. Actualmente tengo dos servidores:
Server A ----------------- 2xCPU Six-Core AMD Opteron 2439 SE 64GB RAM Controladora Perc6 512MB cache nv - 2 HD 146GB SAS 15Krpm RAID1 (SO Centos 5.4 y pg_xlog) (XFS no barriers) - 6 HD 300GB SAS 15Krpm RAID10 (DB Postgres 8.3.9) (XFS no barriers) Server B ------------------ 2xCPU 16 Core AMD Opteron 6282 SE 64GB RAM Controladora H700 1GB cache nv - 2HD 74GB SAS 15Krpm RAID1 stripe 16k (SO Centos 6.2) - 4HD 146GB SAS 15Krpm RAID10 stripe 16k XFS (pg_xlog) (ext4 bs 4096, no barriers) Controladora H800 1GB cache nv - MD1200 12HD 300GB SAS 15Krpm RAID10 stripe 256k (DB Postgres 8.3.18) (ext4 bs 4096, stride 64, stripe-width 384, no barriers) La BBDD es la misma en ambas maquinas, básicamente cosiste en una BBDD de unos 170GB con las tablas mas grandes particionadas por fecha, para evitar buscar en tablas muy grandes. La configuración a nivel de shared_buffers, checkpoint_segments... es la misma en ambas maquinas. En las dos utilizo pgpool2 únicamente como pool de conexiones. Hasta donde yo suponía, el segundo servidor me debería ir mucho mas rápido, al tener mas discos en RAID10 y controladoras separadas, pero la realidad es que no da rendimiento. Como ejemplo una consulta del tipo: EXPLAIN ANALYZE SELECT c.id AS c__id, c.fk_news_id AS c__fk_news_id, c.fk_news_group_id AS c__fk_news_group_id, c.fk_company_id AS c__fk_company_id, c.import_date AS c__import_date, c.highlight AS c__highlight, c.status AS c__status, c.ord AS c__ord, c.news_date AS c__news_date, c.fk_media_id AS c__fk_media_id, c.title AS c__title, c.search_title_idx AS c__search_title_idx, c.stored AS c__stored, c.tono AS c__tono, c.media_type AS c__media_type, c.fk_editions_news_id AS c__fk_editions_news_id, c.dossier_selected AS c__dossier_selected, c.update_stats AS c__update_stats, c.url_news AS c__url_news, c.url_image AS c__url_image, m.id AS m__id, m.name AS m__name, m.media_type AS m__media_type, m.media_code AS m__media_code, m.fk_data_source_id AS m__fk_data_source_id, m.language_iso AS m__language_iso, m.country_iso AS m__country_iso, m.region_iso AS m__region_iso, m.subregion_iso AS m__subregion_iso, m.media_code_temp AS m__media_code_temp, m.url AS m__url, m.current_rank AS m__current_rank, m.typologyid AS m__typologyid, m.fk_platform_id AS m__fk_platform_id, m.page_views_per_day AS m__page_views_per_day, m.audience AS m__audience, m.last_stats_update AS m__last_stats_update, n.id AS n__id, n.fk_media_id AS n__fk_media_id, n.fk_news_media_id AS n__fk_news_media_id, n.fk_data_source_id AS n__fk_data_source_id, n.news_code AS n__news_code, n.title AS n__title, n.searchfull_idx AS n__searchfull_idx, n.news_date AS n__news_date, n.economical_value AS n__economical_value, n.audience AS n__audience, n.media_type AS n__media_type, n.url_news AS n__url_news, n.url_news_old AS n__url_news_old, n.url_image AS n__url_image, n.typologyid AS n__typologyid, n.author AS n__author, n.fk_platform_id AS n__fk_platform_id, n2.id AS n2__id, n2.name AS n2__name, n3.id AS n3__id, n3.name AS n3__name, f.id AS f__id, f.name AS f__name, n4.id AS n4__id, n4.opentext AS n4__opentext, i.id AS i__id, i.name AS i__name, i.ord AS i__ord, i2.id AS i2__id, i2.name AS i2__name FROM company_news_internet c LEFT JOIN media_internet m ON c.fk_media_id = m.id AND m.media_type = 4 LEFT JOINnews_internet n ON c.fk_news_id = n.id AND n.media_type = 4 LEFT JOIN news_media_internet n2 ON n.fk_news_media_id = n2.id AND n2.media_type = 4 LEFT JOINnews_group_internet n3 ON c.fk_news_group_id = n3.id AND n3.media_type = 4 LEFT JOIN feed_internet f ON n3.fk_feed_id = f.id LEFT JOIN news_text_internet n4 ON c.fk_news_id = n4.fk_news_id AND n4.media_type = 4 LEFT JOIN internet_typology i ON n.typologyid = i.id LEFT JOIN internet_media_platform i2 ON n.fk_platform_id = i2.id WHERE(c.fk_company_id = '16073' AND c.status <> '-3' AND n3.fk_feed_id = '30693' AND n3.status = '1' AND f.fk_company_id = '16073') AND n.typologyid IN ('6', '7', '1', '2', '3', '5', '4') AND c.id > '49764393' AND c.news_date >= '2012-04-02'::timestamp - INTERVAL '4 months' AND n.news_date >= '2012-04-02'::timestamp - INTERVAL '4 months' AND c.fk_news_group_id IN ('43475') AND (c.media_type = 4) ORDER BY c.news_date DESC, c.id DESC LIMIT 200 Me tarda en el servidor A unos 20 segundos y en el servidor B 150 segundos... por lo que veo en el EXPLAIN, en el servidor A el Seq Scan de la tabla news_internet_201112 tarda unos 2s: -> Seq Scan on news_internet_201112 n (cost=0.00..119749.12 rows=1406528 width=535) (actual time=0.046..2186.379 rows=1844831 loops=1) Filter: ((news_date >= '2011-12-02 00:00:00'::timestamp without time zone) AND (media_type = 4) AND (typologyid = ANY ('{6,7,1,2,3,5,4}'::integer[]))) Mientras en el servidor B, tarda 11s: -> Seq Scan on news_internet_201112 n (cost=0.00..119520.12 rows=1405093 width=482) (actual time=0.177..11783.621 rows=1844831 loops=1) Filter: ((news_date >= '2011-12-02 00:00:00'::timestamp without time zone) AND (media_type = 4) AND (typologyid = ANY ('{6,7,1,2,3,5,4}'::integer[]))) Lo que tampoco entiendo es que en el servidor nuevo, el tiempo de la query varia, porque ha llegado a tardarme solo 25s con la BBDD recién arrancada y sin embargo en cuanto la empiezo a usarla un tiempo, van variando los resultados de forma exagerada, mientras en al BBDD que esta en produccion, la fluctuacion en los resultados es de solo unos segundos. Es como si la cache de la tarjeta controladora se comportara de forma extraña. Otro ejemplo, simplemente recuperando una tabla entera dos veces seguidas: Server 1 ------------ EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on company_news_internet_201111 (cost=0.00..457010.37 rows=6731337 width=318) (actual time=0.042..19665.155 rows=6731337 loops=1) Total runtime: 20391.555 ms - EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on company_news_internet_201111 (cost=0.00..457010.37 rows=6731337 width=318) (actual time=0.012..2171.181 rows=6731337 loops=1) Total runtime: 2831.028 ms Server 2 ------------ EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on company_news_internet_201111 (cost=0.00..369577.79 rows=6765779 width=323) (actual time=0.110..10010.443 rows=6765779 loops=1) Total runtime: 11552.818 ms - EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Seq Scan on company_news_internet_201111 (cost=0.00..369577.79 rows=6765779 width=323) (actual time=0.023..8173.801 rows=6765779 loops=1) Total runtime: 12939.717 ms Es como si en el segundo servidor hiciera caso omiso de la cache¿?¿?¿ La verdad que estoy un poco perdido, he probado varios sistemas de ficheros, distintos tamaños de stripe en la RAID, pero nada, sigue haciendo cosas raras... ¿Se os ocurre que puedo estar haciendo mal? Muchas gracias! Un saludo. -- César Martín Pérez cmart...@gmail.com