[GENERAL] Tsearch2 crashes my backend, ouch !
Hello,

I have just ditched Gentoo and installed a brand new Kubuntu system (I was tired of the endless compiles). I have a problem with tsearch2 crashing, and it appeared both on Gentoo and on the brand new Kubuntu. I will describe my whole install procedure; maybe I'm doing something wrong.

The cluster is newly created and empty. initdb was done with UNICODE encoding and locales:

    # from postgresql.conf
    # These settings are initialized by initdb -- they might be changed
    lc_messages = 'fr_FR.UTF-8'   # locale for system error message strings
    lc_monetary = 'fr_FR.UTF-8'   # locale for monetary formatting
    lc_numeric  = 'fr_FR.UTF-8'   # locale for number formatting
    lc_time     = 'fr_FR.UTF-8'   # locale for time formatting

    [EMAIL PROTECTED]:~$ locale
    LANG=fr_FR.UTF-8
    LC_CTYPE="fr_FR.UTF-8"
    LC_NUMERIC="fr_FR.UTF-8"
    etc...

First, import the needed .sql files from contrib and check that the default tsearch2 config works for English:

    $ createdb -U postgres test
    $ psql -U postgres test

    ./configure --prefix=/usr/lib/postgresql/8.2/ --datadir=/usr/share/postgresql/8.2 --enable-nls=fr --with-python
    cd contrib/tsearch2
    make
    cd gendict
    (copy french stem.c and stem.h from the snowball website)
    ./config.sh -n fr -s -p french_UTF_8 -i -v -c stem.c -h stem.h -C'Snowball stemmer for French'
    cd ../../dict_fr
    make clean && make
    sudo make install

Now we have:

    /bin/sh ../../config/install-sh -c -m 644 dict_fr.sql '/usr/share/postgresql/8.2/contrib'
    /bin/sh ../../config/install-sh -c -m 755 libdict_fr.so.0.0 '/usr/lib/postgresql/8.2/lib/dict_fr.so'

Okay...

- Download and install the UTF-8 French dictionaries from http://www.davidgis.fr/download/tsearch2_french_files.zip and put them in the contrib directory (the files delivered by the Debian package ifrench are ISO8859, bleh).
- Import the French shared libs:

    psql -U postgres test < /usr/share/postgresql/8.2/contrib/dict_fr.sql

Then:

    test=# select lexize( 'en_stem', 'flying' );
     lexize
    --------
     {fli}

And:

    test=# select * from pg_ts_dict where dict_name ~ '^(fr|en)';
     dict_name | dict_init             | dict_initoption      | dict_lexize                           | dict_comment
    -----------+-----------------------+----------------------+---------------------------------------+-----------------------------
     en_stem   | snb_en_init(internal) | contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
     fr        | dinit_fr(internal)    |                      | snb_lexize(internal,internal,integer) | Snowball stemmer for French

    test=# select lexize( 'fr', 'voyageur' );
    server closed the connection unexpectedly

BLAM!

Try something else:

    test=# UPDATE pg_ts_dict SET dict_initoption='/usr/share/postgresql/8.2/contrib/french.stop' WHERE dict_name = 'fr';
    UPDATE 1
    test=# select lexize( 'fr', 'voyageur' );
    server closed the connection unexpectedly

Try other options:

    dict_name       | fr_ispell
    dict_init       | spell_init(internal)
    dict_initoption | DictFile="/usr/share/postgresql/8.2/contrib/french.dict",AffFile="/usr/share/postgresql/8.2/contrib/french.aff",StopFile="/usr/share/postgresql/8.2/contrib/french.stop"
    dict_lexize     | spell_lexize(internal,internal,integer)
    dict_comment    |

    test=# select lexize( 'en_stem', 'traveler' ), lexize( 'fr_ispell', 'voyageur' );
    -[ RECORD 1 ]---
    lexize | {travel}
    lexize | {voyageuse}

Now it works (kinda), but stemming doesn't stem for French (since Snowball is out). It should return 'voyage' (= travel) instead of 'voyageuse' (= female traveler). That's not what I want; I want to use Snowball to stem French words.

I'm going to make a debug build and try to debug it, but if anyone can help, you're really, really welcome.
Re: [GENERAL] Tsearch2 crashes my backend, ouch !
OK, I've solved my problem... thanks for the hint!

Anyway, just to point out that tsearch2 crashes if SELECT is not granted on pg_ts_dict (the other tables give a proper error message when not GRANTed).

On Fri, 30 Mar 2007 13:20:30 +0200, Listmail <[EMAIL PROTECTED]> wrote:
> [original post quoted in full; snipped]
Re: [GENERAL] postgresl for mysql?
> What will they think of next!
> http://krow.livejournal.com/502908.html
> I suppose it makes as much sense as the others, except why would you
> want to use mysql if the storage is in postgres?

If you've inherited data in a postgresql database this will allow you to migrate it to the industry standard database without the inconvenience and downtime of a dump from postgresql and a restore into mysql. I don't think it's a new idea - IIRC, Aprile Pazzo did something similar for MySQL 3 and PG 7.something.

What an interesting name! I don't know much Italian other than what I've picked up from a few movies, but I think I now know what Pazzo means ...

Yeah well, you know mysqldump has an option "export to postgres syntax" so you can reimport in postgres. I encourage you to try it one day, you'll be amazed.

    mysqldump --password -d -u root immo_forum

    DROP TABLE IF EXISTS `smf_topics`;
    CREATE TABLE `smf_topics` (
      `ID_TOPIC` mediumint(8) unsigned NOT NULL auto_increment,
      `isSticky` tinyint(4) NOT NULL default '0',
      `ID_BOARD` smallint(5) unsigned NOT NULL default '0',
      `ID_FIRST_MSG` int(10) unsigned NOT NULL default '0',
      `ID_LAST_MSG` int(10) unsigned NOT NULL default '0',
      `ID_MEMBER_STARTED` mediumint(8) unsigned NOT NULL default '0',
      `ID_MEMBER_UPDATED` mediumint(8) unsigned NOT NULL default '0',
      `ID_POLL` mediumint(8) unsigned NOT NULL default '0',
      `numReplies` int(10) unsigned NOT NULL default '0',
      `numViews` int(10) unsigned NOT NULL default '0',
      `locked` tinyint(4) NOT NULL default '0',
      PRIMARY KEY (`ID_TOPIC`),
      UNIQUE KEY `lastMessage` (`ID_LAST_MSG`,`ID_BOARD`),
      UNIQUE KEY `firstMessage` (`ID_FIRST_MSG`,`ID_BOARD`),
      UNIQUE KEY `poll` (`ID_POLL`,`ID_TOPIC`),
      KEY `isSticky` (`isSticky`),
      KEY `ID_BOARD` (`ID_BOARD`)
    ) ENGINE=MyISAM AUTO_INCREMENT=25 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
    /*!40103 SET [EMAIL PROTECTED] */;

    mysqldump --compatible=postgres --password -d -u root immo_forum

    DROP TABLE IF EXISTS "smf_topics";
    CREATE TABLE "smf_topics" (
      "ID_TOPIC" mediumint(8) unsigned NOT NULL,
      "isSticky" tinyint(4) NOT NULL default '0',
      "ID_BOARD" smallint(5) unsigned NOT NULL default '0',
      "ID_FIRST_MSG" int(10) unsigned NOT NULL default '0',
      "ID_LAST_MSG" int(10) unsigned NOT NULL default '0',
      "ID_MEMBER_STARTED" mediumint(8) unsigned NOT NULL default '0',
      "ID_MEMBER_UPDATED" mediumint(8) unsigned NOT NULL default '0',
      "ID_POLL" mediumint(8) unsigned NOT NULL default '0',
      "numReplies" int(10) unsigned NOT NULL default '0',
      "numViews" int(10) unsigned NOT NULL default '0',
      "locked" tinyint(4) NOT NULL default '0',
      PRIMARY KEY ("ID_TOPIC"),
      UNIQUE KEY "lastMessage" ("ID_LAST_MSG","ID_BOARD"),
      UNIQUE KEY "firstMessage" ("ID_FIRST_MSG","ID_BOARD"),
      UNIQUE KEY "poll" ("ID_POLL","ID_TOPIC"),
      KEY "isSticky" ("isSticky"),
      KEY "ID_BOARD" ("ID_BOARD")
    );
    /*!40103 SET [EMAIL PROTECTED] */;

Sure looks "compatible" (but with what?)
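For comparison, here is a rough sketch of how that table might be declared natively in PostgreSQL rather than through mysqldump's --compatible output. The type choices (smallint/integer instead of tinyint/mediumint, a serial primary key, real UNIQUE constraints and plain indexes instead of MySQL KEY entries) are my assumptions, not something the dump produces:

    CREATE TABLE smf_topics (
        "ID_TOPIC"          serial PRIMARY KEY,          -- mediumint unsigned auto_increment in MySQL
        "isSticky"          smallint NOT NULL DEFAULT 0, -- tinyint has no direct PostgreSQL equivalent
        "ID_BOARD"          integer  NOT NULL DEFAULT 0,
        "ID_FIRST_MSG"      integer  NOT NULL DEFAULT 0,
        "ID_LAST_MSG"       integer  NOT NULL DEFAULT 0,
        "ID_MEMBER_STARTED" integer  NOT NULL DEFAULT 0,
        "ID_MEMBER_UPDATED" integer  NOT NULL DEFAULT 0,
        "ID_POLL"           integer  NOT NULL DEFAULT 0,
        "numReplies"        integer  NOT NULL DEFAULT 0,
        "numViews"          integer  NOT NULL DEFAULT 0,
        "locked"            smallint NOT NULL DEFAULT 0,
        UNIQUE ("ID_LAST_MSG", "ID_BOARD"),
        UNIQUE ("ID_FIRST_MSG", "ID_BOARD"),
        UNIQUE ("ID_POLL", "ID_TOPIC")
    );

    -- MySQL's plain KEY entries become ordinary indexes.
    CREATE INDEX smf_topics_issticky ON smf_topics ("isSticky");
    CREATE INDEX smf_topics_id_board ON smf_topics ("ID_BOARD");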
Re: [GENERAL] Tsearch2 crashes my backend, ouch !
> > Anyway, just to signal that tsearch2 crashes if SELECT is not granted on pg_ts_dict (the other tables give a proper error message when not GRANTed).
>
> I don't understand this. Are you sure about this? From the prompt in your select examples I see you have superuser rights and you have successfully selected from pg_ts_dict.

It was tricky to reproduce... I thought I was hallucinating, but here it is:

- Open two psql windows (one as the postgres superuser, one as a normal unprivileged user); the > or # in the prompt shows which window I use to type each command.
- First, let's lock ourselves out:

    caillaudangers=# REVOKE select ON pg_ts_dict FROM caillaudangers ;
    REVOKE
    caillaudangers=# REVOKE select ON pg_ts_cfg FROM caillaudangers ;
    REVOKE
    caillaudangers=# REVOKE select ON pg_ts_cfgmap FROM caillaudangers ;
    REVOKE
    caillaudangers=# REVOKE select ON pg_ts_parser FROM caillaudangers ;
    REVOKE

- Then try to access it:

    caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
    ERROR: permission denied for relation pg_ts_dict
    CONTEXT: SQL statement "select dict_init, dict_initoption, dict_lexize from public.pg_ts_dict where oid = $1"

    caillaudangers=# GRANT select ON pg_ts_dict TO caillaudangers ;
    GRANT
    caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
    ERROR: No dictionary with id 138493128

Strange error message??

    caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
    ERROR: permission denied for relation pg_ts_cfg
    CONTEXT: SQL statement "select prs_name from public.pg_ts_cfg where oid = $1"

Proper error message now. Let's go back:

    caillaudangers=# REVOKE select ON pg_ts_dict FROM caillaudangers ;
    REVOKE

Now try to_tsvector again, and each time a permission is denied, grant SELECT on the table it needs:

    caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
    ERROR: permission denied for relation pg_ts_cfg
    CONTEXT: SQL statement "select prs_name from public.pg_ts_cfg where oid = $1"
    caillaudangers=# GRANT select ON pg_ts_cfg TO caillaudangers ;
    GRANT
    caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
    ERROR: permission denied for relation pg_ts_cfgmap
    CONTEXT: SQL statement "select lt.tokid, map.dict_name from public.pg_ts_cfgmap as map, public.pg_ts_cfg as cfg, public.token_type( $1 ) as lt where lt.alias = map.tok_alias and map.ts_name = cfg.ts_name and cfg.oid= $2 order by lt.tokid desc;"
    caillaudangers=# GRANT select ON pg_ts_cfgmap TO caillaudangers ;
    GRANT
    caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
    ERROR: permission denied for relation pg_ts_parser
    CONTEXT: SQL statement "select prs_start, prs_nexttoken, prs_end, prs_lextype, prs_headline from public.pg_ts_parser where oid = $1"
    caillaudangers=# GRANT select ON pg_ts_parser TO caillaudangers ;
    GRANT
    caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
    server closed the connection unexpectedly
    This probably means the server terminated abnormally before or while processing the request.
    The connection to the server has been lost. Attempting reset: Failed.

There it crashes. It's bizarre.
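In case it helps anyone hitting the same crash, a minimal workaround sketch is simply to make sure the application role can read all four tsearch2 configuration tables before calling to_tsvector(). The tables come from the transcript above; the role name webuser is just a placeholder:

    -- Grant read access on every tsearch2 configuration table that
    -- to_tsvector()/lexize() consults internally.
    GRANT SELECT ON public.pg_ts_dict   TO webuser;
    GRANT SELECT ON public.pg_ts_cfg    TO webuser;
    GRANT SELECT ON public.pg_ts_cfgmap TO webuser;
    GRANT SELECT ON public.pg_ts_parser TO webuser;

    -- Sanity check as the unprivileged role: this should now return a
    -- tsvector instead of raising a permission error (or crashing).
    SELECT to_tsvector('bill gates is watching us');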
[GENERAL] BitmapScan mishaps
Hello everyone ! I have this query : annonces=> EXPLAIN ANALYZE SELECT * FROM annonces AS a WHERE detect_time > CURRENT_TIMESTAMP - '7 DAY'::INTERVAL AND detect_time >= '2006-10-30 16:17:45.064793' AND vente AND surface IS NOT NULL AND price IS NOT NULL AND type_id IN (1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6) AND (a.zipcode IN (69001,69002,69003,69004,69005,69006,69007,69008,69009) OR a.city_id IN (27595) OR a.coords && '(45.74101689082,4.8371263505564),(45.75898310918,4.8628736494436)'::BOX) ; QUERY PLAN - Bitmap Heap Scan on annonces a (cost=1657.06..7145.98 rows=1177 width=691) (actual time=118.342..118.854 rows=194 loops=1) Recheck Cond: (((vente AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) OR (vente AND (city_id = 27595)) OR (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)) AND (detect_time > (now() - '7 days'::interval)) AND (detect_time >= '2006-10-30 16:17:45.064793'::timestamp without time zone)) Filter: (vente AND (surface IS NOT NULL) AND (price IS NOT NULL) AND (type_id = ANY ('{1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6}'::integer[]))) -> BitmapAnd (cost=1657.06..1657.06 rows=2465 width=0) (actual time=118.294..118.294 rows=0 loops=1) -> BitmapOr (cost=133.83..133.83 rows=4368 width=0) (actual time=2.903..2.903 rows=0 loops=1) -> Bitmap Index Scan on annonces_zip (cost=0.00..56.54 rows=1825 width=0) (actual time=0.599..0.599 rows=1580 loops=1) Index Cond: ((vente = true) AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) -> Bitmap Index Scan on annonces_city (cost=0.00..43.30 rows=1904 width=0) (actual time=0.464..0.464 rows=1575 loops=1) Index Cond: ((vente = true) AND (city_id = 27595)) -> Bitmap Index Scan on annonces_coords (cost=0.00..33.10 rows=640 width=0) (actual time=1.837..1.837 rows=2166 loops=1) Index Cond: (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box) -> Bitmap Index Scan on annonces_date (cost=0.00..1522.68 rows=72241 width=0) (actual time=114.930..114.930 rows=68022 loops=1) Index Cond: ((detect_time > (now() - '7 days'::interval)) AND (detect_time >= '2006-10-30 16:17:45.064793'::timestamp without time zone)) Total runtime: 119.000 ms (14 lignes) The interesting part is : Bitmap Index Scan on annonces_date (cost=0.00..1522.68 rows=72241 width=0) (actual time=114.930..114.930 rows=68022 loops=1) It bitmapscans about half the table... 
I realized this index was actually useless for all my queries, so I dropped it, and behold : QUERY PLAN -- Bitmap Heap Scan on annonces a (cost=133.83..7583.77 rows=1176 width=691) (actual time=5.483..18.731 rows=194 loops=1) Recheck Cond: ((vente AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) OR (vente AND (city_id = 27595)) OR (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)) Filter: ((detect_time > (now() - '7 days'::interval)) AND (detect_time = '2006-10-30 16:17:45.064793'::timestamp without time zone) AND vente AND (surface IS NOT NULL) AND (price IS NOT NULL) AND (type_id = ANY ('{1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6}'::integer[]))) -> BitmapOr (cost=133.83..133.83 rows=4368 width=0) (actual time=2.648..2.648 rows=0 loops=1) -> Bitmap Index Scan on annonces_zip (cost=0.00..56.54 rows=1825 width=0) (actual time=0.505..0.505 rows=1580 loops=1) Index Cond: ((vente = true) AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,
Re: [GENERAL] UPDATE on two large datasets is very slow
> I can't help but think that the way this application writes data is optimized for MySQL's transactionless table type, where lots of simultaneous input streams writing at the same time to the same table would be death. Can you step back and work on how the app writes out data, so that it opens a persistent connection, and then sends in the updates one at a time, committing every couple of seconds while doing so?

No, he can't. I also coded a bittorrent tracker of the same kind, and the problem is:

- each user has N torrents active
- the torrent client does N tracker requests
- the tracker only does 1 UPDATE to update the user's stats

So if you do instantaneous updates you multiply your query load by N (on average between 6 and 8).

Besides, these kinds of trackers face several problems:

- they are accessed by clients which have near-saturated connections, since they're leeching illegal prOn like crazy
- therefore these HTTP connections are very slow
- therefore you have a hell of a lot of concurrent connections

Therefore, using a threaded server for this kind of load is asking for trouble. All decent torrent trackers are designed like lighttpd: select() / poll() or other variants, and no threads. No threads means database queries are something long and to be avoided. Hosting providers will delete your account if they see a PHP torrent tracker on it, and for good reason: 600 hits/s = 600 connections = 600 Apache and PG processes = you cry.

Anyway, my tracker was in Python with a select/poll asynchronous HTTP model. It handled 200 HTTP requests per second using 10% CPU on a Core 2. I guess that's pretty decent. (I do NOT work on it anymore, DO NOT ASK for sources; it is illegal now in my country to code trackers so I have completely dropped the project, but I guess helping a fellow living in a free country is OK.)

Back to databases. You complain that postgres is slow for your application. Yes, it is a lot slower than MyISAM *on this application* (but try InnoDB and cry). But PG is a real database. It is simply not the right tool for your application. You have to choose between in-place updates and transactions. (Besides, your website is locked while MySQL does your big UPDATE.)

Here is how you can do it. Your problem is that you put the peers in the database. Ask yourself why.

You need seeder / leecher counts for each torrent?
-> Two INTEGERs in your torrents table, updated in batch by the tracker every hour.

You need to have all peers saved somewhere so that you may exit and restart your tracker?
-> code your tracker in Python and be able to reload running code
-> or just save it when you exit
-> or don't save it, it's not like it's your bank accounting data, who cares
-> the FBI will be happy to have all that data when they seize your server (see: piratebay laughs as all data was in RAM and the police had to unplug the server to seize it)

So, DO NOT put the peers in the database. If you put the peer info in the database you get one UPDATE per user per torrent. If you only update the user stats you only get one UPDATE per user. And the tracker never inserts users and torrents (hopefully), so you only get UPDATEs to the users and torrents tables, never inserts.

Now you need to display realtime info on the user and torrent pages. This is easily done: your tracker is an HTTP server, so it can serve that data via HTTP (PHP-serialized, JSON, whatever) to be inserted via AJAX or PHP into your webpages. From my stats, my tracker needs about 100 microseconds to serve an HTTP page with the peer counts for a torrent.

So, you don't need Postgres for your tracker! Use it for your website instead...
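To make the "one batched UPDATE instead of one per request" idea concrete, here is a rough SQL sketch of a periodic flush. The users table and its uploaded/downloaded columns are hypothetical, and the VALUES list stands in for whatever the tracker has accumulated in memory:

    -- One statement updates many users at once from an in-memory batch,
    -- instead of issuing one UPDATE per announce request.
    UPDATE users AS u
    SET    uploaded   = u.uploaded   + v.up_delta,
           downloaded = u.downloaded + v.down_delta
    FROM (VALUES
            (101, 52428800, 1048576),   -- (user_id, bytes up, bytes down)
            (102,   700000,  900000),
            (103,  1234567,        0)
         ) AS v(user_id, up_delta, down_delta)
    WHERE u.id = v.user_id;

    -- The per-torrent seeder/leecher counters can be refreshed the same
    -- way on an hourly schedule, with a second batched UPDATE on torrents.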
Re: [GENERAL] BitmapScan mishaps
On Tue, 03 Apr 2007 19:23:31 +0200, Tom Lane <[EMAIL PROTECTED]> wrote:

> Listmail <[EMAIL PROTECTED]> writes:
> > It bitmapscans about half the table...
>
> Which PG version is this exactly? We've fooled with the choose_bitmap_and heuristics quite a bit ...
>
> regards, tom lane

Version is 8.2.3.
Re: [GENERAL] PGSQL with high number of database rows?
> Are there any implications with possibly doing this? Will PG handle it? Are there real-world systems using PG that have a massive amount of data in them?

It's not how much data you have, it's how you query it.

You can have a table with 1000 rows and be dead slow if said rows are big TEXT data and you seq-scan it in its entirety on every webpage hit your server gets... You can have a terabyte table with billions of rows and be fast, if you know what you're doing and have proper indexes.

Learning all this is very interesting. MySQL always seemed hostile to me, but postgres is friendly, has helpful error messages, the docs are great, and the developer team is really nice.

The size of your data has no importance (unless your disk is full), but the size of your working set does. So, if you intend to query your data for a website, for instance, where the user searches data using forms, you will need to index it properly so you only need to explore small sections of your data set in order to be fast. If you intend to scan entire tables to generate reports or statistics, you will be more interested in knowing whether your RAM is larger or smaller than your data set, and in your disk throughput.

So, what is your application?
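As a tiny illustration of "index for your working set" (table and column names here are invented), the difference between scanning everything and touching only the rows a page needs usually comes down to something like:

    -- Hypothetical table: millions of rows, but each web page only needs
    -- one user's recent items.
    CREATE INDEX items_user_date ON items (user_id, created_at);

    -- This should read a handful of index and heap pages, not the table:
    EXPLAIN ANALYZE
    SELECT *
    FROM   items
    WHERE  user_id = 42
    ORDER  BY created_at DESC
    LIMIT  20;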
Re: [GENERAL] Webappication and PostgreSQL login roles
> I designed a Java web application. The persistence layer is a PostgreSQL database. The application needs user authentication. I think it's a good choice to implement this authentication mechanism via PostgreSQL login roles. So I can create several database login roles and set the database permissions on these login roles. This is my first project with the postgres database, so I don't know how I can validate a login from the website. Is there a best practice to do this, or does PostgreSQL offer a stored procedure like 'authenticateUser(String username, String password)'?

Keep in mind that this might interact badly with very desirable features like:

- persistent connections (opening a postgres connection takes a lot longer than a simple SELECT, so if you must reopen connections all the time your performance will suck)
- connection pooling (what happens when a user gets the admin's connection out of the pool?)

Since you use an object-relational mapper, I believe it is better, and more flexible, to have your objects handle their own operations. On a very basic level your objects can have a .isReadOnly() method which is checked in your application before any writing takes place, for instance.
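If you do keep authentication in the application rather than in PostgreSQL login roles, one common sketch looks like the following. It assumes the pgcrypto contrib module is installed, and the table and column names are made up for the example:

    -- Application-level users, so the web app can keep one pooled
    -- database role while still authenticating people individually.
    CREATE TABLE app_users (
        id        serial PRIMARY KEY,
        username  text NOT NULL UNIQUE,
        pw_hash   text NOT NULL           -- salted hash from pgcrypto
    );

    -- Store a salted hash, never the clear-text password.
    INSERT INTO app_users (username, pw_hash)
    VALUES ('alice', crypt('s3cret', gen_salt('bf')));

    -- Check a login attempt: true only if the password matches.
    SELECT (crypt('s3cret', pw_hash) = pw_hash) AS password_ok
    FROM   app_users
    WHERE  username = 'alice';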
Re: [GENERAL] BitmapScan mishaps
Hmmm [ studies query a bit more... ] I think the reason why that index is so expensive to use is exposed here: Index Cond: ((detect_time > (now() - '7 days'::interval)) AND (detect_time >= '2006-10-30 16:17:45.064793'::timestamp without time zone)) Evidently detect_time is timestamp without time zone, but you're comparing it to an expression that is timestamp with time zone (ie CURRENT_TIMESTAMP). That's an enormously expensive operator compared to straight comparisons of two timestamps of the same ilk, because it does some expensive stuff to convert across time zones. And you're applying it to a whole lot of index rows. If you change the query to use LOCALTIMESTAMP to avoid the type conversion, how do the two plans compare? regards, tom lane OK, I recreated the index, and... you were right. Actually, it was my query that sucked. Index Cond: ((detect_time > (now() - '7 days'::interval)) AND (detect_time >= '2006-10-30 16:17:45.064793'::timestamp without time zone)) Is it greater() which returns the highest of two values ? (like max() but not aggregate) Anyway, I fixed this in the code that generates the query, it's cleaner. So now, I just put a constant timestamp. Then we have this interesting side effect. Simply changing the timestamp value induces a different plan, and the one which returns more rows is actually faster ! annonces=> EXPLAIN ANALYZE SELECT * FROM annonces AS a WHERE detect_time >= '2007-03-27 20:46:29.187131+02' AND vente AND surface IS NOT NULL AND price IS NOT NULL AND type_id IN (1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6) AND (a.zipcode IN (69001,69002,69003,69004,69005,69006,69007,69008,69009) OR a.city_id IN (27595) OR a.coords && '(45.74101689082,4.8371263505564),(45.75898310918,4.8628736494436)'::BOX) ; QUERY PLAN Bitmap Heap Scan on annonces a (cost=1422.91..6758.82 rows=1130 width=691) (actual time=27.007..27.542 rows=194 loops=1) Recheck Cond: (((vente AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) OR (vente AND (city_id = 27595)) OR (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)) AND (detect_time >= '2007-03-27 20:46:29.187131'::timestamp without time zone)) Filter: (vente AND (surface IS NOT NULL) AND (price IS NOT NULL) AND (type_id = ANY ('{1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6}'::integer[]))) -> BitmapAnd (cost=1422.91..1422.91 rows=2367 width=0) (actual time=26.960..26.960 rows=0 loops=1) -> BitmapOr (cost=133.80..133.80 rows=4368 width=0) (actual time=2.764..2.764 rows=0 loops=1) -> Bitmap Index Scan on annonces_zip (cost=0.00..56.54 rows=1825 width=0) (actual time=0.503..0.503 rows=1580 loops=1) Index Cond: ((vente = true) AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) -> Bitmap Index Scan on annonces_city (cost=0.00..43.30 rows=1904 width=0) (actual time=0.457..0.457 rows=1575 loops=1) Index Cond: ((vente = true) AND (city_id = 27595)) -> Bitmap Index Scan on annonces_coords (cost=0.00..33.10 rows=640 width=0) (actual time=1.802..1.802 rows=2166 loops=1) Index Cond: (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box) -> Bitmap Index Scan on annonces_timestamp (cost=0.00..1288.58 rows=69375 width=0) (actual time=23.906..23.906 rows=68022 loops=1) Index Cond: (detect_time >= '2007-03-27 20:46:29.187131'::timestamp without time zone) Total runtime: 27.669 ms (14 lignes) annonces=> EXPLAIN ANALYZE 
SELECT * FROM annonces AS a WHERE detect_time >= '2006-03-27 20:46:29.187131+02' AND vente AND surface IS NOT NULL AND price IS NOT NULL AND type_id IN (1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6) AND (a.zipcode IN (69001,69002,69003,69004,69005,69006,69007,69008,69009) OR a.city_id IN (27595) OR a.coords && '(45.74101689082,4.8371263505564),(45.75898310918,4.8628736494436)'
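To spell out the fix suggested above: when the column is timestamp WITHOUT time zone, compare it against LOCALTIMESTAMP (same type) rather than CURRENT_TIMESTAMP/now() (timestamp WITH time zone), so no per-row time zone conversion is needed. A minimal sketch against the annonces table from this thread, not the exact production query:

    -- detect_time is timestamp without time zone, so keep both sides of
    -- the comparison in that type:
    SELECT count(*)
    FROM   annonces
    WHERE  detect_time > LOCALTIMESTAMP - INTERVAL '7 days';

    -- ...instead of the mixed-type version, which forces a time zone
    -- conversion for every index entry examined:
    -- WHERE detect_time > CURRENT_TIMESTAMP - INTERVAL '7 days'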
Re: [GENERAL] Storing blobs in PG DB
You don't like filesystems?

On Wed, 04 Apr 2007 07:44:57 +0200, Nikolay Moskvichev <[EMAIL PROTECTED]> wrote:

> Hi All! The question is: how suitable is PG for storing about 2,000,000 binary files of 0.5-2.0 MB each? A big number of clients or a lot of updating is not planned. Think of a photo album on a local host.
Re: [GENERAL] UPDATE on two large datasets is very slow
> Any suggestions for finding out where all the time is being spent? I'm

- time spent updating indexes? do you have a lot of them?
- try with fsync off (not in production, but it will give you an idea)
Re: [GENERAL] Storing blobs in PG DB
> My personal view is that in general, binary files have no place in databases. Filesystems are for files, databases are for data. My design choice is to store the files in a filesystem and use the database to hold metadata as well as a pointer to the file. If you *must* put files into the database, then you can do so, and PG will handle that many files of those sizes with ease. For all intents and purposes, PG can store an unlimited number of files. You're far more likely to run into walls in the form of limitations in your disk I/O system than limitations in what PG will handle.

And you can't back up with rsync...
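A minimal sketch of the "metadata in the database, bytes on the filesystem" layout; the schema is illustrative only:

    -- The database stores searchable metadata plus a pointer; the photo
    -- itself lives on disk and can be backed up with rsync.
    CREATE TABLE photos (
        id          serial PRIMARY KEY,
        title       text,
        taken_at    timestamp,
        size_bytes  bigint   NOT NULL,
        sha1        char(40) NOT NULL,      -- detect corruption / duplicates
        path        text     NOT NULL UNIQUE  -- e.g. '2007/04/abcd1234.jpg'
    );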
[GENERAL] Sort and Limit - really nasty query and feature of the day
Today I rewrote a particularly nasty query involving a UNION ALL between an active table and a huge archive table, some left joins, order by and limit, and it went from 5 minutes to under one second ; however one query became 4 with some glue in between. EXPLAIN SELECT * FROM ( SELECT 0 AS archived, id, price, surface, coords, detect_time, type_id, vente, zipcode, city_id, description FROM annonces UNION ALL SELECT 1 AS archived, a.id, price, surface, coords, detect_time, type_id, vente, zipcode, city_id, description FROM archive_data a LEFT JOIN archive_ext d ON a.id=d.id ) AS foo WHERE detect_time >= '2006-10-30 16:17:45.064793' AND type_id IN (1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6) AND vente AND (zipcode IN (69001,69002,69003,69004,69005,69006,69007,69008,69009) OR city_id IN (27595) OR coords && '(45.74101689082,4.8371263505564),(45.75898310918,4.8628736494436)'::BOX) AND surface IS NOT NULL AND price IS NOT NULL ORDER BY price/surface LIMIT 100; Here is the messy explain : Limit (cost=333560.35..333560.60 rows=100 width=103) -> Sort (cost=333560.35..333656.88 rows=38610 width=103) Sort Key: (foo.price / (foo.surface)::double precision) -> Result (cost=133.21..328438.41 rows=38610 width=103) -> Append (cost=133.21..328245.36 rows=38610 width=103) -> Bitmap Heap Scan on annonces (cost=133.21..7520.56 rows=1426 width=190) Recheck Cond: ((vente AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) OR (vente AND (city_id = 27595)) OR (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)) Filter: ((detect_time >= '2006-10-30 16:17:45.064793'::timestamp without time zone) AND (type_id = ANY ('{1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6}'::integer[])) AND vente AND (surface IS NOT NULL) AND (price IS NOT NULL)) -> BitmapOr (cost=133.21..133.21 rows=4294 width=0) -> Bitmap Index Scan on annonces_zip (cost=0.00..55.91 rows=1761 width=0) Index Cond: ((vente = true) AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) -> Bitmap Index Scan on annonces_city (cost=0.00..42.85 rows=1859 width=0) Index Cond: ((vente = true) AND (city_id = 27595)) -> Bitmap Index Scan on annonces_coords (cost=0.00..33.37 rows=675 width=0) Index Cond: (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box) -> Merge Right Join (cost=59679.03..320338.70 rows=37184 width=182) Merge Cond: (d.id = a.id) -> Index Scan using archive_ext_pkey on archive_ext d (cost=0.00..252661.12 rows=2976314 width=119) -> Sort (cost=59679.03..59771.99 rows=37184 width=67) Sort Key: a.id -> Bitmap Heap Scan on archive_data a (cost=3951.02..56856.32 rows=37184 width=67) Recheck Cond: ((vente AND (zipcode = ANY ('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) OR (vente AND (city_id = 27595)) OR (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)) Filter: ((detect_time >= '2006-10-30 16:17:45.064793'::timestamp without time zone) AND (type_id = ANY ('{1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6}'::integer[])) AND vente AND (surface IS NOT NULL) AND (price IS NOT NULL)) -> BitmapOr (cost=3951.02..3951.02 rows=171699 width=0) -> Bitmap Index Scan on archive_data_zip (cost=0.00..1692.62 rows=80610 width=0) Index Cond: ((vente = true) AND (zipcode = ANY 
('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[]))) -> Bitmap Index Scan on archive_data_city (cost=0.00..1695.31 rows=80683 width=0) Index Cond: ((vente = true) AND (city_id = 27595)) -> Bitmap Index Scan on archive_data_coords (cost=0.00..535.20 rows=10406 width=0) Index Cond: (coords && '(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box) I didn't redo the e
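For readers wanting the general shape of the rewrite: one way to express the "one query became four" idea in plain SQL is to push the ORDER BY / LIMIT into each branch before the final merge, so neither the live table nor the archive has to be joined and sorted in full. This is only a sketch of the approach with abbreviated filters, not the exact queries used:

    SELECT * FROM (
          (SELECT 0 AS archived, id, price, surface, detect_time
           FROM   annonces
           WHERE  vente AND price IS NOT NULL AND surface IS NOT NULL
                  AND detect_time >= '2006-10-30 16:17:45.064793'
                  -- plus the type_id / zipcode / city_id / coords filters
           ORDER  BY price/surface
           LIMIT  100)
        UNION ALL
          (SELECT 1 AS archived, a.id, price, surface, detect_time
           FROM   archive_data a LEFT JOIN archive_ext d ON a.id = d.id
           WHERE  vente AND price IS NOT NULL AND surface IS NOT NULL
                  AND detect_time >= '2006-10-30 16:17:45.064793'
                  -- plus the same filters
           ORDER  BY price/surface
           LIMIT  100)
    ) AS best
    ORDER BY price/surface
    LIMIT 100;   -- 100 candidates from each branch, final top 100 overall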
Re: [GENERAL] Anyone know where I can get an 8.2.3 binary for ubuntu?
On 4/5/07, Michelle Konzack <[EMAIL PROTECTED]> wrote:

> On 2007-04-01 12:05:44, Leonel wrote:
> > and once you downloaded the packages do a :
> >
> > apt-get build-deps postgresql-8.1
>
> Are you sure? -- It should be:
>
> You don't have the build-dep for 8.2 in ubuntu dapper / edgy

I recently installed kubuntu and postgres 8.2.3 is available in apt-get (edgy backports, I think).
Re: [GENERAL] SQL WHERE: many sql or large IN()
> I have a choice of running:
>
> SELECT bar FROM tokens WHERE foo IN ('apple','orange','biscuit')
>
> for up to ~300 words, OR
>
> SELECT bar FROM tokens WHERE foo = 'apple'
>
> up to ~300 times as prepared/cached SQL statements.

With new PG versions you can also use VALUES, which will save you a hash if you know your keys are unique. The example uses integers, but you can use anything. It works just like a normal join from a table. Putting 300 values in VALUES is certainly a LOT faster than doing 300 individual SELECTs!

    test=> EXPLAIN ANALYZE SELECT t.* FROM test t, ( VALUES (1),(2) ) AS v WHERE t.id=v.column1;
                                                        QUERY PLAN
    Nested Loop  (cost=0.00..16.76 rows=2 width=8) (actual time=0.029..0.039 rows=2 loops=1)
      ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4) (actual time=0.004..0.005 rows=2 loops=1)
      ->  Index Scan using test_pkey on test t  (cost=0.00..8.36 rows=1 width=8) (actual time=0.013..0.015 rows=1 loops=2)
            Index Cond: (t.id = "*VALUES*".column1)
    Total runtime: 0.085 ms
Re: [GENERAL] SQL WHERE: many sql or large IN()
On Fri, 06 Apr 2007 18:45:15 +0200, Markus Schiltknecht <[EMAIL PROTECTED]> wrote: Hi, tom wrote: Initially it seems that the WHERE IN (...) approach takes a turn for the worse when the list gets very large. Since I use this a lot on webpages, I thought maybe a little benchmark is in order ? CREATE TABLE test (id SERIAL PRIMARY KEY, value TEXT NOT NULL ); INSERT INTO test (value) SELECT * FROM generate_series( 1, 100 ); CREATE INDEX test_value ON test( value ); ANALYZE test; My script runs EXPLAIN ANALYZE 10 times and keeps the fastest timing, then displays it. Then it executes the query enough times so that 1 second is elapsed (also fetching the results), then prints the real runtime. Well, the table is clustered (since it was created from generate_series) so of course, bitmap index scan rules. Get the first 300 rows of the table using column 'value' which is an indexed TEXT column. (using 'id' which is an integer is a bit faster) SELECT * FROM test WHERE value IN <300 values> Bitmap Heap Scan on test (cost=1225.83..2317.08 rows=300 width=13) (actual time=3.736..3.807 rows=300 loops=1) Recheck Cond: (value = ANY ('{<300 values>}'::text[])) -> Bitmap Index Scan on test_value (cost=0.00..1225.75 rows=300 width=0) (actual time=3.717..3.717 rows=300 loops=1) Index Cond: (value = ANY ('{<300 values>}'::text[])) Explain Analyze runtime: 1 x 3.896 ms = 3.896 ms Real runtime: 1 x 6.027 ms = 6.027 ms (timed on 257 iterations) SELECT t.* FROM test t, ( VALUES <300 values> ) AS v WHERE t.value=v.column1 Nested Loop (cost=0.00..2447.27 rows=300 width=13) (actual time=0.035..4.724 rows=300 loops=1) -> Values Scan on "*VALUES*" (cost=0.00..3.75 rows=300 width=32) (actual time=0.002..0.219 rows=300 loops=1) -> Index Scan using test_value on test t (cost=0.00..8.13 rows=1 width=13) (actual time=0.013..0.014 rows=1 loops=300) Index Cond: (t.value = "*VALUES*".column1) Explain Analyze runtime: 1 x 4.814 ms = 4.814 ms Real runtime: 1 x 6.786 ms = 6.786 ms (timed on 208 iterations) SELECT * FROM test WHERE value='1' Index Scan using test_value on test (cost=0.00..8.40 rows=1 width=13) (actual time=0.014..0.015 rows=1 loops=1) Index Cond: (value = '1'::text) Explain Analyze runtime: 300 x 0.032 ms = 9.600 ms Real runtime: 300 x 0.149 ms = 44.843 ms (timed on 31251 iterations) Now if we ask for 300 random rows out of the million in the table, which is a lot more likely situation... 
SELECT * FROM test WHERE value IN <300 values> Bitmap Heap Scan on test (cost=1225.83..2317.08 rows=300 width=13) (actual time=4.516..4.945 rows=300 loops=1) Recheck Cond: (value = ANY ('{<300 values>}'::text[])) -> Bitmap Index Scan on test_value (cost=0.00..1225.75 rows=300 width=0) (actual time=4.451..4.451 rows=300 loops=1) Index Cond: (value = ANY ('{<300 values>}'::text[])) Explain Analyze runtime: 1 x 5.034 ms = 5.034 ms Real runtime: 1 x 7.278 ms = 7.278 ms (timed on 199 iterations) SELECT t.* FROM test t, ( VALUES <300 values> ) AS v WHERE t.value=v.column1 Nested Loop (cost=0.00..2447.27 rows=300 width=13) (actual time=0.046..5.503 rows=300 loops=1) -> Values Scan on "*VALUES*" (cost=0.00..3.75 rows=300 width=32) (actual time=0.001..0.200 rows=300 loops=1) -> Index Scan using test_value on test t (cost=0.00..8.13 rows=1 width=13) (actual time=0.016..0.016 rows=1 loops=300) Index Cond: (t.value = "*VALUES*".column1) Explain Analyze runtime: 1 x 5.625 ms = 5.625 ms Real runtime: 1 x 7.572 ms = 7.572 ms (timed on 178 iterations) Doing a query per row is a lot slower than it appears in EXPLAIN ANALYZE when you add all the overhead. In fact it sucks. That was ecpected of course. I could tell you about when I got this php/mysql website where the previous developer's idea of a join was a select in a PHP for() loop because "joins are slow". It was hell. I tried to query 1, 2, 10, 1000, 10k and 100k random rows (thats abuse) and : - one query per row is (obviously) the slowest unless you select only one row - IN reverts to a single index scan if it contains one value, else it always uses a bitmap scan - VALUES join uses nested loop until about 100k rows where it switches to hash join I also tried on a real table with many columns. In that case, there are less tuples per page, and bitmap scans are more efficient than nested loops, so IN wins. So, IN() does not turn to crap anymore like it used to ! However, if you are asking for this question because you don't want to use temporary tables, it seems that temp tables have been upgraded so they are now usable even in a webpage (well, not ALL webpages, but that big nasty slow webpage, you know which one I'm talking about). ---(end of br
Re: [GENERAL] SQL WHERE: many sql or large IN()
> I've never seen this before. Is this PG specific or generic SQL that I've never been exposed to?

http://www.postgresql.org/docs/8.2/interactive/sql-values.html

"VALUES conforms to the SQL standard, except that LIMIT and OFFSET are PostgreSQL extensions."

It doesn't seem like much at first glance, but it's really useful. It's your kleenex temp table. It can select the rows you want from a table in the order you want, update a table replacing several values with several others (as a before => after correspondence table), and many other things. It is more powerful than IN() because you can use the columns of the VALUES() in any part of your query: updating, ordering, etc.
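For example, the "before => after correspondence table" trick mentioned above looks like this; the table and column names are invented for the sketch:

    -- Rename several categories in one statement, using VALUES as a
    -- throwaway mapping table.
    UPDATE products AS p
    SET    category = v.new_name
    FROM (VALUES
            ('hifi', 'audio'),
            ('tv',   'video'),
            ('gsm',  'phones')
         ) AS v(old_name, new_name)
    WHERE p.category = v.old_name;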
Re: [GENERAL] programmatic way to fetch latest release for a given major.minor version
Here you go. It fetches the versions and prints the most recent minor for each major, then tests all mirrors for speed and prints the 4 fastest (takes some time). It uses BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/

Have a nice day!

    #! /bin/env python
    # -*- coding: utf-8 -*-
    import urllib, BeautifulSoup, re, time, sys

    def get_all_versions():
        soup = BeautifulSoup.BeautifulSoup( urllib.urlopen( "http://ftp3.fr.postgresql.org/pub/postgresql/source/" ).read() )
        for a in soup( 'a', {'href': re.compile( r"v\d+.\d+.\d+" ) } ):
            yield map( int, re.search( r"v(\d+)\.(\d+)\.(\d+)*", a['href'] ).groups() )

    def get_latest_versions():
        lastversions = {}
        for a,b,c in sorted( get_all_versions() ):
            lastversions[ (a,b) ] = c
        return sorted( lastversions.items() )

    def parse_query_string( url ):
        return dict( map( urllib.unquote_plus, pair.split('=',1) ) for pair in re.split( "&(?:amp;|)", urllib.splitquery( url )[1] ) )

    def get_mirrors():
        soup = BeautifulSoup.BeautifulSoup( urllib.urlopen( "http://wwwmaster.postgresql.org/download/mirrors-ftp" ).read() )
        for a in soup( 'a', {'href': re.compile( r"\?setmir=.*url=" ) } ):
            yield parse_query_string( a['href'] )['url']

    def get_fastest_mirrors( urls, filename ):
        for url in urls:
            sys.stdout.write( "%s\r" % url )
            t = time.time()
            try:
                urllib.urlopen( url + filename )
            except:
                pass
            d = time.time() - t
            print "%.02f s" % d
            yield d, url

    for major, minor in get_latest_versions():
        print "%d.%d.%d" % (major[0], major[1], minor)

    mirrors = get_mirrors()
    fastest = sorted( get_fastest_mirrors( mirrors, "sync_timestamp" ) )[:4]
    for d, mirror in fastest:
        print "%.02f s %s" % (d, mirror)

On Tue, 10 Apr 2007 00:34:02 +0200, Andrew Hammond <[EMAIL PROTECTED]> wrote:

> On 4/9/07, CAJ CAJ <[EMAIL PROTECTED]> wrote:
> > On 9 Apr 2007 14:47:20 -0700, Andrew Hammond <[EMAIL PROTECTED]> wrote:
> > > I'm writing a script that wants to know the latest release for a given major.minor version. Is there some better way than parsing http://www.postgresql.org/ftp/source/ or trying to connect to ftp (which is invariably timing out on me today. Is that box getting hammered or something?) and doing the parsing that? Both approaches feel quite awkward to me.
> >
> > Use wget to download via HTTP (added recently). Probably wise to add a couple mirrors in your script.
>
> I'm not asking how to download stuff. I'm asking how to figure out the current release number for a given major.minor. I thought that was clear in my original post, but I guess not. For example, how do I determine (programmatically) the latest version of 8.1.
>
> I'm also interested in a clever way to select one of the "close" mirrors at random for downloading via http. However I had planned to hold that question until I'd solved the first issue.
>
> Andrew
[GENERAL] Bad plan using join on VALUES
Table definition and problem query is below. I'm surprised... caillaudangers=> \d relations Table « public.relations » Colonne | Type | Modificateurs +-+ parent_id | integer | not null child_id | integer | not null klass | integer | not null id | integer | not null default nextval('relations_id_seq'::regclass) sort_order | integer | data | bytea | tree_vis | boolean | main_path | boolean | index_id | integer | Index : « relations_pkey » PRIMARY KEY, btree (id) « relations_unique » UNIQUE, btree (parent_id, child_id) « relations_child » btree (child_id) « relations_sort » btree (parent_id, klass, sort_order) CLUSTER « relations_tree » btree (parent_id, klass, sort_order) WHERE tree_vis = true Contraintes de clés étrangères : « klass_fk » FOREIGN KEY (klass) REFERENCES relation_klasses(id) « relations_child_id_fkey » FOREIGN KEY (child_id) REFERENCES nodes(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED « relations_node_id_fkey » FOREIGN KEY (parent_id) REFERENCES nodes(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED caillaudangers=> CLUSTER relations; ANALYZE relations; CLUSTER ANALYZE caillaudangers=> EXPLAIN ANALYZE SELECT * FROM relations WHERE child_id IN ( [60 integers] ); QUERY PLAN --- Bitmap Heap Scan on relations (cost=176.93..383.43 rows=236 width=58) (actual time=0.298..0.482 rows=350 loops=1) Recheck Cond: (child_id = ANY (' [60 integers] -> Bitmap Index Scan on relations_child (cost=0.00..176.87 rows=236 width=0) (actual time=0.281..0.281 rows=350 loops=1) Index Cond: (child_id = ANY [60 integers] Total runtime: 0.582 ms (5 lignes) OK, Screaming fast ! (and it doesn't use the CLUSTER) caillaudangers=> explain analyze SELECT target.* FROM relations AS target, (VALUES (8695), (8743), (10309), (22000), (22980), (23016), (8683), (25092), (13369), (13377), (13375), (13371), (13373), (25126), (10629), (13363), (13365), (22820), (15705), (13367), (8759), (8783), (8815), (23018), (8781), (8765), (23012), (23036), (23046), (8803), (25226), (22940), (8771), (8769), (13335), (23004), (22816), (23062), (8805), (8727), (13379), (23064), (23032), (24976), (8807), (23024), (8787), (23028), (8809), (8735), (23042), (8813), (8731), (22964), (8755), (8697), (8749), (22974), (8733), (8715)) AS source WHERE target.child_id = source.column1; QUERY PLAN --- Hash Join (cost=1.50..542.68 rows=216 width=58) (actual time=0.395..45.402 rows=350 loops=1) Hash Cond: (target.child_id = "*VALUES*".column1) -> Seq Scan on relations target (cost=0.00..440.29 rows=26329 width=58) (actual time=0.011..8.213 rows=26329 loops=1) -> Hash (cost=0.75..0.75 rows=60 width=4) (actual time=0.096..0.096 rows=60 loops=1) -> Values Scan on "*VALUES*" (cost=0.00..0.75 rows=60 width=4) (actual time=0.001..0.049 rows=60 loops=1) Total runtime: 45.594 ms Argh. Where did my index go ? [shoots self in foot] ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [GENERAL] Bad plan using join on VALUES (and now on temp table too)
Well, the planner probably guessed that in your case it's faster to scan the table than to use the index (indexes are not free). Did it choose wrong? Yes, see the other query in my post... id IN ( 60 values ) => 0.582 ms (bitmap scan on the index : perfect) join with VALUES => 45.594 ms (seq scan) If you disable the seqscan, does it get faster (set enable_seqscan=false). Make sure you run both a few times to make sure you're getting good results. Yeah, everything is in cache. With enable_seq to 0, it does a nested loop at 1.190 ms If it turns out the planner is wrong, you need to do some tuning, in particular random_page_cost and effective_cache_size. I don't want to screw my tuning (which works well for all other queries) because of this single one ! I modified the website to use IN(), but I thought it might be a bug in the planner... I'll repost. I believe the planner does not consider the right plan. (see more below) EXPLAIN ANALYZE SELECT * FROM relations WHERE child_id IN( 8695,8743,10309,22000,22980,23016,8683,25092,13369,13377,13375,13371,13373,25126,10629,13363,13365,22820,15705,13367,8759,8783,8815,23018,8781,8765,23012,23036,23046,8803,25226,22940,8771,8769,13335,23004,22816,23062,8805,8727,13379,23064,23032,24976,8807,23024,8787,23028,8809,8735,23042,8813,8731,22964,8755,8697,8749,22974,8733,8715 ) ; QUERY PLAN --- Bitmap Heap Scan on relations (cost=176.93..383.43 rows=236 width=58) (actual time=0.295..0.470 rows=350 loops=1) Recheck Cond: (child_id = ANY ('{8695,8743,10309,22000,22980,23016,8683,25092,13369,13377,13375,13371,13373,25126,10629,13363,13365,22820,15705,13367,8759,8783,8815,23018,8781,8765,23012,23036,23046,8803,25226,22940,8771,8769,13335,23004,22816,23062,8805,8727,13379,23064,23032,24976,8807,23024,8787,23028,8809,8735,23042,8813,8731,22964,8755,8697,8749,22974,8733,8715}'::integer[])) -> Bitmap Index Scan on relations_child (cost=0.00..176.87 rows=236 width=0) (actual time=0.279..0.279 rows=350 loops=1) Index Cond: (child_id = ANY ('{8695,8743,10309,22000,22980,23016,8683,25092,13369,13377,13375,13371,13373,25126,10629,13363,13365,22820,15705,13367,8759,8783,8815,23018,8781,8765,23012,23036,23046,8803,25226,22940,8771,8769,13335,23004,22816,23062,8805,8727,13379,23064,23032,24976,8807,23024,8787,23028,8809,8735,23042,8813,8731,22964,8755,8697,8749,22974,8733,8715}'::integer[])) Total runtime: 0.571 ms explain analyze SELECT target.* FROM relations AS target, (VALUES (8695::INTEGER), (8743), (10309), (22000), (22980), (23016), (8683), (25092), (13369), (13377), (13375), (13371), (13373), (25126), (10629), (13363), (13365), (22820), (15705), (13367), (8759), (8783), (8815), (23018), (8781), (8765), (23012), (23036), (23046), (8803), (25226), (22940), (8771), (8769), (13335), (23004), (22816), (23062), (8805), (8727), (13379), (23064), (23032), (24976), (8807), (23024), (8787), (23028), (8809), (8735), (23042), (8813), (8731), (22964), (8755), (8697), (8749), (22974), (8733), (8715)) AS source WHERE target.child_id = source.column1; QUERY PLAN --- Hash Join (cost=1.50..542.68 rows=216 width=58) (actual time=0.349..44.907 rows=350 loops=1) Hash Cond: (target.child_id = "*VALUES*".column1) -> Seq Scan on relations target (cost=0.00..440.29 rows=26329 width=58) (actual time=0.011..8.250 rows=26334 loops=1) -> Hash (cost=0.75..0.75 rows=60 width=4) (actual time=0.076..0.076 rows=60 loops=1) -> Values Scan on "*VALUES*" (cost=0.00..0.75 rows=60 width=4) (actual time=0.001..0.035 rows=60 loops=1) Total runtime: 45.048 ms SET enable_seqscan TO 0; SET caillaudangers=> 
explain analyze SELECT target.* FROM relations AS target, (VALUES (8695::INTEGER), (8743), (10309), (22000), (22980), (23016), (8683), (25092), (13369), (13377), (13375), (13371), (13373), (25126), (10629), (13363), (13365), (22820), (15705), (13367), (8759), (8783), (8815), (23018), (8781), (8765), (23012), (23036), (23046), (8803), (25226), (22940), (8771), (8769), (13335), (23004), (22816), (23062), (8805), (8727), (13379), (23064), (23032), (24976), (8807), (23024), (8787), (23028), (8809), (8735)
Re: [GENERAL] programmatic way to fetch latest release for a given major.minor version
> > Yeah yeah, but terminology aside, having 2 or three digits in each attribute is just wrong!
>
> Terminology aside, why? The unit is "8.1", not "8" and "1". It makes no sense to say you're on version 8, in the given context, so why should the XML data pretend there is?
>
> //Magnus

Just pretend that:

- version = a tuple of integers (a, b, c, ...)
- major = (a, b)
- minor = (c, ...)

Besides, that is sortable (unlike strings, where 15 < 2): latest minor for major = major, max(minor) where major = what you want.

    value="3" />

    from BeautifulSoup import BeautifulSoup as Soup
    s = Soup( """/>""" )
    v = s.find('pgversion')
    [int(x['value']) for x in v.find('major') ]
    [8, 2]
    [int(x['value']) for x in v.find('minor') ]
    [3]
Re: [GENERAL] programmatic way to fetch latest release for a given major.minor version
I love Open Source XD http://ethan.tira-thompson.com/cvslog2web/ Note that this is overkill (but it would look SEXY on the site). However, the original poster probably wants to know when to update his servers, so he won't care about CVS commits... If there was a RSS feed of new postgres versions I'd subscribe to it though. But I can't help thinking that we should have some additional values for release notes, download sub-URLs (to be appended to the mirror roots) etc. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] The rule question before, request official documentation on the problem
> Rules mess with queries. For data copying/archiving kinds of tasks, triggers are a better bet, like you suggested in your original post. Let me put that a different way: rules can *only* be used where data integrity is not at stake. My own thinking is that it might be time to make an official recommendation that they are only safe for views.

> NEW and OLD mean different things in a PL/pgSQL context and a Rules context. In PL/pgSQL NEW and OLD are values; in Rules (which specifically mess with queries) they are expressions. The fact that the same words mean different things in different contexts is a bit unfortunate, but not as messy as, say, using "NEWEXPR" in the Rules context would be.

Since we now have UPDATE/INSERT/DELETE RETURNING, one could imagine the rules using these to access the actual rows and not the expressions... But there is a perfectly valid argument against that:

- There already is a mechanism designed specifically for this purpose (triggers).
- It works perfectly.
- Rules are supposed to rewrite queries to do stuff like views.

It should be mentioned in the docs, though: someone with an account on the PG site should copy-paste this mail exchange into the comments field...
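To make the recommendation concrete, here is a sketch, with invented table names, of using a trigger for the copying/archiving case and reserving the rule system for what views already do:

    -- Copying data: use a trigger, which sees the actual row values.
    CREATE OR REPLACE FUNCTION archive_order() RETURNS trigger AS $$
    BEGIN
        INSERT INTO orders_archive SELECT OLD.*;   -- copy the row being removed
        RETURN OLD;                                -- let the DELETE proceed
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_archive_trg
        BEFORE DELETE ON orders
        FOR EACH ROW EXECUTE PROCEDURE archive_order();

    -- Query rewriting: rules are what views are made of, so keep them there.
    CREATE VIEW open_orders AS
        SELECT * FROM orders WHERE closed_at IS NULL;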
Re: [pgsql-www] [GENERAL] programmatic way to fetch latest release for a given major.minor version
If someone wants the schema change, react *now*. Later on we can only append to it, and not change it :) Since I like to complain... Suppose you someday add another dot, or a "b" for beta, wouldn't it be better to have 823 ... or ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [GENERAL] Arrays with Rails?
On Fri, 13 Apr 2007 10:30:29 +0200, Tino Wildenhain <[EMAIL PROTECTED]> wrote:

> Joshua D. Drake wrote:
> > Rick Schumeyer wrote:
> > > Has anyone here used a postgres array with Rails? If so, how?
> > split()?
> Err... there is no type mapping?

You know, some languages spoil us developers, so that we take it for granted that Doing It The Right Way is indeed the way it is, and then we feel the burning pain of having to deal with reality...

For instance, Python and psycopg massively spoil you:

    cursor.execute( "select '{1,2}'::INTEGER[] AS myarray, 1 AS myint, 'abc' AS mytext, %s as myarg1, %s as myarg2, %s as myarg3",
        ( u"this is an unicode string €éàù",
          [1, 2, 3],
          ["this","is","an","array","of","text"] ))
    for row in cursor:
        for name, item in row.items():
            print name, type( item ), item

As you can see, arguments and results are auto-converted between Python types and postgres types (and you can write converters for your own types). This includes unicode, arrays, etc. The interface is clean and elegant: you provide SQL with %s in it and a list of arguments. You can use named or positional arguments too.

Results:

    mytext abc
    myarray [1, 2]
    myint 1
    myarg1 this is an unicode string €éàù
    myarg2 [1, 2, 3]
    myarg3 ['this', 'is', 'an', 'array', 'of', 'text']

Some (like me) would argue that this is NORMAL and that anything inferior to this is murder. I believe the job of libraries and languages is to help the programmer, not torture him.

Then, other languages will make you feel the pain of having to quote all your arguments YOURSELF and get all results back as strings. The most famous offender is PHP (this causes countless security holes). Ruby is no better:

    require "postgres"
    conn = PGconn.connect("localhost", 5432, "", "", "test")
    res = conn.exec("select '{1,2}'::INTEGER[] AS myarray, 1 AS myint, 'abc' AS mytext;")
    res.result.each {|row|
        row.each { |item|
            puts item
            puts item.class
            puts "\n"
        }
    }

    {1,2}
    String
    1
    String
    abc
    String

So the answer to your question is:

- either suffer
- or code an additional layer above the ruby postgres lib which does the same as the python lib above.

I partially did this for PHP. It's a lifesaver. No more addslashes()! Yay!
Re: [GENERAL] Arrays with Rails?
On Fri, 13 Apr 2007 12:15:30 +0200, Alexander Presber <[EMAIL PROTECTED]> wrote:

Listmail schrieb:
Then, other languages will make you feel the pain of having to quote all your arguments YOURSELF and provide all results as strings. The most famous offender is PHP (this causes countless security holes).
I partially did this for PHP. It's a lifesaver. No more addslashes()! Yay!

What about PEAR MDB2? http://pear.php.net/manual/en/package.database.mdb2.php
Is it any good?
Cheers, Alex

Well, the problem with a lot of PHP libraries is that they are written by people who don't think. Python's interface for doing a query in your code is close to the ideal, which should be something like this:

query( "SELECT * FROM duhhh WHERE id=%s AND date < %s", id, date )

(python's API has an extra set of () and it also takes named parameters)

If id is a python integer and date a python datetime object, format conversion is automatic. If they are not, first they should be, but whatever error the programmer makes DOES NOT produce a SQL injection. At most postgres will complain that you are trying to compare a date with something that is not a date, but you don't get hacked, since in order to put an un-quoted argument into the SQL you have to do it really on purpose.

Now, I use many database queries in my web applications (so does everyone), therefore I consider a system that needs me to type a lot of crap in order to work to be DUMB. PEAR::MDB2 says:

$query = 'INSERT INTO tablename (id, itemname, saved_time) VALUES ('
       . $mdb2->quote($id, 'integer') .', '
       . $mdb2->quote($name, 'text') .', '
       . $mdb2->quote($time, 'timestamp') .')';
$res =& $mdb2->query($query);

As you see:
- it's a lot less compact and readable
- it's a pain to use, so I will copy-paste code, which is the GREAT EVIL and a good source of bugs
- you have to repeat the variable types (who cares?)
- if I am not well awake I may forget to type that crap because I'm sure the variable is an integer, so why bother (but am I really sure? => get hacked)

Since PHP has no type for dates, a much better way of doing this would be:

query( "INSERT INTO ... VALUES (%s,%s,%s)", $id, $name, DB::datetime( $timestamp ) )

with all the basic types being quoted as they come (i.e. like strings, since postgres doesn't care between 1 and '1'), and a few adapters for other types (like dates).

Also, the ORM part of PEAR::MDB2 is braindead, since the field specifiers are not objects that you can customize and derive...
Re: [GENERAL] Temporary table in pl/pgsql
OK, suppose that in his function:
- if the temp table does not exist, he creates it, with ON COMMIT DELETE ROWS
- if it does exist, he truncates it just to be sure

So the next execution of the function will find the temp table, it will have the same OID, and all is well. Now:

BEGIN
  execute the function (which creates the table)
ROLLBACK
  execute the function

The ROLLBACK will have rolled back the table creation too. When he executes the function again, will it get the cached plan with the rolled-back table's OID?

On Fri, 13 Apr 2007 22:55:49 +0200, Merlin Moncure <[EMAIL PROTECTED]> wrote:

On 4/13/07, Raymond O'Donnell <[EMAIL PROTECTED]> wrote:
Hello again all, I'm using a temporary table in a pl/PgSQL function, and I've read the bit in the FAQ about using EXECUTE to force the table-creation query to be re-evaluated each time the function is called. However, it doesn't seem to work for me. The first time I call the function, all is well; the second (and every subsequent) time, the function bombs with the 'relation with OID X does not exist' error - this seems to imply that the EXECUTE statement is getting cached along with the rest of the function, which according to the docs ain't supposed to happen :-(

in addition to the 'create table' stmt, all queries that touch the table must also be dynamic. postgresql 8.3 will have improved plan invalidation which will (aiui) remove this requirement.

merlin
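To illustrate Merlin's point, here is a minimal sketch of a function where every statement that touches the temp table goes through EXECUTE, so no cached plan can hold a stale OID (the function and the scratch table are hypothetical names):

CREATE OR REPLACE FUNCTION use_scratch() RETURNS integer AS $$
DECLARE
    n integer;
BEGIN
    -- re-create the temp table on every call; EXECUTE keeps all plans dynamic
    EXECUTE 'DROP TABLE IF EXISTS scratch';        -- IF EXISTS needs 8.2+
    EXECUTE 'CREATE TEMP TABLE scratch (id integer)';
    EXECUTE 'INSERT INTO scratch SELECT generate_series(1, 10)';
    EXECUTE 'SELECT count(*) FROM scratch' INTO n;
    RETURN n;
END;
$$ LANGUAGE plpgsql;

Any other function that reads or writes the temp table has to use EXECUTE for those statements as well, for the same reason.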
Re: [GENERAL] Can Postgresql be ported to a device with no OS and simple file I/O and memory management?
If you want embedded SQL, you'll probably have only 1 user at a time, so sqlite is a better choice. But do you want embedded SQL?

On Wed, 18 Apr 2007 22:48:52 +0200, Raymond Hurst <[EMAIL PROTECTED]> wrote:

I have a requirement of incorporating a database in a hard disk drive. Can postgresql do the job? Any consultants out here that have done this? If not, any suggestions?
Ray Hurst
Western Digital
20511 Lake Forest Drive
Lake Forest, CA 92630
949-672-9853
[GENERAL] tsearch2 benchmarks, Oleg gets prize
tsearch2 versus mysql FULLTEXT, in the context of a large forum. I guess you know the answer already, but it never hurts to have nice graphics to show your boss.

http://peufeu.free.fr/ftsbench/

I will upload new versions with more results, and maybe other engines, as I can. If someone is really interested, it would be worth running this stuff on a real server.
Re: [GENERAL] tsearch2 dictionary that indexes substrings?
You want trigram-based search, i.e.:

postgresql -> 'pos', 'ost', 'stg', 'tgr', 'gre', 'res', 'esq', 'sql'

Searching for 'gresq' is then searching for 'gre' AND 'res' AND 'esq', which is good friends with a bitmap scan. Then a little LIKE '%gresq%' to filter the results.

PS: indexing all substrings means that for long words you get a huge number of lexemes...

On Fri, 20 Apr 2007 11:06:26 +0200, Tilmann Singer <[EMAIL PROTECTED]> wrote:

Hi, in this thread http://archive.netbsd.se/?ml=pgsql-hackers&a=2006-10&t=2475028 Teodor Sigaev describes a way to make tsearch2 index substrings of words:

Brain storm method: Develop a dictionary which returns all substrings for a lexeme, for example for word foobar it will be 'foobar fooba foob foo fo oobar ooba oob oo obar oba ob bar ba ar'

Did anyone do that already? If I understand it correctly such a dictionary would require writing a custom C component - is that correct? Or could I get away with writing a plpgsql function that does the above and hooking that somehow into the tsearch2 config?

thanks, Til
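For the record, the contrib/pg_trgm module already implements trigram similarity, which may be enough depending on the use case; a rough sketch, different from the custom-dictionary route (table, column and index names are hypothetical):

-- requires contrib/pg_trgm to be installed in the database
CREATE INDEX words_trgm_idx ON words USING gist (word gist_trgm_ops);

-- '%' is pg_trgm's similarity operator, which the GiST index can service;
-- similarity() ranks the candidates
SELECT word, similarity(word, 'gresq') AS sml
FROM   words
WHERE  word % 'gresq'
ORDER  BY sml DESC;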
Re: [GENERAL] tsearch2 dictionary that indexes substrings?
I'm aware of that, and in my case I don't think it will be a problem. It is for a type-ahead search web interface, so it actually only requires indexing all possible substrings starting from char 1, i.e. p, po, pos, post, postg, postgr, postgre, postgres, postgresq, postgresql.

If you want to provide autocompletion, you could instead build a list of unique lexemes from the inserted posts and use a btree index and LIKE...
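A minimal sketch of that btree + LIKE idea (the lexemes table is hypothetical; under a non-C locale the index has to be declared with text_pattern_ops so that a left-anchored LIKE can use it):

CREATE TABLE lexemes (lexeme text PRIMARY KEY);

-- extra index usable by LIKE 'prefix%' under a non-C locale
CREATE INDEX lexemes_prefix_idx ON lexemes (lexeme text_pattern_ops);

-- type-ahead query: the first few completions for what the user typed
SELECT lexeme
FROM   lexemes
WHERE  lexeme LIKE 'postg%'
ORDER  BY lexeme
LIMIT  10;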
[GENERAL] Bitmap Scan Pages
Hello, is there a way to know how many pages were hit by a Bitmap Heap Scan? For instance:

Bitmap Heap Scan on posts (cost=56.71..295.24 rows=2123 width=67) (actual time=0.575..1.462 rows=2160 loops=1)

I'd like to know if it hit 2160 pages (i.e. I should really run CLUSTER) or 50 pages (i.e. my table is well clustered, everything's cool). When it's cached it's so fast anyway that I can't tell the difference, but when it's not cached, it matters. This would make an interesting probe for EXPLAIN ANALYZE tuning...
Re: [GENERAL] query from a list of ids
name  | ids
------+-------------
Peter | 2, 3, 4, 5
Jack  | 100, 34, 3

Both name and ids are in text format.

If you really do not want to use a link table (user_id, flag_id), you could use an array of ints instead of a string...
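A quick sketch of the array variant (table and column names are made up):

CREATE TABLE user_flags (name text PRIMARY KEY, ids integer[]);

INSERT INTO user_flags VALUES ('Peter', '{2,3,4,5}');
INSERT INTO user_flags VALUES ('Jack',  '{100,34,3}');

-- who has flag 3 ?
SELECT name FROM user_flags WHERE 3 = ANY (ids);

The link table stays the more flexible option (easy joins, foreign keys, per-flag indexing); the array mainly buys you a more compact row.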
Re: [GENERAL] Vacuum-full very slow
I don't see a way to remove the old index entries before inserting new ones without creating a window where the index and table will be inconsistent if vacuum fails.

VACUUM FULL is slow because it plays with the indexes... CLUSTER is slow because it has to order the rows... Maybe drop all the indexes, VACUUM FULL only the table, then recreate all the indexes? (Note that VACUUM can't run inside a transaction block, so if the VACUUM fails the index drops won't simply be rolled back; the indexes just have to be re-created anyway.)

By the way, about indexes: when you have a small table (say, for a website, maybe a few tens of megabytes max...), reindexing it takes just a few seconds, maybe 10-20 seconds. It could be interesting, performance-wise, to tell postgres not to bother about crash-survivability of the indexes on such a table, like temporary tables: write nothing to WAL, and if it crashes, on recovery postgres would reindex the table. btree indexing is so fast on postgres that I'd definitely use this feature; I'd rather trade a minute of recovery for less disk IO on index updates. You could even do that for whole tables (like a web sessions table) which hold "perishable" data...

CLUSTER avoids all this thrashing by recopying the whole table, but of course that has peak space requirements approximately twice the table size (and is probably not a win anyway unless most of the table rows need to be moved). You pays your money, you takes your choice.

regards, tom lane
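The drop/vacuum/recreate sequence would look something like this (hypothetical table and index names; since VACUUM cannot run inside a transaction block, these are separate commands):

DROP INDEX bigtable_a_idx;
DROP INDEX bigtable_b_idx;

VACUUM FULL bigtable;       -- now only the heap has to be compacted

CREATE INDEX bigtable_a_idx ON bigtable (a);
CREATE INDEX bigtable_b_idx ON bigtable (b);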
Re: [GENERAL] Query in function not using index...
Any idea why using a variable v_streetName instead of a string 'wynn%' behaves differently?
Thanks, John

Yeah. 'wynn%' is a constant that doesn't start with %, so the planner knows LIKE can use the index. But Postgres doesn't know that *you* know that the contents of the variable never start with '%'...
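One workaround sketch, assuming the prefix itself never contains wildcard characters (the function, table and column names are hypothetical): build the query dynamically so the planner sees a constant, left-anchored pattern.

CREATE OR REPLACE FUNCTION find_streets(p_prefix text) RETURNS SETOF text AS $$
DECLARE
    r record;
BEGIN
    -- quote_literal() inlines the pattern safely; the planner then sees a
    -- constant 'wynn%'-style pattern and can use the index on fname
    FOR r IN EXECUTE 'SELECT fname FROM streets WHERE fname LIKE '
                     || quote_literal(p_prefix || '%')
                     || ' ORDER BY fname'
    LOOP
        RETURN NEXT r.fname;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;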
Re: [GENERAL] Query in function not using index...
Tom, we were looking at the explain results and noticed that it was converting the 'wynn%' into

fname >= 'wynn' and fname < 'wyno'

Does this also work if the last character is a unicode character?
Re: [GENERAL] Temporal Units
On Sun, 29 Apr 2007 22:33:37 +0200, Rich Shepard <[EMAIL PROTECTED]> wrote:

On Sun, 29 Apr 2007, Martijn van Oosterhout wrote:
Then I'm afraid you haven't indicated your requirements properly. All I can see is that the interval type does exactly what you want. It can store days, weeks, months or any combination thereof. You can multiply them and add them to dates and all works as expected.

How does one define 'shift' with intervals? 0.33 DAY?

Yeah, that's the problem. An interval in the mathematical sense is a range (say, [1...2], or [monday 10AM ... thursday 10PM]), which means two known endpoints. INTERVAL in postgres is simply a convenient way to express a time difference, which is very helpful for practical problems (i.e. today + '1 month' behaves as expected whatever the current month), but which might not be suited to your problem.

I see your shifts as ranges expressed over a set that wraps around (a modulo): for example, the morning, day and night shifts are [begin hour .. end hour] modulo 24 hours, since they repeat every day. Work days are [monday .. friday] modulo 7 days. Postgres intervals can't express this, since they have no fixed beginning or end points; they are simply differences.

So, if you want to know how many times a thing has been monitored each month, maybe count(*) GROUP BY EXTRACT( month FROM monitoring_time ); same thing for weeks and weekdays, and more funky formulations will be needed for shifts...
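A sketch of that last suggestion (the monitoring table is hypothetical, and the shift hour boundaries are made up; adjust them to the real shift definitions):

SELECT EXTRACT(month FROM monitoring_time) AS month,
       CASE
           WHEN EXTRACT(hour FROM monitoring_time) BETWEEN  6 AND 13 THEN 'morning'
           WHEN EXTRACT(hour FROM monitoring_time) BETWEEN 14 AND 21 THEN 'day'
           ELSE 'night'
       END AS shift,
       count(*) AS monitorings
FROM   monitoring
GROUP  BY 1, 2
ORDER  BY 1, 2;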
Re: [GENERAL] PG Books
loaded by a single application and used by our customers with MS Access,

From my previous job, where we had an Access-based backoffice application: you might want to learn enough to be ready to switch to something better (i.e. postgres) when you feel the need. Access is a booby trap set up by Microsoft to sell SQL Server. It will work well up to a certain point, then die miserably (some of our web pages were taking minutes to load!). When this happens many people complain, and you usually buy SQL Server to ease the pain. Or, as we did, you ditch the MS stuff and switch to other tools. I wasn't in charge of this, so I can't say more, but beware of Access.
Re: [GENERAL] Have I b0rked something? Slow comparisons on "where x in (...)"
Try creating a temporary table, populating it with the list and joining against it. That's probably your best bet for a long list of target values.

Check:

forum_bench=> CREATE TABLE test (value INTEGER NOT NULL);
CREATE TABLE
forum_bench=> INSERT INTO test SELECT * FROM generate_series( 1, 100 );
INSERT 0 100
forum_bench=> ANALYZE test;
forum_bench=> EXPLAIN ANALYZE SELECT * FROM test;

 Seq Scan on test (cost=0.00..14405.24 rows=24 width=4) (actual time=0.030..349.699 rows=100 loops=1)
 Total runtime: 542.914 ms
(2 lignes)

OK: 542 ms to grab the data.

IN():

EXPLAIN ANALYZE SELECT * FROM test WHERE value IN ( 1000 values from 0 to 999000 in steps of 1000 ):

 Seq Scan on test (cost=0.00..1264310.24 rows=1000 width=4) (actual time=17.649..17977.085 rows=999 loops=1)
   Filter: (value = ANY ('{0,1000..99000}'::integer[]))
 Total runtime: 17978.061 ms

Ouch.

forum_bench=> EXPLAIN ANALYZE SELECT * FROM test WHERE value IN (VALUES (0),(1000),(2000),(998000),(999000));

 Hash Join (cost=19.50..18176.45 rows=200 width=4) (actual time=2.823..736.960 rows=999 loops=1)
   Hash Cond: (test.value = "*VALUES*".column1)
   -> Seq Scan on test (cost=0.00..14405.24 rows=24 width=4) (actual time=0.032..335.680 rows=100 loops=1)
   -> Hash (cost=17.00..17.00 rows=200 width=4) (actual time=2.108..2.108 rows=1000 loops=1)
        -> HashAggregate (cost=15.00..17.00 rows=200 width=4) (actual time=1.165..1.542 rows=1000 loops=1)
             -> Values Scan on "*VALUES*" (cost=0.00..12.50 rows=1000 width=4) (actual time=0.004..0.478 rows=1000 loops=1)
 Total runtime: 737.362 ms

Removing the 542 ms needed to read the table, we see that checking whether the values are in the hash is really, really fast.

So, obvious truth: a hash is faster than a dumb compare. Much faster. Now, postgres should do this on its own, I think.

PS: if the 1000 values are all the same (1000 times 1), IN() doesn't detect it, so its runtime does not change. The hash join doesn't care either, so its runtime doesn't change either.
Re: [GENERAL] Have I b0rked something? Slow comparisons on "where x in (...)"
Follow-up to my previous test, this time with an index:

EXPLAIN ANALYZE SELECT * FROM test WHERE value IN ( 1000 integers )

 Bitmap Heap Scan on test (cost=3519.09..7156.83 rows=1000 width=4) (actual time=5.843..8.897 rows=999 loops=1)
   Recheck Cond: (value = ANY ('{0,...,999000}'::integer[]))
   -> Bitmap Index Scan on testindex (cost=0.00..3518.84 rows=1000 width=0) (actual time=5.594..5.594 rows=999 loops=1)
        Index Cond: (value = ANY ('{0,...,999000}'::integer[]))
 Total runtime: 9.157 ms

EXPLAIN ANALYZE SELECT * FROM test WHERE value IN (VALUES (0),(1000),...(999000))

 Nested Loop (cost=15.00..1461.74 rows=200 width=4) (actual time=1.191..26.127 rows=999 loops=1)
   -> HashAggregate (cost=15.00..17.00 rows=200 width=4) (actual time=1.169..1.673 rows=1000 loops=1)
        -> Values Scan on "*VALUES*" (cost=0.00..12.50 rows=1000 width=4) (actual time=0.007..0.517 rows=1000 loops=1)
   -> Index Scan using testindex on test (cost=0.00..7.21 rows=1 width=4) (actual time=0.023..0.023 rows=1 loops=1000)
        Index Cond: (test.value = "*VALUES*".column1)
 Total runtime: 26.411 ms

Mixing the two would be a win:
- hashing the values
- making a bitmap from them
- grabbing the pages and using the hash in the "Recheck Cond"

i.e. something like this:

 -> HashAggregate (cost=15.00..17.00 rows=200 width=4) (actual time=1.169..1.673 rows=1000 loops=1)
      -> Values Scan on "*VALUES*" (cost=0.00..12.50 rows=1000 width=4) (actual time=0.007..0.517 rows=1000 loops=1)
 Bitmap Heap Scan on test (cost=3519.09..7156.83 rows=1000 width=4) (actual time=5.843..8.897 rows=999 loops=1)
   Recheck Cond: (value in hash)
   -> Bitmap Index Scan on testindex (cost=0.00..3518.84 rows=1000 width=0) (actual time=5.594..5.594 rows=999 loops=1)
        Index Cond: (value in hash)
Re: [GENERAL] Indexing questions: Index == key? And index vs substring - how successful?
1. Does an indexed column on a table have to be a potential primary key?

Nope, create as many indexes as you need/must/should.

I've been working with a couple of rather large tables where a common select is on a foreign key called 'cntrct_id' (Varchar(9) in format). However, the same 'cntrct_id' can appear on multiple records in the tables I'm trying to work with now; the tables themselves record events associated with the given 'cntrct_id' record and can store many events for one 'cntrct_id' value. I'd thought that creating an index on the table.cntrct_id field for the event tables would allow me to speed up the transactions some, but comparisons of time before and after the indexing lead me to wonder if I was mistaken in this. The times were almost identical in the following areas: before indexing, after indexing but before analyzing, and after analyzing.

2. Another common sort on these fields uses part, not all, of the 'cntrct_id' value to search for things; the first character marks original location in an internal framework we're using, for example, and the third character marks the month of the year that the original 'cntrct_id' record was set up. Sorts on either of those are fairly common as well; would indexing on the cntrct_id as a whole be able to speed up a sort on a portion of it?

Nope. This looks like suboptimal schema design... If you had an indexed date column, you would be able to make fast indexed queries with BETWEEN, >=, <=, etc.
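A sketch of that suggestion (hypothetical column and index names): pull the encoded pieces of cntrct_id out into real columns and index those.

ALTER TABLE events ADD COLUMN orig_location char(1);
ALTER TABLE events ADD COLUMN setup_date date;

CREATE INDEX events_setup_date_idx    ON events (setup_date);
CREATE INDEX events_orig_location_idx ON events (orig_location);

-- fast, index-friendly range query on the setup date:
SELECT * FROM events
WHERE  setup_date >= '2007-03-01' AND setup_date < '2007-04-01';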
Re: [GENERAL] Have I b0rked something? Slow comparisons on "where x in (...)"
I used VALUES as a replacement for the temporary table since, for this application, it is a lot more useful. The point is:

SELECT * FROM table WHERE value IN ( 1000 integers ) : performs up to 1000 comparisons for each row.

SELECT * FROM table WHERE value IN ( VALUES (1000 integers) ) : builds a hash with the 1000 values and uses it to test the rows, which is a lot faster when there are many values to compare with.

The first one is faster if the number of values in the IN() is small; the second one is faster if the number of values in the IN() is large.

EXPLAIN ANALYZE SELECT * FROM table JOIN test ON (table.column = test.value)

It wouldn't give the same result: both queries above remove duplicates from the value list, this one does not (a duplicated value would produce duplicated result rows).
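If a join against the value list is wanted anyway, an EXISTS gives back the IN semantics; a quick sketch, reusing the test table from the earlier mail (and an abbreviated value list):

SELECT t.*
FROM   test t
WHERE  EXISTS (SELECT 1
               FROM (VALUES (0),(1000),(2000)) AS v(val)
               WHERE v.val = t.value);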