[GENERAL] Tsearch2 crashes my backend, ouch !

2007-03-30 Thread Listmail


Hello,

	I have just ditched Gentoo and installed a brand new Kubuntu system (I was tired of the endless compiles).
	I have a problem with tsearch2 crashing. It appeared both on Gentoo and on the brand new Kubuntu.


I will describe my whole install procedure; maybe I'm doing something wrong.

Cluster is newly created and empty.

initdb was done with UNICODE encoding & locales.

# from postgresql.conf

# These settings are initialized by initdb -- they might be changed
lc_messages = 'fr_FR.UTF-8'     # locale for system error message strings
lc_monetary = 'fr_FR.UTF-8'     # locale for monetary formatting
lc_numeric = 'fr_FR.UTF-8'      # locale for number formatting
lc_time = 'fr_FR.UTF-8'         # locale for time formatting


[EMAIL PROTECTED]:~$ locale
LANG=fr_FR.UTF-8
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC="fr_FR.UTF-8"
etc...

	First, import the needed .sql files from contrib and check that the default tsearch2 configuration works for English:


$ createdb -U postgres test
$ psql -U postgres test

(For reference, the server itself was built with:
./configure --prefix=/usr/lib/postgresql/8.2/ --datadir=/usr/share/postgresql/8.2 --enable-nls=fr --with-python )

cd contrib/tsearch2
make
cd gendict
(copy french stem.c and stem.h from the snowball website)
./config.sh -n fr -s -p french_UTF_8 -i -v -c stem.c -h stem.h -C'Snowball stemmer for French'

cd ../../dict_fr
make clean && make
sudo make install

Now we have :

/bin/sh ../../config/install-sh -c -m 644 dict_fr.sql  
'/usr/share/postgresql/8.2/contrib'
/bin/sh ../../config/install-sh -c -m 755  libdict_fr.so.0.0  
'/usr/lib/postgresql/8.2/lib/dict_fr.so'


Okay...

- download and install the UTF-8 French dictionaries from http://www.davidgis.fr/download/tsearch2_french_files.zip and put them in the contrib directory

(the files delivered by the Debian package ifrench are ISO-8859, bleh)

- register the French dictionary (which loads the shared lib):
psql -U postgres test < /usr/share/postgresql/8.2/contrib/dict_fr.sql

Then :

test=# select lexize( 'en_stem', 'flying' );
 lexize
--------
 {fli}

And :

test=# select * from pg_ts_dict where dict_name ~ '^(fr|en)';
 dict_name |       dict_init       |   dict_initoption    |              dict_lexize              |         dict_comment
-----------+-----------------------+----------------------+---------------------------------------+------------------------------
 en_stem   | snb_en_init(internal) | contrib/english.stop | snb_lexize(internal,internal,integer) | English Stemmer. Snowball.
 fr        | dinit_fr(internal)    |                      | snb_lexize(internal,internal,integer) | Snowball stemmer for French


test=# select lexize( 'fr', 'voyageur' );
server closed the connection unexpectedly

BLAM ! Try something else :

test=# UPDATE pg_ts_dict SET  
dict_initoption='/usr/share/postgresql/8.2/contrib/french.stop' WHERE  
dict_name = 'fr';

UPDATE 1
test=# select lexize( 'fr', 'voyageur' );
server closed the connection unexpectedly

Try other options :

dict_name   | fr_ispell
dict_init   | spell_init(internal)
dict_initoption |  
DictFile="/usr/share/postgresql/8.2/contrib/french.dict",AffFile="/usr/share/postgresql/8.2/contrib/french.aff",StopFile="/usr/share/postgresql/8.2/contrib/french.stop"

dict_lexize | spell_lexize(internal,internal,integer)
dict_comment|
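For reference, entries like this fr_ispell one are usually created by cloning tsearch2's ispell_template row; a rough sketch (assuming the template row exists, with the file paths shown above):

INSERT INTO pg_ts_dict
    (SELECT 'fr_ispell',
            dict_init,
            'DictFile="/usr/share/postgresql/8.2/contrib/french.dict",'
            'AffFile="/usr/share/postgresql/8.2/contrib/french.aff",'
            'StopFile="/usr/share/postgresql/8.2/contrib/french.stop"',
            dict_lexize
     FROM pg_ts_dict
     WHERE dict_name = 'ispell_template');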

test=# select lexize( 'en_stem', 'traveler' ), lexize( 'fr_ispell',  
'voyageur' );

-[ RECORD 1 ]---
lexize | {travel}
lexize | {voyageuse}

	Now it works (kinda), but stemming doesn't really stem for French (since snowball is out of the picture). It should return 'voyage' (= travel) instead of 'voyageuse' (= female traveler).

	That's not what I want; I want to use snowball to stem French words.

	I'm going to make a debug build and try to debug it, but if anyone can  
help, you're really, really welcome.








Re: [GENERAL] Tsearch2 crashes my backend, ouch !

2007-03-30 Thread Listmail


OK, I've solved my problem... thanks for the hint !

	Anyway, just to note that tsearch2 crashes if SELECT is not granted on pg_ts_dict (the other tables give a proper error message when SELECT is not granted).
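As a side note on the permissions part: the crash described further down goes away once the querying role has SELECT on the tsearch2 configuration tables, something like (the role name is made up):

GRANT SELECT ON pg_ts_dict, pg_ts_cfg, pg_ts_cfgmap, pg_ts_parser TO webuser;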






Re: [GENERAL] postgresl for mysql?

2007-04-01 Thread Listmail



>What will they think of next!
>http://krow.livejournal.com/502908.html
>I suppose it makes as much sense as the others, except why would you
>want to use mysql if the storage is in postgres?

If you've inherited data in a postgresql database this will allow
you to migrate it to the industry standard database without the
inconvenience and downtime of a dump from postgresql and
a restore into mysql.

I don't think it's a new idea - IIRC, Aprile Pazzo did something
similar for MySQL 3 and PG 7.something.


What an interesting name!  I don't know much Italian other than what
I've picked up from a few movies, but I think I now know what Pazzo
means ...



	Yeah well you know mysqldump has an option "export to postgres syntax" so  
you can reimport in postgres.

I encourage you to try it one day, you'll be amazed.

mysqldump --password -d -u root immo_forum

DROP TABLE IF EXISTS `smf_topics`;
CREATE TABLE `smf_topics` (
  `ID_TOPIC` mediumint(8) unsigned NOT NULL auto_increment,
  `isSticky` tinyint(4) NOT NULL default '0',
  `ID_BOARD` smallint(5) unsigned NOT NULL default '0',
  `ID_FIRST_MSG` int(10) unsigned NOT NULL default '0',
  `ID_LAST_MSG` int(10) unsigned NOT NULL default '0',
  `ID_MEMBER_STARTED` mediumint(8) unsigned NOT NULL default '0',
  `ID_MEMBER_UPDATED` mediumint(8) unsigned NOT NULL default '0',
  `ID_POLL` mediumint(8) unsigned NOT NULL default '0',
  `numReplies` int(10) unsigned NOT NULL default '0',
  `numViews` int(10) unsigned NOT NULL default '0',
  `locked` tinyint(4) NOT NULL default '0',
  PRIMARY KEY  (`ID_TOPIC`),
  UNIQUE KEY `lastMessage` (`ID_LAST_MSG`,`ID_BOARD`),
  UNIQUE KEY `firstMessage` (`ID_FIRST_MSG`,`ID_BOARD`),
  UNIQUE KEY `poll` (`ID_POLL`,`ID_TOPIC`),
  KEY `isSticky` (`isSticky`),
  KEY `ID_BOARD` (`ID_BOARD`)
) ENGINE=MyISAM AUTO_INCREMENT=25 DEFAULT CHARSET=latin1  
COLLATE=latin1_general_ci;

/*!40103 SET [EMAIL PROTECTED] */;

mysqldump --compatible=postgres --password -d -u root immo_forum

DROP TABLE IF EXISTS "smf_topics";
CREATE TABLE "smf_topics" (
  "ID_TOPIC" mediumint(8) unsigned NOT NULL,
  "isSticky" tinyint(4) NOT NULL default '0',
  "ID_BOARD" smallint(5) unsigned NOT NULL default '0',
  "ID_FIRST_MSG" int(10) unsigned NOT NULL default '0',
  "ID_LAST_MSG" int(10) unsigned NOT NULL default '0',
  "ID_MEMBER_STARTED" mediumint(8) unsigned NOT NULL default '0',
  "ID_MEMBER_UPDATED" mediumint(8) unsigned NOT NULL default '0',
  "ID_POLL" mediumint(8) unsigned NOT NULL default '0',
  "numReplies" int(10) unsigned NOT NULL default '0',
  "numViews" int(10) unsigned NOT NULL default '0',
  "locked" tinyint(4) NOT NULL default '0',
  PRIMARY KEY  ("ID_TOPIC"),
  UNIQUE KEY "lastMessage" ("ID_LAST_MSG","ID_BOARD"),
  UNIQUE KEY "firstMessage" ("ID_FIRST_MSG","ID_BOARD"),
  UNIQUE KEY "poll" ("ID_POLL","ID_TOPIC"),
  KEY "isSticky" ("isSticky"),
  KEY "ID_BOARD" ("ID_BOARD")
);
/*!40103 SET [EMAIL PROTECTED] */;

Sure looks "compatible" (but with what ?)
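For comparison, DDL that PostgreSQL would actually accept for that table might look something like this (a rough hand translation: unsigned ints mapped to plain integer, tinyint to smallint, auto_increment to serial, KEYs to separate indexes):

DROP TABLE IF EXISTS smf_topics;
CREATE TABLE smf_topics (
  "ID_TOPIC"          serial PRIMARY KEY,
  "isSticky"          smallint NOT NULL DEFAULT 0,
  "ID_BOARD"          integer  NOT NULL DEFAULT 0,
  "ID_FIRST_MSG"      integer  NOT NULL DEFAULT 0,
  "ID_LAST_MSG"       integer  NOT NULL DEFAULT 0,
  "ID_MEMBER_STARTED" integer  NOT NULL DEFAULT 0,
  "ID_MEMBER_UPDATED" integer  NOT NULL DEFAULT 0,
  "ID_POLL"           integer  NOT NULL DEFAULT 0,
  "numReplies"        integer  NOT NULL DEFAULT 0,
  "numViews"          integer  NOT NULL DEFAULT 0,
  locked              smallint NOT NULL DEFAULT 0,
  UNIQUE ("ID_LAST_MSG", "ID_BOARD"),
  UNIQUE ("ID_FIRST_MSG", "ID_BOARD"),
  UNIQUE ("ID_POLL", "ID_TOPIC")
);
CREATE INDEX smf_topics_issticky ON smf_topics ("isSticky");
CREATE INDEX smf_topics_id_board ON smf_topics ("ID_BOARD");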



Re: [GENERAL] Tsearch2 crashes my backend, ouch !

2007-04-01 Thread Listmail


	Anyway, just to signal that tsearch2 crashes if SELECT is not granted to pg_ts_dict (other tables give a proper error message when not GRANTed).


I don't understand this. Are you sure about this? From the prompt in your select examples I see you have superuser rights and you have successfully selected from pg_ts_dict.


	It was tricky to reproduce... I thought I was hallucinating, but here it is :


- open two psql windows (one as the postgres superuser, one as a normal unprivileged user); the prompt (# or >) shows which window each command is typed in.


- first, let's lock ourselves out :

caillaudangers=# REVOKE select ON pg_ts_dict FROM caillaudangers ;
REVOKE
caillaudangers=# REVOKE select ON pg_ts_cfg FROM caillaudangers ;
REVOKE
caillaudangers=# REVOKE select ON pg_ts_cfgmap FROM caillaudangers ;
REVOKE
caillaudangers=# REVOKE select ON pg_ts_parser FROM caillaudangers ;
REVOKE

- then try to access :

caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
ERREUR:  droit refusé pour la relation pg_ts_dict
CONTEXT:  instruction SQL «select dict_init, dict_initoption, dict_lexize from public.pg_ts_dict where oid = $1»
(in English: ERROR: permission denied for relation pg_ts_dict)


caillaudangers=# GRANT select ON pg_ts_dict TO caillaudangers ;
GRANT

caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
ERREUR:  No dictionary with id 138493128

Strange error message ??

caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
ERREUR:  droit refusé pour la relation pg_ts_cfg
CONTEXT:  instruction SQL «select prs_name from public.pg_ts_cfg where oid  
= $1»


Proper error message now. Let's go back.

caillaudangers=# REVOKE select ON pg_ts_dict FROM caillaudangers ;
REVOKE

Now try to select to_tsvector and, each time a permission is denied, grant SELECT on the needed table.


caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
ERREUR:  droit refusé pour la relation pg_ts_cfg
CONTEXT:  instruction SQL «select prs_name from public.pg_ts_cfg where oid  
= $1»


caillaudangers=# GRANT select ON pg_ts_cfg TO caillaudangers ;
GRANT

caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
ERREUR:  droit refusé pour la relation pg_ts_cfgmap
CONTEXT:  instruction SQL «select lt.tokid, map.dict_name from  
public.pg_ts_cfgmap as map, public.pg_ts_cfg as cfg, public.token_type( $1  
) as lt where lt.alias =  map.tok_alias and map.ts_name = cfg.ts_name and  
cfg.oid= $2 order by lt.tokid desc;»


caillaudangers=# GRANT select ON pg_ts_cfgmap TO caillaudangers ;
GRANT

caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
ERREUR:  droit refusé pour la relation pg_ts_parser
CONTEXT:  instruction SQL «select prs_start, prs_nexttoken, prs_end,  
prs_lextype, prs_headline from public.pg_ts_parser where oid = $1»


caillaudangers=# GRANT select ON pg_ts_parser TO caillaudangers ;
GRANT

caillaudangers=> SELECT to_tsvector( 'bill gates is watching us' );
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
La connexion au serveur a été perdue. Tentative de réinitialisation : Echec.
(The connection to the server was lost. Attempting reset: Failed.)


There it crashes. It's bizarre.






[GENERAL] BitmapScan mishaps

2007-04-03 Thread Listmail


Hello everyone !

I have this query :

annonces=> EXPLAIN ANALYZE SELECT * FROM annonces AS a WHERE
detect_time > CURRENT_TIMESTAMP - '7 DAY'::INTERVAL
AND detect_time >= '2006-10-30 16:17:45.064793'
AND vente
AND surface IS NOT NULL AND price IS NOT NULL
AND type_id IN  
(1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6)

AND (a.zipcode IN (69001,69002,69003,69004,69005,69006,69007,69008,69009)
OR a.city_id IN (27595)
OR a.coords &&  
'(45.74101689082,4.8371263505564),(45.75898310918,4.8628736494436)'::BOX) ;
   QUERY  
PLAN

-
 Bitmap Heap Scan on annonces a  (cost=1657.06..7145.98 rows=1177  
width=691) (actual time=118.342..118.854 rows=194 loops=1)
   Recheck Cond: (((vente AND (zipcode = ANY  
('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[])))  
OR (vente AND (city_id = 27595)) OR (coords &&  
'(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box))  
AND (detect_time > (now() - '7 days'::interval)) AND (detect_time >=  
'2006-10-30 16:17:45.064793'::timestamp without time zone))
   Filter: (vente AND (surface IS NOT NULL) AND (price IS NOT NULL) AND  
(type_id = ANY  
('{1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6}'::integer[])))
   ->  BitmapAnd  (cost=1657.06..1657.06 rows=2465 width=0) (actual  
time=118.294..118.294 rows=0 loops=1)
 ->  BitmapOr  (cost=133.83..133.83 rows=4368 width=0) (actual  
time=2.903..2.903 rows=0 loops=1)
   ->  Bitmap Index Scan on annonces_zip  (cost=0.00..56.54  
rows=1825 width=0) (actual time=0.599..0.599 rows=1580 loops=1)
 Index Cond: ((vente = true) AND (zipcode = ANY  
('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[])))
   ->  Bitmap Index Scan on annonces_city  (cost=0.00..43.30  
rows=1904 width=0) (actual time=0.464..0.464 rows=1575 loops=1)

 Index Cond: ((vente = true) AND (city_id = 27595))
   ->  Bitmap Index Scan on annonces_coords  (cost=0.00..33.10  
rows=640 width=0) (actual time=1.837..1.837 rows=2166 loops=1)
 Index Cond: (coords &&  
'(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)
 ->  Bitmap Index Scan on annonces_date  (cost=0.00..1522.68  
rows=72241 width=0) (actual time=114.930..114.930 rows=68022 loops=1)
   Index Cond: ((detect_time > (now() - '7 days'::interval))  
AND (detect_time >= '2006-10-30 16:17:45.064793'::timestamp without time  
zone))

 Total runtime: 119.000 ms
(14 lignes)

The interesting part is :

Bitmap Index Scan on annonces_date  (cost=0.00..1522.68 rows=72241  
width=0) (actual time=114.930..114.930 rows=68022 loops=1)


It bitmapscans about half the table...
	I realized this index was actually useless for all my queries, so I  
dropped it, and behold :


  QUERY  
PLAN

--
 Bitmap Heap Scan on annonces a  (cost=133.83..7583.77 rows=1176  
width=691) (actual time=5.483..18.731 rows=194 loops=1)
   Recheck Cond: ((vente AND (zipcode = ANY  
('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[])))  
OR (vente AND (city_id = 27595)) OR (coords &&  
'(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box))
   Filter: ((detect_time > (now() - '7 days'::interval)) AND (detect_time >= '2006-10-30 16:17:45.064793'::timestamp without time zone) AND vente AND (surface IS NOT NULL) AND (price IS NOT NULL) AND (type_id = ANY ('{1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6}'::integer[])))
   ->  BitmapOr  (cost=133.83..133.83 rows=4368 width=0) (actual  
time=2.648..2.648 rows=0 loops=1)
 ->  Bitmap Index Scan on annonces_zip  (cost=0.00..56.54  
rows=1825 width=0) (actual time=0.505..0.505 rows=1580 loops=1)
   Index Cond: ((vente = true) AND (zipcode = ANY  
('{69001,69002,69003,69004,69005,69006,

Re: [GENERAL] UPDATE on two large datasets is very slow

2007-04-03 Thread Listmail



I can't help but think that the way this application writes data is
optimized for MySQL's transactionless table type, where lots of
simultaneous input streams writing at the same time to the same table
would be death.

Can you step back and work on how the app writes out data, so that it
opens a persistent connection, and then sends in the updates one at a
time, committing every couple of seconds while doing so?


	No, he can't, I also coded a bittorrent tracker of the same kind, and the  
problem is :


- each user has N torrents active
- torrent client does N tracker requests
- tracker only does 1 UDPATE to update user's stats

	So if you do instantaneous updates you multiply your query load by N (on  
average between 6 and 8).


Besides, these kinds of trackers face several problems :

	- they are accessed by clients which have near saturated connections  
since they're leeching illegal prOn like crazy

- therefore these HTTP connections are very slow
- therefore you have a hell of a lot of concurrent connections.

	Therefore using a threaded server for this kind of load is asking for  
trouble.
	All decent torrent trackers are designed like lighttpd : select() /  
poll() or other variants, and no threads.

With no threads, a database query is a long blocking operation, and therefore to be avoided.
	Hosting providers will delete your account if they see a php torrent  
tracker on it, and for good reason.


600 hits/s = 600 connections = 600 apache and PG processes = you cry.

	Anyway, my tracker was in Python with a select/poll asynchronous HTTP model. It handled 200 HTTP requests per second using 10% CPU on a Core 2. I guess that's pretty decent.


	(I do NOT work on it anymore, DO NOT ASK for sources, it is illegal now  
in my country to code trackers so I have completely dropped the project,  
but I guess helping a fellow living in a free country is OK)


Back to databases.

You complain that postgres is slow for your application.
	Yes, it is a lot slower than MyISAM *on this application* (but try InnoDB  
and cry).
	But PG is a real database. It is simply not the right tool for your application.

You have to choose between in-place updates and transactions.

(besides, your website is locked while MySQL does your big UPDATE).

Here is how you can do it :

Your problem is that you put the peers in the database.
Ask yourself why ?

You need seeders / leechers count for each torrent ?
	-> Two INTEGERs in your torrents table, updated in batch by the tracker  
every hour.


	You need to have all peers saved somewhere so that you may exit and  
restart your tracker ?

-> code your tracker in python and be able to reload running code
-> or just save it when you exit
	-> or don't save it, it's not like it's your bank accounting data, who  
cares
	-> the FBI will be happy to have all that data when they seize your  
server (see: piratebay laughs as all data was in RAM and police had to  
unplug the server to seize it.)


So, DO NOT put the peers in the database.
	IF you put the peers info in the database you get one UPDATE per user per  
torrent.

If you only update the user stats you only get one UPDATE per user.
	And the tracker never inserts users and torrents (hopefully) so you only  
get UPDATES to users and to torrents tables, never inserts.
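A rough sketch of that hourly batch update (the torrents table and its counter columns are hypothetical), one statement instead of one UPDATE per torrent:

-- hypothetical schema: torrents(id, seeders, leechers)
UPDATE torrents t
SET seeders  = v.seeders,
    leechers = v.leechers
FROM (VALUES (1, 12, 48),
             (2,  3,  7)) AS v(torrent_id, seeders, leechers)
WHERE t.id = v.torrent_id;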


Now you need to display realtime info on the user's and torrents pages.
	This is easily done: your tracker is an HTTP server, so it can serve data over HTTP (PHP-serialized, JSON, whatever) that is inserted via AJAX or PHP into your webpages.


	From my stats my tracker needs about 100 microseconds to serve a HTTP web  
page with the peer counts for a torrent.


	So, you don't need Postgres for your tracker ! Use it for your website  
instead...








Re: [GENERAL] BitmapScan mishaps

2007-04-03 Thread Listmail

On Tue, 03 Apr 2007 19:23:31 +0200, Tom Lane <[EMAIL PROTECTED]> wrote:


Listmail <[EMAIL PROTECTED]> writes:

It bitmapscans about half the table...


Which PG version is this exactly?  We've fooled with the
choose_bitmap_and heuristics quite a bit ...

regards, tom lane


Version is 8.2.3.





Re: [GENERAL] PGSQL with high number of database rows?

2007-04-03 Thread Listmail


Are there any implications with possibly doing this? will PG handle it?  
Are there realworld systems using PG that have a massive amount of data  
in them?


It's not how much data you have, it's how you query it.

	You can have a table with 1000 rows and be dead slow if those rows are big TEXT data and you seq-scan it in its entirety on every webpage hit your server gets...
	You can have a terabyte table with billions of rows, and be fast if you know what you're doing and have proper indexes.


	Learning all this is very interesting. MySQL always seemed hostile to me,  
but postgres is friendly, has helpful error messages, the docs are great,  
and the developer team is really nice.


	The size of your data has no importance (unless your disk is full), but  
the size of your working set does.


	So, if you intend on querying your data for a website, for instance,  
where the user searches data using forms, you will need to index it  
properly so you only need to explore small sections of your data set in  
order to be fast.
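As an illustration (table and column names made up), the kind of index that keeps a form-driven search inside a small slice of a big table:

CREATE INDEX items_cat_date ON items (category_id, created_at);

-- a query constrained on the leading index columns only touches the matching slice,
-- however big the table is:
EXPLAIN ANALYZE SELECT * FROM items
WHERE category_id = 42 AND created_at >= '2007-03-27';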


	If you intend to scan entire tables to generate reports or statistics,  
you will be more interested in knowing if the size of your RAM is larger  
or smaller than your data set, and about your disk throughput.


So, what is your application ?



Re: [GENERAL] Webappication and PostgreSQL login roles

2007-04-03 Thread Listmail



I designed a Java web application. The persistence layer is a
PostgreSQL database. The application needs user authentication.
I think it's a good choice to implement this authentication mechanism
via PostgreSQL login roles. So I can create several database login
roles and set the database permissions to this login roles. This is my
first project with the postgres database, so I don't know how I can
validate a login from the website. Is there a best practice to do this
or does PostgreSQL offers a stored procedure like
'authenticateUser(String username, String password)'?


	Keep in mind that this might interact badly with very desirable features  
like :


- persistent connections
	(opening a postgres connection takes a lot longer than a simple SELECT,  
so if you must reopen connections all the time your performance will suck)


- connection pooling
(what happens when a user gets the admin's connection out of the pool ?)

	Since you use an object-relational mapper I believe it is better, and  
more flexible to have your objects handle their own operations.
	On a very basic level your objects can have a .isReadOnly() method which  
is checked in your application before any writing takes place, for  
instance.




Re: [GENERAL] BitmapScan mishaps

2007-04-03 Thread Listmail



Hmmm [ studies query a bit more... ]  I think the reason why that index
is so expensive to use is exposed here:

   Index Cond: ((detect_time > (now() - '7  
days'::interval)) AND (detect_time >= '2006-10-30  
16:17:45.064793'::timestamp without time zone))


Evidently detect_time is timestamp without time zone, but you're
comparing it to an expression that is timestamp with time zone
(ie CURRENT_TIMESTAMP).  That's an enormously expensive operator
compared to straight comparisons of two timestamps of the same ilk,
because it does some expensive stuff to convert across time zones.
And you're applying it to a whole lot of index rows.

If you change the query to use LOCALTIMESTAMP to avoid the type
conversion, how do the two plans compare?

regards, tom lane


	OK, I recreated the index, and... you were right. Actually, it was my  
query that sucked.


   Index Cond: ((detect_time > (now() - '7  
days'::interval)) AND (detect_time >= '2006-10-30  
16:17:45.064793'::timestamp without time zone))


	Is it GREATEST() which returns the highest of two values ? (like max() but not an aggregate)

Anyway, I fixed this in the code that generates the query, it's cleaner.
So now, I just put a constant timestamp.
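For the record, a sketch of the two variants, both of which avoid comparing the timestamp column against a timestamptz expression:

-- Tom's suggestion: LOCALTIMESTAMP is timestamp without time zone, so no cross-type comparison
SELECT count(*) FROM annonces
WHERE detect_time > LOCALTIMESTAMP - '7 days'::interval;

-- what I ended up doing: compute the cutoff in the application and pass a plain constant
SELECT count(*) FROM annonces
WHERE detect_time >= '2007-03-27 20:46:29.187131';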

	Then we have this interesting side effect. Simply changing the timestamp  
value induces a different plan, and the one which returns more rows is  
actually faster !


annonces=> EXPLAIN ANALYZE SELECT * FROM annonces AS a  
WHERE
detect_time >= '2007-03-27 20:46:29.187131+02'

AND vente
AND surface IS NOT NULL AND price IS NOT NULL
AND type_id IN  
(1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6)

AND (a.zipcode IN (69001,69002,69003,69004,69005,69006,69007,69008,69009)
OR a.city_id IN (27595)
OR a.coords &&  
'(45.74101689082,4.8371263505564),(45.75898310918,4.8628736494436)'::BOX) ;
   QUERY  
PLAN


 Bitmap Heap Scan on annonces a  (cost=1422.91..6758.82 rows=1130  
width=691) (actual time=27.007..27.542 rows=194 loops=1)
   Recheck Cond: (((vente AND (zipcode = ANY  
('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[])))  
OR (vente AND (city_id = 27595)) OR (coords &&  
'(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box))  
AND (detect_time >= '2007-03-27 20:46:29.187131'::timestamp without time  
zone))
   Filter: (vente AND (surface IS NOT NULL) AND (price IS NOT NULL) AND  
(type_id = ANY  
('{1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6}'::integer[])))
   ->  BitmapAnd  (cost=1422.91..1422.91 rows=2367 width=0) (actual  
time=26.960..26.960 rows=0 loops=1)
 ->  BitmapOr  (cost=133.80..133.80 rows=4368 width=0) (actual  
time=2.764..2.764 rows=0 loops=1)
   ->  Bitmap Index Scan on annonces_zip  (cost=0.00..56.54  
rows=1825 width=0) (actual time=0.503..0.503 rows=1580 loops=1)
 Index Cond: ((vente = true) AND (zipcode = ANY  
('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[])))
   ->  Bitmap Index Scan on annonces_city  (cost=0.00..43.30  
rows=1904 width=0) (actual time=0.457..0.457 rows=1575 loops=1)

 Index Cond: ((vente = true) AND (city_id = 27595))
   ->  Bitmap Index Scan on annonces_coords  (cost=0.00..33.10  
rows=640 width=0) (actual time=1.802..1.802 rows=2166 loops=1)
 Index Cond: (coords &&  
'(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)
 ->  Bitmap Index Scan on annonces_timestamp  (cost=0.00..1288.58  
rows=69375 width=0) (actual time=23.906..23.906 rows=68022 loops=1)
   Index Cond: (detect_time >= '2007-03-27  
20:46:29.187131'::timestamp without time zone)

 Total runtime: 27.669 ms
(14 lignes)

annonces=> EXPLAIN ANALYZE SELECT * FROM annonces AS a WHERE
detect_time >= '2006-03-27 20:46:29.187131+02'
AND vente
AND surface IS NOT NULL AND price IS NOT NULL
AND type_id IN  
(1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6)

AND (a.zipcode IN (69001,69002,69003,69004,69005,69006,69007,69008,69009)
OR a.city_id IN (27595)
OR a.coords &&  
'(45.74101689082,4.8371263505564),(45.75898310918,4.8628736494436)'

Re: [GENERAL] Storing blobs in PG DB

2007-04-04 Thread Listmail


You don't like filesystems ?

On Wed, 04 Apr 2007 07:44:57 +0200, Nikolay Moskvichev <[EMAIL PROTECTED]>  
wrote:



Hi All!

The question is: how suitable is PG for storing about 2,000,000 binary files of 0.5-2.0 MB each? A big number of clients or a lot of updates is not planned. It's like a photo album on a local host.








Re: [GENERAL] UPDATE on two large datasets is very slow

2007-04-04 Thread Listmail



Any suggestions for finding out where all the time is being spent? I'm


- time spent updating indexes ? do you have a lot of them ?
- try with fsync off (not in production, but it will give you an idea)



Re: [GENERAL] Storing blobs in PG DB

2007-04-05 Thread Listmail


My personal view is that in general, binary files have no place in  
databases. Filesystems are for files, databases are for data. My design  
choice is to store the files in a fileystem and use the database to hold  
metadata as well as a pointer to the file.


If you *must* put files into the database, then you can do so, and PG  
will handle that many files of those sizes with ease. For all intents  
and purposes, PG can store an unlimited number of files. You're far more  
likely to run into walls in the form of limitations in your disk I/O system than limitations in what PG will handle.


And you can't backup with rsync...
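For reference, a minimal sketch of the "metadata plus pointer" layout the quoted poster describes (all names made up):

CREATE TABLE photos (
    id         serial PRIMARY KEY,
    file_path  text NOT NULL UNIQUE,   -- where the file lives on the filesystem
    file_size  integer NOT NULL,       -- in bytes
    mime_type  text,
    added_at   timestamp NOT NULL DEFAULT now()
);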



[GENERAL] Sort and Limit - really nasty query and feature of the day

2007-04-05 Thread Listmail


	Today I rewrote a particularly nasty query involving a UNION ALL between  
an active table and a huge archive table, some left joins, order by and  
limit, and it went from 5 minutes to under one second ; however one query  
became 4 with some glue in between.


EXPLAIN
SELECT * FROM (
SELECT 0 AS archived, id, price, surface, coords, detect_time, type_id,  
vente, zipcode, city_id, description FROM annonces

UNION ALL
SELECT 1 AS archived, a.id, price, surface, coords, detect_time, type_id,  
vente, zipcode, city_id, description FROM archive_data a LEFT JOIN  
archive_ext d ON a.id=d.id ) AS foo

WHERE
detect_time >= '2006-10-30 16:17:45.064793'
	AND type_id IN  
(1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6)

AND vente
	AND (zipcode IN (69001,69002,69003,69004,69005,69006,69007,69008,69009)  
OR city_id IN (27595) OR coords &&  
'(45.74101689082,4.8371263505564),(45.75898310918,4.8628736494436)'::BOX)

AND surface IS NOT NULL AND price IS NOT NULL
ORDER BY price/surface
LIMIT 100;

Here is the messy explain :

 Limit  (cost=333560.35..333560.60 rows=100 width=103)
   ->  Sort  (cost=333560.35..333656.88 rows=38610 width=103)
 Sort Key: (foo.price / (foo.surface)::double precision)
 ->  Result  (cost=133.21..328438.41 rows=38610 width=103)
   ->  Append  (cost=133.21..328245.36 rows=38610 width=103)
 ->  Bitmap Heap Scan on annonces   
(cost=133.21..7520.56 rows=1426 width=190)
   Recheck Cond: ((vente AND (zipcode = ANY  
('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[])))  
OR (vente AND (city_id = 27595)) OR (coords &&  
'(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box))
   Filter: ((detect_time >= '2006-10-30  
16:17:45.064793'::timestamp without time zone) AND (type_id = ANY  
('{1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6}'::integer[]))  
AND vente AND (surface IS NOT NULL) AND (price IS NOT NULL))
   ->  BitmapOr  (cost=133.21..133.21 rows=4294  
width=0)
 ->  Bitmap Index Scan on annonces_zip   
(cost=0.00..55.91 rows=1761 width=0)
   Index Cond: ((vente = true) AND  
(zipcode = ANY  
('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[])))
 ->  Bitmap Index Scan on annonces_city   
(cost=0.00..42.85 rows=1859 width=0)
   Index Cond: ((vente = true) AND  
(city_id = 27595))
 ->  Bitmap Index Scan on annonces_coords   
(cost=0.00..33.37 rows=675 width=0)
   Index Cond: (coords &&  
'(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)
 ->  Merge Right Join  (cost=59679.03..320338.70  
rows=37184 width=182)

   Merge Cond: (d.id = a.id)
   ->  Index Scan using archive_ext_pkey on  
archive_ext d  (cost=0.00..252661.12 rows=2976314 width=119)
   ->  Sort  (cost=59679.03..59771.99 rows=37184  
width=67)

 Sort Key: a.id
 ->  Bitmap Heap Scan on archive_data a   
(cost=3951.02..56856.32 rows=37184 width=67)
   Recheck Cond: ((vente AND (zipcode  
= ANY  
('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[])))  
OR (vente AND (city_id = 27595)) OR (coords &&  
'(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box))
   Filter: ((detect_time >=  
'2006-10-30 16:17:45.064793'::timestamp without time zone) AND (type_id =  
ANY  
('{1,12,24,17,18,19,20,33,35,50,51,7,52,4,13,41,14,16,26,28,43,53,15,29,30,31,45,32,34,46,47,6}'::integer[]))  
AND vente AND (surface IS NOT NULL) AND (price IS NOT NULL))
   ->  BitmapOr   
(cost=3951.02..3951.02 rows=171699 width=0)
 ->  Bitmap Index Scan on  
archive_data_zip  (cost=0.00..1692.62 rows=80610 width=0)
   Index Cond: ((vente =  
true) AND (zipcode = ANY  
('{69001,69002,69003,69004,69005,69006,69007,69008,69009}'::integer[])))
 ->  Bitmap Index Scan on  
archive_data_city  (cost=0.00..1695.31 rows=80683 width=0)
   Index Cond: ((vente =  
true) AND (city_id = 27595))
 ->  Bitmap Index Scan on  
archive_data_coords  (cost=0.00..535.20 rows=10406 width=0)
   Index Cond: (coords &&  
'(45.75898310918,4.8628736494436),(45.74101689082,4.8371263505564)'::box)


	I didn't redo the e

Re: [GENERAL] Anyone know where I can get an 8.2.3 binary for ubuntu?

2007-04-05 Thread Listmail



On 4/5/07, Michelle Konzack <[EMAIL PROTECTED]> wrote:

Am 2007-04-01 12:05:44, schrieb Leonel:
> and once you downloaded the packages do a :
>
> apt-get build-deps  postgresql-8.1

Are you sure?  --  It should be:




You don't have the  build-dep  for  8.2 in ubuntu  dapper/ edgy


	I recently installed kubuntu and postgres 8.2.3 is available in apt-get  
(edgy backports I think).




Re: [GENERAL] SQL WHERE: many sql or large IN()

2007-04-06 Thread Listmail




I have a choice of running:

SELECT bar FROM tokens WHERE foo IN ('apple','orange','biscuit') for  
up to ~300 words


OR

SELECT bar FROM tokens WHERE foo = 'apple' up to ~300 times as a  
prepared/cached SQL statements.


	With new PG versions you can also use VALUES which will save you a hash  
if you know your keys are unique.
	The example uses integers but you can use anything. It works just like a normal join from a table.
	Putting 300 values in VALUES is certainly a LOT faster than doing 300  
individual SELECTs !


test=> EXPLAIN ANALYZE SELECT t.* FROM test t, ( VALUES (1),(2) ) AS v  
WHERE t.id=v.column1;

   QUERY PLAN

 Nested Loop  (cost=0.00..16.76 rows=2 width=8) (actual time=0.029..0.039  
rows=2 loops=1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4) (actual  
time=0.004..0.005 rows=2 loops=1)
   ->  Index Scan using test_pkey on test t  (cost=0.00..8.36 rows=1  
width=8) (actual time=0.013..0.015 rows=1 loops=2)

 Index Cond: (t.id = "*VALUES*".column1)
 Total runtime: 0.085 ms



Re: [GENERAL] SQL WHERE: many sql or large IN()

2007-04-06 Thread Listmail
On Fri, 06 Apr 2007 18:45:15 +0200, Markus Schiltknecht  
<[EMAIL PROTECTED]> wrote:



Hi,

tom wrote:
Initially it seems that the WHERE IN (...) approach takes a turn for  
the worse when the list gets very large.




	Since I use this a lot on webpages, I thought maybe a little benchmark is  
in order ?


CREATE TABLE test (id SERIAL PRIMARY KEY, value TEXT NOT NULL );
INSERT INTO test (value) SELECT * FROM generate_series( 1, 1000000 );
CREATE INDEX test_value ON test( value );
ANALYZE test;

	My script runs EXPLAIN ANALYZE 10 times and keeps the fastest timing,  
then displays it.
	Then it executes the query enough times so that 1 second is elapsed (also  
fetching the results), then prints the real runtime.


	Well, the table is clustered (since it was created from generate_series)  
so of course, bitmap index scan rules.
	Get the first 300 rows of the table using column 'value' which is an  
indexed TEXT column.

(using 'id' which is an integer is a bit faster)

SELECT * FROM test WHERE value IN <300 values>
Bitmap Heap Scan on test  (cost=1225.83..2317.08 rows=300 width=13)  
(actual time=3.736..3.807 rows=300 loops=1)

  Recheck Cond: (value = ANY ('{<300 values>}'::text[]))
  ->  Bitmap Index Scan on test_value  (cost=0.00..1225.75 rows=300  
width=0) (actual time=3.717..3.717 rows=300 loops=1)

Index Cond: (value = ANY ('{<300 values>}'::text[]))
Explain Analyze runtime: 1 x 3.896 ms = 3.896 ms
Real runtime: 1 x 6.027 ms = 6.027 ms (timed on 257 iterations)

SELECT t.* FROM test t, ( VALUES <300 values> ) AS v WHERE  
t.value=v.column1
Nested Loop  (cost=0.00..2447.27 rows=300 width=13) (actual  
time=0.035..4.724 rows=300 loops=1)
  ->  Values Scan on "*VALUES*"  (cost=0.00..3.75 rows=300 width=32)  
(actual time=0.002..0.219 rows=300 loops=1)
  ->  Index Scan using test_value on test t  (cost=0.00..8.13 rows=1  
width=13) (actual time=0.013..0.014 rows=1 loops=300)

Index Cond: (t.value = "*VALUES*".column1)
Explain Analyze runtime: 1 x 4.814 ms = 4.814 ms
Real runtime: 1 x 6.786 ms = 6.786 ms (timed on 208 iterations)

SELECT * FROM test WHERE value='1'
Index Scan using test_value on test  (cost=0.00..8.40 rows=1 width=13)  
(actual time=0.014..0.015 rows=1 loops=1)

  Index Cond: (value = '1'::text)
Explain Analyze runtime: 300 x 0.032 ms = 9.600 ms
Real runtime: 300 x 0.149 ms = 44.843 ms (timed on 31251 iterations)

	Now if we ask for 300 random rows out of the million in the table, which  
is a lot more likely situation...


SELECT * FROM test WHERE value IN <300 values>
Bitmap Heap Scan on test  (cost=1225.83..2317.08 rows=300 width=13)  
(actual time=4.516..4.945 rows=300 loops=1)

  Recheck Cond: (value = ANY ('{<300 values>}'::text[]))
  ->  Bitmap Index Scan on test_value  (cost=0.00..1225.75 rows=300  
width=0) (actual time=4.451..4.451 rows=300 loops=1)

Index Cond: (value = ANY ('{<300 values>}'::text[]))
Explain Analyze runtime: 1 x 5.034 ms = 5.034 ms
Real runtime: 1 x 7.278 ms = 7.278 ms (timed on 199 iterations)

SELECT t.* FROM test t, ( VALUES <300 values> ) AS v WHERE  
t.value=v.column1
Nested Loop  (cost=0.00..2447.27 rows=300 width=13) (actual  
time=0.046..5.503 rows=300 loops=1)
  ->  Values Scan on "*VALUES*"  (cost=0.00..3.75 rows=300 width=32)  
(actual time=0.001..0.200 rows=300 loops=1)
  ->  Index Scan using test_value on test t  (cost=0.00..8.13 rows=1  
width=13) (actual time=0.016..0.016 rows=1 loops=300)

Index Cond: (t.value = "*VALUES*".column1)
Explain Analyze runtime: 1 x 5.625 ms = 5.625 ms
Real runtime: 1 x 7.572 ms = 7.572 ms (timed on 178 iterations)

	Doing a query per row is a lot slower than it appears in EXPLAIN ANALYZE when you add all the overhead. In fact it sucks. That was expected, of course. I could tell you about the time I inherited this php/mysql website where the previous developer's idea of a join was a SELECT in a PHP for() loop because "joins are slow". It was hell.


	I tried to query 1, 2, 10, 1000, 10k and 100k random rows (that's abuse) and :
	- one query per row is (obviously) the slowest unless you select only one  
row
	- IN reverts to a single index scan if it contains one value, else it  
always uses a bitmap scan
	- VALUES join uses nested loop until about 100k rows where it switches to  
hash join


	I also tried on a real table with many columns. In that case, there are  
less tuples per page, and bitmap scans are more efficient than nested  
loops, so IN wins.

So, IN() does not turn to crap anymore like it used to !

	However, if you are asking this question because you don't want to use temporary tables, it seems that temp tables have been improved so they are now usable even in a webpage (well, not ALL webpages, but that big nasty slow webpage, you know which one I'm talking about).












Re: [GENERAL] SQL WHERE: many sql or large IN()

2007-04-06 Thread Listmail



I've never seen this before.
Is this PG specific or generic SQL that I've never been exposed to?


http://www.postgresql.org/docs/8.2/interactive/sql-values.html
	VALUES conforms to the SQL standard, except that LIMIT and OFFSET are  
PostgreSQL extensions.


	It doesn't seem like much at first glance, but it's really useful. It's  
your kleenex temp table. It can select the rows you want from a table in  
the order you want, update a table, replacing several values with several  
others (as a before=>after correspondence table) and many other things. It  
is more powerful than IN() because you can use columns of the VALUES() in  
any part of your query, updating, ordering, etc.
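For instance, a sketch of using the VALUES columns outside the WHERE clause, here to impose an arbitrary ordering (reusing the test table from the previous message):

SELECT t.*
FROM test t
JOIN (VALUES ('7', 1), ('3', 2), ('42', 3)) AS v(value, ord) ON t.value = v.value
ORDER BY v.ord;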





Re: [GENERAL] programmatic way to fetch latest release for a given major.minor version

2007-04-09 Thread Listmail

Here you go.

Fetches versions and prints the most recent minor for each major.
Tests all mirrors for speed and prints out the 4 fastest (takes some time).
Requires BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/

Have a nice day !

#! /bin/env python
# -*- coding: utf-8 -*-

import urllib, BeautifulSoup, re, time, sys


def get_all_versions():
    # scrape the source download directory and yield every (major, minor, patch) found
    soup = BeautifulSoup.BeautifulSoup( urllib.urlopen( "http://ftp3.fr.postgresql.org/pub/postgresql/source/" ).read() )
    for a in soup( 'a', {'href': re.compile( r"v\d+.\d+.\d+" ) } ):
        yield map( int, re.search( r"v(\d+)\.(\d+)\.(\d+)*", a['href'] ).groups() )

def get_latest_versions():
    # keep only the highest patchlevel seen for each (major, minor)
    lastversions = {}
    for a, b, c in sorted( get_all_versions() ):
        lastversions[ (a, b) ] = c
    return sorted( lastversions.items() )

def parse_query_string( url ):
    return dict( map( urllib.unquote_plus, pair.split( '=', 1 ) ) for pair in re.split( "&(?:amp;|)", urllib.splitquery( url )[1] ) )

def get_mirrors():
    # the mirror page links carry the real mirror root in their "url" parameter
    soup = BeautifulSoup.BeautifulSoup( urllib.urlopen( "http://wwwmaster.postgresql.org/download/mirrors-ftp" ).read() )
    for a in soup( 'a', {'href': re.compile( r"\?setmir=.*url=" ) } ):
        yield parse_query_string( a['href'] )['url']

def get_fastest_mirrors( urls, filename ):
    # time a small download from each mirror and yield (elapsed, url)
    for url in urls:
        sys.stdout.write( "%s\r" % url )
        t = time.time()
        try:
            urllib.urlopen( url + filename )
        except:
            pass
        d = time.time() - t
        print "%.02f s" % d
        yield d, url

for major, minor in get_latest_versions():
    print "%d.%d.%d" % (major[0], major[1], minor)

mirrors = get_mirrors()
fastest = sorted( get_fastest_mirrors( mirrors, "sync_timestamp" ) )[:4]
for d, mirror in fastest:
    print "%.02f s %s" % (d, mirror)









On Tue, 10 Apr 2007 00:34:02 +0200, Andrew Hammond  
<[EMAIL PROTECTED]> wrote:



On 4/9/07, CAJ CAJ <[EMAIL PROTECTED]> wrote:

On 9 Apr 2007 14:47:20 -0700, Andrew Hammond
<[EMAIL PROTECTED]> wrote:
> I'm writing a script that wants to know the latest release for a given
> major.minor version. Is there some better way than parsing
> http://www.postgresql.org/ftp/source/ or trying to
connect to ftp
> (which is invariably timing out on me today. Is that box getting
> hammered or something?) and doing the parsing that? Both approaches
> feel quite awkward to me.

Use wget to download via  HTTP (added recently).  Probably wise to add a
couple mirrors in your script.


I'm not asking how to download stuff. I'm asking how to figure out the
current release number for a given major.minor. I thought that was
clear in my original post, but I guess not. For example, how do I
determine (programmatically) the lastest version of 8.1.

I'm also interested in a clever way to select one of the "close"
mirrors at random for downloading via http. However I had planned to
hold that question until I'd solved the first issue.

Andrew







[GENERAL] Bad plan using join on VALUES

2007-04-10 Thread Listmail


Table definition and problem query are below. I'm surprised...

caillaudangers=> \d relations
  Table « public.relations »
  Colonne   |  Type   | Modificateurs
+-+
 parent_id  | integer | not null
 child_id   | integer | not null
 klass  | integer | not null
 id | integer | not null default  
nextval('relations_id_seq'::regclass)

 sort_order | integer |
 data   | bytea   |
 tree_vis   | boolean |
 main_path  | boolean |
 index_id   | integer |
Index :
« relations_pkey » PRIMARY KEY, btree (id)
« relations_unique » UNIQUE, btree (parent_id, child_id)
« relations_child » btree (child_id)
« relations_sort » btree (parent_id, klass, sort_order) CLUSTER
« relations_tree » btree (parent_id, klass, sort_order) WHERE tree_vis  
= true

Contraintes de clés étrangères :
« klass_fk » FOREIGN KEY (klass) REFERENCES relation_klasses(id)
« relations_child_id_fkey » FOREIGN KEY (child_id) REFERENCES  
nodes(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
« relations_node_id_fkey » FOREIGN KEY (parent_id) REFERENCES  
nodes(id) ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED


caillaudangers=> CLUSTER relations; ANALYZE relations;
CLUSTER
ANALYZE


caillaudangers=> EXPLAIN ANALYZE SELECT * FROM relations WHERE child_id IN  
( [60 integers] );
  QUERY  
PLAN

---
 Bitmap Heap Scan on relations  (cost=176.93..383.43 rows=236 width=58)  
(actual time=0.298..0.482 rows=350 loops=1)

   Recheck Cond: (child_id = ANY ('  [60 integers]
   ->  Bitmap Index Scan on relations_child  (cost=0.00..176.87 rows=236  
width=0) (actual time=0.281..0.281 rows=350 loops=1)

 Index Cond: (child_id = ANY  [60 integers]
 Total runtime: 0.582 ms
(5 lignes)

OK, Screaming fast ! (and it doesn't use the CLUSTER)

caillaudangers=> explain analyze SELECT target.* FROM relations AS target,  
(VALUES (8695), (8743), (10309), (22000), (22980), (23016), (8683),  
(25092), (13369), (13377), (13375), (13371), (13373), (25126), (10629),  
(13363), (13365), (22820), (15705), (13367), (8759), (8783), (8815),  
(23018), (8781), (8765), (23012), (23036), (23046), (8803), (25226),  
(22940), (8771), (8769), (13335), (23004), (22816), (23062), (8805),  
(8727), (13379), (23064), (23032), (24976), (8807), (23024), (8787),  
(23028), (8809), (8735), (23042), (8813), (8731), (22964), (8755), (8697),  
(8749), (22974), (8733), (8715)) AS source WHERE target.child_id =  
source.column1;

QUERY PLAN
---
 Hash Join  (cost=1.50..542.68 rows=216 width=58) (actual  
time=0.395..45.402 rows=350 loops=1)

   Hash Cond: (target.child_id = "*VALUES*".column1)
   ->  Seq Scan on relations target  (cost=0.00..440.29 rows=26329  
width=58) (actual time=0.011..8.213 rows=26329 loops=1)
   ->  Hash  (cost=0.75..0.75 rows=60 width=4) (actual time=0.096..0.096  
rows=60 loops=1)
 ->  Values Scan on "*VALUES*"  (cost=0.00..0.75 rows=60 width=4)  
(actual time=0.001..0.049 rows=60 loops=1)

 Total runtime: 45.594 ms

Argh. Where did my index go ? [shoots self in foot]






Re: [GENERAL] Bad plan using join on VALUES (and now on temp table too)

2007-04-10 Thread Listmail



Well, the planner probably guessed that in your case it's faster to
scan the table than to use the index (indexes are not free). Did it
choose wrong?


Yes, see the other query in my post...

id IN ( 60 values ) => 0.582 ms (bitmap scan on the index : perfect)
join with VALUES => 45.594 ms (seq scan)


If you disable the seqscan, does it get faster (set
enable_seqscan=false). Make sure you run both a few times to make sure
you're getting good results.


Yeah, everything is in cache.
With enable_seq to 0, it does a nested loop at 1.190 ms


If it turns out the planner is wrong, you need to do some tuning, in
particular random_page_cost and effective_cache_size.


	I don't want to screw my tuning (which works well for all other queries)  
because of this single one !
	I modified the website to use IN(), but I thought it might be a bug in  
the planner...


I'll repost.
I believe the planner does not consider the right plan. (see more below)

EXPLAIN ANALYZE SELECT * FROM relations WHERE child_id  
IN( 8695,8743,10309,22000,22980,23016,8683,25092,13369,13377,13375,13371,13373,25126,10629,13363,13365,22820,15705,13367,8759,8783,8815,23018,8781,8765,23012,23036,23046,8803,25226,22940,8771,8769,13335,23004,22816,23062,8805,8727,13379,23064,23032,24976,8807,23024,8787,23028,8809,8735,23042,8813,8731,22964,8755,8697,8749,22974,8733,8715  
) ;
  QUERY  
PLAN

---
 Bitmap Heap Scan on relations  (cost=176.93..383.43 rows=236 width=58)  
(actual time=0.295..0.470 rows=350 loops=1)
   Recheck Cond: (child_id = ANY  
('{8695,8743,10309,22000,22980,23016,8683,25092,13369,13377,13375,13371,13373,25126,10629,13363,13365,22820,15705,13367,8759,8783,8815,23018,8781,8765,23012,23036,23046,8803,25226,22940,8771,8769,13335,23004,22816,23062,8805,8727,13379,23064,23032,24976,8807,23024,8787,23028,8809,8735,23042,8813,8731,22964,8755,8697,8749,22974,8733,8715}'::integer[]))
   ->  Bitmap Index Scan on relations_child  (cost=0.00..176.87 rows=236  
width=0) (actual time=0.279..0.279 rows=350 loops=1)
 Index Cond: (child_id = ANY  
('{8695,8743,10309,22000,22980,23016,8683,25092,13369,13377,13375,13371,13373,25126,10629,13363,13365,22820,15705,13367,8759,8783,8815,23018,8781,8765,23012,23036,23046,8803,25226,22940,8771,8769,13335,23004,22816,23062,8805,8727,13379,23064,23032,24976,8807,23024,8787,23028,8809,8735,23042,8813,8731,22964,8755,8697,8749,22974,8733,8715}'::integer[]))

 Total runtime: 0.571 ms

explain analyze SELECT target.* FROM relations AS target, (VALUES  
(8695::INTEGER), (8743), (10309), (22000), (22980), (23016), (8683),  
(25092), (13369), (13377), (13375), (13371), (13373), (25126), (10629),  
(13363), (13365), (22820), (15705), (13367), (8759), (8783), (8815),  
(23018), (8781), (8765), (23012), (23036), (23046), (8803), (25226),  
(22940), (8771), (8769), (13335), (23004), (22816), (23062), (8805),  
(8727), (13379), (23064), (23032), (24976), (8807), (23024), (8787),  
(23028), (8809), (8735), (23042), (8813), (8731), (22964), (8755), (8697),  
(8749), (22974), (8733), (8715)) AS source WHERE target.child_id =  
source.column1;

QUERY PLAN
---
 Hash Join  (cost=1.50..542.68 rows=216 width=58) (actual  
time=0.349..44.907 rows=350 loops=1)

   Hash Cond: (target.child_id = "*VALUES*".column1)
   ->  Seq Scan on relations target  (cost=0.00..440.29 rows=26329  
width=58) (actual time=0.011..8.250 rows=26334 loops=1)
   ->  Hash  (cost=0.75..0.75 rows=60 width=4) (actual time=0.076..0.076  
rows=60 loops=1)
 ->  Values Scan on "*VALUES*"  (cost=0.00..0.75 rows=60 width=4)  
(actual time=0.001..0.035 rows=60 loops=1)

 Total runtime: 45.048 ms

SET enable_seqscan TO 0;
SET
caillaudangers=> explain analyze SELECT target.* FROM relations AS target,  
(VALUES (8695::INTEGER), (8743), (10309), (22000), (22980), (23016),  
(8683), (25092), (13369), (13377), (13375), (13371), (13373), (25126),  
(10629), (13363), (13365), (22820), (15705), (13367), (8759), (8783),  
(8815), (23018), (8781), (8765), (23012), (23036), (23046), (8803),  
(25226), (22940), (8771), (8769), (13335), (23004), (22816), (23062),  
(8805), (8727), (13379), (23064), (23032), (24976), (8807), (23024),  
(8787), (23028), (8809), (8735)

Re: [GENERAL] programmatic way to fetch latest release for a given major.minor version

2007-04-10 Thread Listmail



Yeah yeah, but terminology aside, having 2 or three digits in each
attribute is just wrong!


Terminology aside, why? The unit is "8.1" not "8" and "1". It makes no
sense to say you're on version 8, in the given context, so why should the
XML data pretend there is?

//Magnus


Just pretend that :

- version = a tuple of integers (a, b, c, ...)
- major = (a, b)
- minor = (c, ...)

Besides, that is sortable (unlike strings where 15 < 2) :

latest minor for major :
major, max(minor) where major = what you want

(The sample XML here was eaten by the list archive; it was a <pgversion> element whose <major> and <minor> children hold elements carrying a "value" attribute: 8 and 2 for major, 3 for minor.)

from BeautifulSoup import BeautifulSoup as Soup
s = Soup( """<the XML snippet above>""" )



v = s.find('pgversion')
[int(x['value']) for x in v.find('major') ]

[8, 2]

[int(x['value']) for x in v.find('minor') ]

[3]



Re: [GENERAL] programmatic way to fetch latest release for a given major.minor version

2007-04-10 Thread Listmail


I love Open Source XD

http://ethan.tira-thompson.com/cvslog2web/

Note that this is overkill (but it would look SEXY on the site).
	However, the original poster probably wants to know when to update his  
servers, so he won't care about CVS commits...
	If there was a RSS feed of new postgres versions I'd subscribe to it  
though.




  
  


But I can't help thinking that we should have some additional values for
release notes, download sub-URLs (to be appended to the mirror roots)  
etc.






Re: [GENERAL] The rule question before, request official documentation on the problem

2007-04-11 Thread Listmail



Rules mess with queries. For data copying/archiving kinds of tasks,
triggers are a better bet, like you suggested in your original post.


Let me put that a different way:  rules can *only* be used where data
integrity is not at stake.  My own thinking is that it might be time to
make an official recommendation that they are only safe for views.


NEW and OLD mean different things in a PL/pgSQL context and a Rules  
context.
In PL/pgSQL NEW and OLD are values, in Rules (which specifically mess  
with

queries) they are expressions.

The fact that the same words mean different things in different contexts
is a bit unfortunate but not as messy as say using "NEWEXPR" in the
Rules context would be.



	Since we now have UPDATE/INSERT/DELETE RETURNING, one could imagine the  
rules using these to access the actual rows and not the expressions...


But there is a perfectly valid argument against that :

	- There already is a mechanism designed specifically for this purpose  
(triggers).

- It works perfectly.
- Rules are supposed to rewrite queries to do stuff like views.
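To make the contrast concrete, a sketch of the trigger approach for an archiving task (table and column names are made up; assumes plpgsql is installed):

CREATE OR REPLACE FUNCTION archive_deleted_row() RETURNS trigger AS $$
BEGIN
    -- copy the row being deleted into an archive table
    INSERT INTO annonces_archive (id, price, surface)
    VALUES (OLD.id, OLD.price, OLD.surface);
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER annonces_archive_trig
    BEFORE DELETE ON annonces
    FOR EACH ROW EXECUTE PROCEDURE archive_deleted_row();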

	It should be mentioned in the docs, though : someone with an account on  
the PG site should copypaste this mail exchange in the comments field...





---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [pgsql-www] [GENERAL] programmatic way to fetch latest release for a given major.minor version

2007-04-11 Thread Listmail




If someone wants the schema change, react *now*. Later on we can only
append to it, and not change it :)


Since I like to complain...



	Suppose you someday add another dot, or a "b" for beta, wouldn't it be  
better to have

	[ the two example XML snippets that followed were stripped by the list archive ; only the digits "8 2 3" survive from the first one ]



Re: [GENERAL] Arrays with Rails?

2007-04-13 Thread Listmail
On Fri, 13 Apr 2007 10:30:29 +0200, Tino Wildenhain <[EMAIL PROTECTED]>  
wrote:



Joshua D. Drake schrieb:

Rick Schumeyer wrote:

Has anyone here used a postgres array with Rails?  If so, how?


 split()?


Err... there is no type mapping?



	You know, some languages spoil us developers, so that we take for granted that  
Doing It The Right Way is indeed the way it is, and then we feel the  
burning pain of having to deal with reality...

For instance, python and psycopg massively spoil you :

# (assumed setup, not in the original post : psycopg2 with a dict-style
#  cursor, which is what makes row.items() below work)
import psycopg2, psycopg2.extras
conn = psycopg2.connect("dbname=test")
cursor = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)

cursor.execute( "select '{1,2}'::INTEGER[] AS myarray, 1 AS myint, 'abc' AS mytext, %s as myarg1, %s as myarg2, %s as myarg3", (
    u"this is a unicode string €éàù",
    [1, 2, 3],
    ["this","is","an","array","of","text"]
))

for row in cursor:
    for name, item in row.items():
        print name, type( item ), item

Results :
	As you can see, arguments and results are auto-converted from python  
types to postgres types (you can write converters for your own types).  
This includes unicode, arrays, etc.
	The interface is clean and elegant : you provide SQL with %s in it and a  
list of arguments. You can use named or positional arguments too.


mytext  abc
myarray  [1, 2]
myint  1
myarg1  this is an unicode string €éàù
myarg2  [1, 2, 3]
myarg3  ['this', 'is', 'an', 'array', 'of', 'text']

	Some (like me) would argue that this is NORMAL and that anything inferior  
to this is murder. I believe the job of libraries and languages is to help  
the programmer, not torture him.


	Then, other languages will make you feel the pain of having to quote all  
your arguments YOURSELF and provide all results as strings.

The most famous offender is PHP (this causes countless security holes).
Ruby is no better :

require "postgres"
conn = PGconn.connect("localhost", 5432, "", "", "test")
res = conn.exec("select '{1,2}'::INTEGER[] AS myarray, 1 AS myint, 'abc'  
AS mytext;")

res.result.each {|row|
row.each { |item|
puts item
puts item.class
puts "\n"
}
}

{1,2}
String

1
String

abc
String

So the answer to your question is :

- either suffer
	- or code an additional layer above the ruby postgres lib which does the  
same as the python lib above.
	I partially did this for PHP. It's a lifesaver. No more addslashes() !  
Yay !





Re: [GENERAL] Arrays with Rails?

2007-04-13 Thread Listmail
On Fri, 13 Apr 2007 12:15:30 +0200, Alexander Presber  
<[EMAIL PROTECTED]> wrote:



Listmail schrieb:

Then, other languages will make you feel the pain of having to
quote all your arguments YOURSELF and provide all results as string.
The most famous offender is PHP (this causes countless security
holes).
I partially did this for PHP. It's a lifesaver. No more
addslashes() ! Yay !


What about PEAR MDB2?
http://pear.php.net/manual/en/package.database.mdb2.php

Is it any good?

Cheers, Alex


	Well, the problem with a lot of PHP libraries is that they are written by  
people who don't think.


	Python's interface for doing a query in your code is close to the ideal,  
which should be something like this :

query( "SELECT * FROM duhhh WHERE id=%s AND date < %s", id, date )
(python's API has an extra set of parentheses and it also takes named parameters)

	If id is a python integer and date a python datetime object, format  
conversion is automatic.
	If they are not, first they should be, but whatever error the programmer  
makes DOES NOT create a SQL injection. At most postgres will complain that you  
are trying to compare a date with something that is not a date, but you don't get  
hacked, since in order to put an un-quoted argument into the SQL you have  
to do it really on purpose.
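
	(For what it's worth, the same separation of query text and parameters exists at the SQL level ; a sketch reusing the made-up table above :)

PREPARE q (integer, date) AS
	SELECT * FROM duhhh WHERE id = $1 AND date < $2;
EXECUTE q (42, '2007-04-13');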


	Now, I use many database queries in my web applications (so does  
everyone), therefore I consider a system that needs me to type a lot of  
crap in order to work to be DUMB.


PEAR::MDB2 says :

$query = 'INSERT INTO tablename (id, itemname, saved_time) VALUES ('
. $mdb2->quote($id,   'integer')   .', '
. $mdb2->quote($name, 'text')  .', '
. $mdb2->quote($time, 'timestamp') .')';
$res =& $mdb2->query($query);

As you see,
- it's a lot less compact and readable
	- it's a pain to use, so I will copy-paste code, which is the GREAT EVIL  
and a good source of bugs

- you have to repeat the variable types (who cares ?)
	- if I am not well awake I may forget to type that crap because I'm sure  
the variable is an integer, why bother (but am I really sure ? => get  
hacked)


Since PHP has no type for date, a much better way of doing this would 
be :

	query( "INSERT INTO ... VALUES %s,%s,%s", $id, $name,  
DB::datetime( $timestamp ) )
	with all the basic types being quoted as they come (ie like a string  
since postgres doesn't care between 1 and '1'), and a few adapters for  
other types (like date).


	Also the ORM part of PEAR::MDB2 is braindead, since the field specifiers  
are not objects that you can customize and derive...





Re: [GENERAL] Temporary table in pl/pgsql

2007-04-13 Thread Listmail


OK, suppose in his function :
	- if it does not exist, he creates the temp table, with ON COMMIT DELETE ROWS
	- if it does exist, he truncates it just to be sure

	So the next execution of the function will find the temp table, it will  
have the same OID, all is well.


Now :

BEGIN
execute the function (and creates the table)
ROLLBACK
execute the function

	The Rollback will have rolled back the table creation too. Now when he  
executes the function again, will it get the cached plan with the rolled  
back table's OID ?
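
	A minimal plpgsql sketch of the "everything through EXECUTE" workaround described in the reply quoted below (table, column and function names are invented) :

CREATE OR REPLACE FUNCTION use_scratch() RETURNS bigint AS $$
DECLARE
	n bigint;
BEGIN
	-- the CREATE and every statement touching the temp table go through EXECUTE,
	-- so no plan referencing its OID ever gets cached
	BEGIN
		EXECUTE 'CREATE TEMPORARY TABLE scratch (id integer) ON COMMIT DELETE ROWS';
	EXCEPTION WHEN duplicate_table THEN
		EXECUTE 'TRUNCATE scratch';
	END;
	EXECUTE 'INSERT INTO scratch VALUES (1)';
	EXECUTE 'SELECT count(*) FROM scratch' INTO n;
	RETURN n;
END;
$$ LANGUAGE plpgsql;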



On Fri, 13 Apr 2007 22:55:49 +0200, Merlin Moncure <[EMAIL PROTECTED]>  
wrote:



On 4/13/07, Raymond O'Donnell <[EMAIL PROTECTED]> wrote:

Hello again all,

I'm using a temporary table in a pl/PgSQL function, and I've read the
bit in the FAQ about using EXECUTE to force the table-creation query to
be re-evaluated each time the function is called.

However, it doesn't seem to work for me. The first time I call the
function, all is well; the second (and every subsequent) time, the
function bombs with the 'relation with OID X does not exist' error -
this seems to imply that the EXECUTE statement is getting cached along
with the rest of the function, which according to the docs ain't
supposed to happen :-(


in addition to the 'create table' stmt, all queries that touch the
table must also be dynamic.  postgresql 8.3 will have improved plan
invalidation which will (aiui) remove this requirement.

merlin







Re: [GENERAL] Can Postgresql be ported to a device with no OS and simple file I/O and memory management?

2007-04-19 Thread Listmail


	If you want an embedded SQL database, you'll probably have only 1 user at a time,  
so sqlite is a better choice.

But do you want an embedded SQL database ?


On Wed, 18 Apr 2007 22:48:52 +0200, Raymond Hurst <[EMAIL PROTECTED]>  
wrote:



I have a requirement of incorporating a database in a hard disk drive.

Can postgresql do the job? Any consultants out here that have done this?

If not, any suggestions?


Ray Hurst

Western Digital

20511 Lake Forest Drive

Lake Forest, CA 92630

949-672-9853










[GENERAL] tsearch2 benchmarks, Oleg gets prize

2007-04-19 Thread Listmail


tsearch2 versus mysql FULLTEXT in the context of a large forum.

	I guess you know the answer already, but it never hurts to have nice  
graphics to show your boss.


http://peufeu.free.fr/ftsbench/

	I will upload new versions with more results, and maybe other engines, as  
I can.
	If someone is really interested, it would be worth running this  
stuff on a real server.




Re: [GENERAL] tsearch2 dictionary that indexes substrings?

2007-04-20 Thread Listmail


You want trigram based search.
ie.

postgresql -> 'pos', 'ost', 'stg', 'tgr', 'gre', 'res', 'esq', 'sql'

	searching for 'gresq' is searching for  'gre' and 'res' and 'esq' which  
is good friends with bitmap scan. Then a little LIKE '%gresq%' to filter  
the results.


	PS : indexing all substrings means that for long words you get a huge number of  
lexemes...
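
	A sketch of the idea in SQL (the tables are made up ; each word's trigrams are stored one per row and indexed ; contrib/pg_trgm packages a similar idea) :

CREATE TABLE words (id integer PRIMARY KEY, word text);
CREATE TABLE word_trigrams (word_id integer, trigram text);
CREATE INDEX word_trigrams_idx ON word_trigrams (trigram);

-- 'gresq' -> 'gre' AND 'res' AND 'esq', then an exact filter
SELECT w.word
FROM   words w
WHERE  w.id IN (SELECT word_id FROM word_trigrams WHERE trigram = 'gre')
AND    w.id IN (SELECT word_id FROM word_trigrams WHERE trigram = 'res')
AND    w.id IN (SELECT word_id FROM word_trigrams WHERE trigram = 'esq')
AND    w.word LIKE '%gresq%';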




On Fri, 20 Apr 2007 11:06:26 +0200, Tilmann Singer <[EMAIL PROTECTED]>  
wrote:



Hi,

In this http://archive.netbsd.se/?ml=pgsql-hackers&a=2006-10&t=2475028
thread Teodor Sigaev describes a way to make tsearch2 index substrings
of words:


Brain storm method:

Develop a dictionary which returns all substring for lexeme, for
example for word foobar it will be 'foobar fooba foob foo fo oobar
ooba oob oo obar oba ob bar ba ar'


Did anyone do that already?

If I understand it correctly such a dictionary would require to write
a custom C component - is that correct? Or could I get away with
writing a plpgsql function that does the above and hooking that
somehow into the tsearch2 config?


thanks, Til







Re: [GENERAL] tsearch2 dictionary that indexes substrings?

2007-04-20 Thread Listmail



I'm aware of that and in my case I don't think it will be a
problem. It is for a type-ahead search web interface so actually it
only requires indexing all possible substrings starting from char 1,
ie. p, po, pos, post, postg, postgr, postgre, postgres, postgresq,
postgresql.


	If you want to provide autocompletion, you could build a list of unique  
lexemes from inserted posts, and use a btree index and LIKE...
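
	Something like this, as a sketch (the "lexemes" table is made up ; text_pattern_ops makes the prefix LIKE indexable even in a non-C locale) :

CREATE TABLE lexemes (word text PRIMARY KEY);
CREATE INDEX lexemes_prefix_idx ON lexemes (word text_pattern_ops);

SELECT word FROM lexemes WHERE word LIKE 'postg%' ORDER BY word LIMIT 10;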




[GENERAL] Bitmap Scan Pages

2007-04-24 Thread Listmail


Hello,

Is there a way to know how many pages were hit by a Bitmap Heap scan ?

For instance :

Bitmap Heap Scan on posts
(cost=56.71..295.24 rows=2123 width=67)
(actual time=0.575..1.462 rows=2160 loops=1)

	I'd like to know if it hit 2160 pages (ie. I should really run CLUSTER) or  
50 pages (ie. my table is well clustered, everything cool).
	Since, when it's cached, it's so fast anyway that I can't tell the difference,  
but when it's not cached, it is important.


This would give an interesting probe for EXPLAIN ANALYZE tuning...



Re: [GENERAL] query from a list of ids

2007-04-25 Thread Listmail




name| ids
-
Peter| 2, 3, 4, 5
Jack| 100, 34, 3

Both name and ids are in text format.


	If you really do not want to use a link table (user_id, flag_id), you  
could use an array of ints instead of a string...
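
	For instance (a sketch based on the sample data above) :

CREATE TABLE users (name text, ids integer[]);
INSERT INTO users VALUES ('Peter', '{2,3,4,5}');
INSERT INTO users VALUES ('Jack',  '{100,34,3}');

SELECT name FROM users WHERE 3 = ANY (ids);   -- who has flag 3 ?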




Re: [GENERAL] Vacuum-full very slow

2007-04-25 Thread Listmail



I don't see a way to remove the old index entries before inserting new
ones without creating a window where the index and table will be
inconsistent if vacuum fails.


VACUUM FULL is slow because it plays with the indexes...
CLUSTER is slow because it has to order the rows...

	Maybe drop all the indexes, VACUUM FULL only the table, then recreate  
all the indexes ?

	(Note that VACUUM can't run inside a transaction block, so if the vacuum fails the DROP INDEX won't simply roll back ; you'd just recreate the indexes.)
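
	As a sequence of separate statements, that would be (a sketch ; table and index names are made up) :

DROP INDEX test_value_idx;
VACUUM FULL test;                              -- heap only, no index maintenance
CREATE INDEX test_value_idx ON test (value);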

By the way, about indexes :

	When you have a small table (say, for a website, maybe a few tens of  
megabytes max...) reindexing it takes just a few seconds, maybe 10-20  
seconds.
	It could be interesting, performance-wise, to tell postgres not to bother  
about crash-survivability of indexes on this table. Like temporary tables.  
Write nothing to WAL. If it crashes, on recovery, postgres would reindex  
the table.
	btree indexing is so fast on postgres that I'd definitely use this  
feature.
	I'd rather trade a minute of recovery versus less disk IO for index  
update.


	You could even do that for whole tables (like, web sessions table) which  
hold "perishable" data...



CLUSTER avoids all this thrashing by recopying the whole table, but
of course that has peak space requirements approximately twice the
table size (and is probably not a win anyway unless most of the table
rows need to be moved).  You pays your money, you takes your choice.

regards, tom lane







Re: [GENERAL] Query in function not using index...

2007-04-27 Thread Listmail




Any idea why using a variable v_streetName instead of a string 'wynn%'
behaves differently?


Yeah.
'wynn%' doesn't start with % so LIKE uses index.
	But Postgres doesn't know that you know that the contents of this  
variable never start with '%'...





Thanks,

John







Re: [GENERAL] Query in function not using index...

2007-04-28 Thread Listmail



Tom,
  We were looking at the explain results and noticed that it was  
converting

the 'wynn%' into fname >= 'wynn' and fname < 'wyno'


Does this also work if the last character is a unicode character ?



Re: [GENERAL] Temporal Units

2007-04-29 Thread Listmail
On Sun, 29 Apr 2007 22:33:37 +0200, Rich Shepard  
<[EMAIL PROTECTED]> wrote:



On Sun, 29 Apr 2007, Martijn van Oosterhout wrote:

Then I'm afraid you haven't indicated your requirements properly. All I can
see is that the interval type does exactly what you want. It can store
days, weeks, months or any combination thereof. You can multiply them and
add them to dates and all works as expected.


   How does one define 'shift' with intervals? 0.33 DAY?


Yeah, that's the problem.

	An Interval in the mathematical sense is a range (say, [1...2], or  
[monday 10AM ... thursday 10PM]) which means two known endpoints. INTERVAL  
in postgres is simply a convenient way to express a time difference in a  
way which is very helpful to solve practical problems (ie. today + '1  
month' behaves as expected whatever the current month), but which might  
not be suited to your problem.


	I see your shifts as ranges expressed over a "modulo something" set :  
for example, morning shift, day shift and night shift are [ begin hour .. end  
hour ] modulo 24 hours, since they repeat every day. Work days are [monday  
.. friday] modulo 7 days.


	Postgres intervals can't express this, since they have no fixed beginning  
or end points, they are simply differences.


	So if you want to know how many times a thing has been monitored each  
month, maybe count(*) GROUP BY EXTRACT( month FROM monitoring_time ); same  
thing for week and weekdays, and more funky formulations will be needed  
for shifts...
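
	A sketch (table, column and shift boundaries are all invented) :

-- monitorings per month
SELECT extract(month FROM monitoring_time) AS month, count(*)
FROM   monitoring
GROUP  BY 1
ORDER  BY 1;

-- shifts : bucket by hour of day, i.e. modulo 24 hours
SELECT CASE WHEN extract(hour FROM monitoring_time) BETWEEN  6 AND 13 THEN 'morning'
            WHEN extract(hour FROM monitoring_time) BETWEEN 14 AND 21 THEN 'day'
            ELSE 'night'
       END AS shift,
       count(*)
FROM   monitoring
GROUP  BY 1;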




Re: [GENERAL] PG Books

2007-05-01 Thread Listmail




loaded by a single application and used by our customers with MS Access,


	From my previous job, where we had an Access-based backoffice application :  
you might want to learn enough to be ready to switch to something better  
(ie. postgres) when you feel the need.
	Access is a booby trap set up by Microsoft to sell SQL Server. It will  
work well up to a certain point, then die miserably (some of our web pages  
were taking minutes to load !).
	When this happens many people complain and you usually buy SQL Server to  
ease the pain.

Or, as we did, you ditch the MS stuff and switch to other tools.
I wasn't in charge of this, so I can't say more, but beware of Access.



Re: [GENERAL] Have I b0rked something? Slow comparisons on "where x in (...)"

2007-05-03 Thread Listmail


Try creating a temporary table, populating with the list and joining  
against it. That's probably your best bet for a long list of target  
values.


Check :

forum_bench=> CREATE TABLE test (value INTEGER NOT NULL);
CREATE TABLE
forum_bench=> INSERT INTO test SELECT * FROM generate_series( 1, 1000000 );
INSERT 0 1000000
forum_bench=> ANALYZE test;
forum_bench=> EXPLAIN ANALYZE SELECT * FROM test;
  QUERY PLAN
---
 Seq Scan on test  (cost=0.00..14405.24 rows=24 width=4) (actual  
time=0.030..349.699 rows=100 loops=1)

 Total runtime: 542.914 ms
(2 lignes)


OK : 542 ms to grab the data.
IN() :

EXPLAIN ANALYZE SELECT * FROM test WHERE value IN ( 1000 values from 0 to  
999000 in steps of 1000 ):
 Seq Scan on test  (cost=0.00..1264310.24 rows=1000 width=4) (actual  
time=17.649..17977.085 rows=999 loops=1)

   Filter: (value = ANY ('{0,1000..99000}'::integer[]))
 Total runtime: 17978.061 ms

Ouch.

forum_bench=> EXPLAIN ANALYZE SELECT * FROM test WHERE value IN (VALUES  
(0),(1000),(2000), ... ,(998000),(999000));

   QUERY PLAN

 Hash Join  (cost=19.50..18176.45 rows=200 width=4) (actual  
time=2.823..736.960 rows=999 loops=1)

   Hash Cond: (test.value = "*VALUES*".column1)
   ->  Seq Scan on test  (cost=0.00..14405.24 rows=24 width=4) (actual  
time=0.032..335.680 rows=100 loops=1)
   ->  Hash  (cost=17.00..17.00 rows=200 width=4) (actual  
time=2.108..2.108 rows=1000 loops=1)
 ->  HashAggregate  (cost=15.00..17.00 rows=200 width=4) (actual  
time=1.165..1.542 rows=1000 loops=1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..12.50 rows=1000  
width=4) (actual time=0.004..0.478 rows=1000 loops=1)

 Total runtime: 737.362 ms

	Removing the 542 ms to read the table, we see that checking whether the values are  
in the hash is really, really fast.


So, obvious truth : hash is faster than dumb compare. Much faster.
Now, postgres should do this on its own, I think.

	PS : if the 1000 values are all the same (1000 times 1), IN() doesn't  
detect it, so the runtime does not change. Hash join doesn't care, so the  
runtime doesn't change either.




Re: [GENERAL] Have I b0rked something? Slow comparisons on "where x in (...)"

2007-05-03 Thread Listmail


Followup to my previous test, with an index this time

EXPLAIN ANALYZE SELECT * FROM test WHERE value IN ( 1000 integers )

 Bitmap Heap Scan on test  (cost=3519.09..7156.83 rows=1000 width=4)  
(actual time=5.843..8.897 rows=999 loops=1)

   Recheck Cond: (value = ANY ('{0,...,999000}'::integer[]))
   ->  Bitmap Index Scan on testindex  (cost=0.00..3518.84 rows=1000  
width=0) (actual time=5.594..5.594 rows=999 loops=1)

 Index Cond: (value = ANY ('{0,...,999000}'::integer[]))
 Total runtime: 9.157 ms

EXPLAIN ANALYZE SELECT * FROM test WHERE value IN (VALUES  
(0),(1000),...(999000))


 Nested Loop  (cost=15.00..1461.74 rows=200 width=4) (actual  
time=1.191..26.127 rows=999 loops=1)
   ->  HashAggregate  (cost=15.00..17.00 rows=200 width=4) (actual  
time=1.169..1.673 rows=1000 loops=1)
 ->  Values Scan on "*VALUES*"  (cost=0.00..12.50 rows=1000  
width=4) (actual time=0.007..0.517 rows=1000 loops=1)
   ->  Index Scan using testindex on test  (cost=0.00..7.21 rows=1  
width=4) (actual time=0.023..0.023 rows=1 loops=1000)

 Index Cond: (test.value = "*VALUES*".column1)
 Total runtime: 26.411 ms

Mixing the two would be a win :

- hashing the values
- making a bitmap from them
- grabbing the pages and using the hash in "Recheck Cond"

ie. something like that :

   ->  HashAggregate  (cost=15.00..17.00 rows=200 width=4) (actual  
time=1.169..1.673 rows=1000 loops=1)
 ->  Values Scan on "*VALUES*"  (cost=0.00..12.50 rows=1000  
width=4) (actual time=0.007..0.517 rows=1000 loops=1)
 Bitmap Heap Scan on test  (cost=3519.09..7156.83 rows=1000 width=4)  
(actual time=5.843..8.897 rows=999 loops=1)

   Recheck Cond: (value in hash)
   ->  Bitmap Index Scan on testindex  (cost=0.00..3518.84 rows=1000  
width=0) (actual time=5.594..5.594 rows=999 loops=1)

 Index Cond: (value in hash)





Re: [GENERAL] Indexing questions: Index == key? And index vs substring - how successful?

2007-05-03 Thread Listmail


 1. Does an indexed column on a table have to be a potential primary  
key?


Nope, create as many indexes as you need/must/should.

I've been working with a couple of rather large tables where a common  
select is on a foreign key called 'cntrct_id' (Varchar(9) in format).   
However, the same 'cntrct_id' can appear on multiple records in the  
tables I'm trying to work with now; the tables themselves record events  
associated with the given 'cntrct_id' record and can store many events  
for one 'cntrct_id' value.  I'd thought that creating an index on the  
table.cntrct_id field for the event tables would allow me to speed up  
the transations some, but comparisons of time before and after the  
indexing lead me to wonder if I was mistaken in this.  The times were  
almost identical in the following areas: Before Indexing, after Indexing  
but before Analyzing, and after Analyzing.
 2. Another common sort on these fields uses part, not all, of the  
'cntrct_id' value to search for things; the first character marks  
original location in an internal framework we're using, for example, and  
the third character marks the month of the year that the original  
'cntrct_id' record was set up.  Sorts on either of those are fairly  
common as well; would indexing on the cntrct_id as a whole be able to  
speed up a sort on a portion of it?


Nope.
This looks like suboptimal schema design...
	If you had an indexed date column, you would be able to make fast indexed  
queries with BETWEEN, >=, <=, etc.
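
	For instance (a sketch ; the table and column names are invented) :

CREATE INDEX events_setup_date_idx ON events (setup_date);

SELECT * FROM events
WHERE  setup_date BETWEEN '2007-01-01' AND '2007-03-31';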





Re: [GENERAL] Have I b0rked something? Slow comparisons on "where x in (...)"

2007-05-04 Thread Listmail



	I used VALUES as a replacement for the temporary table since for this  
application, it is a lot more useful.


The point is :

SELECT * FROM table WHERE value IN ( 1000 integers ) : does 1000  
comparisons for each row
SELECT * FROM table WHERE value IN ( VALUES (1000 integers) ) : builds a  
Hash with the 1000 values and uses it to test rows, which is a lot faster  
if you have many values to compare with.


	The first one is faster if the number of values in the IN() is small. The  
second one is faster if the number of values in the IN() is large.


EXPLAIN ANALYZE SELECT * FROM table JOIN test ON (table.column =  
test.value)


	It wouldn't give the same result : both queries above remove duplicates,  
this one does not.

