Re: [GENERAL] Checking if TSearch2 query is valid
That is helpful, but these functions do not help me detect errors in queries such as "(moose frog", where the parenthesis is unbalanced. I may just have to write a lexical analyzer.

Benjamin

On Sep 8, 2007, at 10:45 PM, Oleg Bartunov wrote:

> There are two useful functions - numnode() and querytree(). More details in
> http://www.sai.msu.su/~megera/postgres/fts/doc/fts-query.html
>
> Oleg
>
> On Sat, 8 Sep 2007, Benjamin Arai wrote:
>
>> Is there a way to pass a query to PostgreSQL to check if the TSearch2
>> search text is valid? For example, SELECT to_tsquery('default', '!');
>> returns an error. I want to know if there is a way to get true/false
>> for the '!' portion of the query?
>>
>> Benjamin
>
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to [EMAIL PROTECTED] so that your message
> can get through to the mailing list cleanly
>
> Regards,
> Oleg
> _
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
Re: [GENERAL] replacing Access/Approach etc
Shelby Cain wrote on 08.09.2007 20:57:
> Compared to that, I don't really follow your argument as to why installing PostgreSQL as a service and stopping/starting it through the service control panel is such a big deal.

Or stopping/starting using a batch file (with "net start pgsql").

Thomas
Re: [GENERAL] work hour calculations
2007/9/9, novice [EMAIL PROTECTED]:
> The result I'm expecting for the above to be:
>
>  notification_time      | finished_time          | actual
> ------------------------+------------------------+----------
>  2007-07-06 15:50:00+10 | 2007-07-09 07:10:00+10 | 01:20:00
>  2007-07-07 12:30:00+10 | 2007-07-09 07:20:00+10 | 00:20:00
>
> create table log (
>     id integer PRIMARY KEY,
>     notification_time timestamp with time zone,
>     finished_time timestamp with time zone
> );
>
> INSERT INTO log values (1, '2007-07-06 15:50', '2007-07-09 07:10');
> INSERT INTO log values (2, '2007-07-07 12:30', '2007-07-09 07:20');
>
> SELECT notification_time, finished_time,
>        sum(finished_time - notification_time) as actual
> FROM log
> GROUP BY notification_time, finished_time;

OK, so I have bad news for you: with such a structure, you will have to write some function to calculate the work time spent on each task. The general algorithm would be similar to:

* take the notification_time
* take the finished_time
* set pointer := notification_time
* set actual := 0
* while there is any weekend between the pointer and finished_time, do the following:
  ** set actual := actual + (weekend begin - pointer)
  ** move pointer to the next Monday morning
* set actual := actual + (finished_time - pointer)

BUT: this is ugly. Do you always assume that people are doing ONLY one task at a time? Maybe think of representing work sheets in the database? Maybe think of adding a work_time field to your table (why not trust people, they know best)? Maybe the application you use for entering data could give some hint based on the above algorithm.

good luck,
--
Filip Rembiałkowski
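The weekend-skipping sketch above can be written down concretely. A minimal Python version, assuming the only non-working time is the weekend (the expected output in the original question suggests business hours also need to be excluded, which would require a further refinement of this sketch):

```python
from datetime import datetime, timedelta

def working_time(start, end):
    """Time between start and end, skipping weekends (Sat 00:00 - Mon 00:00).

    Follows the algorithm sketched in the message above.
    """
    actual = timedelta(0)
    ptr = start
    while ptr < end:
        if ptr.weekday() >= 5:  # Saturday or Sunday: jump to Monday 00:00
            ptr = (ptr + timedelta(days=7 - ptr.weekday())).replace(
                hour=0, minute=0, second=0, microsecond=0)
            continue
        # Next "weekend begin" (Saturday 00:00) after the pointer
        sat = (ptr + timedelta(days=5 - ptr.weekday())).replace(
            hour=0, minute=0, second=0, microsecond=0)
        seg_end = min(end, sat)
        actual += seg_end - ptr
        ptr = seg_end
    return actual

# Using the sample rows from the log table above:
print(working_time(datetime(2007, 7, 6, 15, 50), datetime(2007, 7, 9, 7, 10)))  # 15:20:00
print(working_time(datetime(2007, 7, 7, 12, 30), datetime(2007, 7, 9, 7, 20)))  # 7:20:00
```

Note that this yields 15:20 and 7:20 for the sample data, not the 01:20 and 00:20 the poster expects, which is exactly why the weekend-only algorithm would need to be extended with a working-hours window.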
Re: [GENERAL] SQL for Deleting all duplicate entries
Merlin,

It's just about three columns - not any column. Two columns are varchars and the third is a date. The date column value is NULL for the rows for which I want to delete the duplicates. Yes, please, be a bit more specific!

/regards, Håkan Jacobsson

Original message
From: [EMAIL PROTECTED]
Date: 06-09-2007 01:56
To: Håkan Jacobsson [EMAIL PROTECTED]
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] SQL for Deleting all duplicate entries

On 9/5/07, Håkan Jacobsson [EMAIL PROTECTED] wrote:
> Hi, I want to create a DELETE statement which deletes duplicates in a table. That is, I want to remove all rows - but one - having three columns with the same data (more columns exist, and there the data varies). For example, with columns column1 .. column5: column2 = 'test', column3 = 'hey' and column4 IS NULL for several rows in the table. I want to keep just one of those rows. Is this possible? I can't figure it out, so any help MUCH appreciated!

When removing duplicates, I find it is usually better to look at this problem backwards... you want to select out the data you want to keep, truncate the original table, and insert-select the data back in. What isn't exactly clear from your question is whether you are interested in only particular fields or if you want to throw out based on any columns (not just 2, 3, and 4). If so, this is a highly irregular (and interesting) problem, and should prove difficult to make efficient. If you are only interested in three particular columns, then it's easy:

1. select out the data you want to keep into a scratch table, using CREATE TABLE ... AS with SELECT DISTINCT ON or GROUP BY
2. truncate the main table
3. insert into main select * from scratch

For a more specific answer, you will have to provide some more detail, especially regarding exactly how you determine two rows as being 'duplicates'.

merlin
Re: [GENERAL] SQL for Deleting all duplicate entries
On 9/9/07, Håkan Jacobsson [EMAIL PROTECTED] wrote:
> Merlin, it's just about three columns - not any column. Two columns are varchars and the third is a date. The date column value is NULL for the rows for which I want to delete the duplicates.

Getting ready to go on vacation :). The idea is you want to write a query that pulls out the data you want to keep. If you have a table with 6 fields, f1 through f6, and you only want one record per set of identical values of f1, f2, f3, you might do:

begin;
create temp table scratch as
    select f1, f2, f3, max(f4), max(f5), max(f6)
    from foo
    group by f1, f2, f3;
truncate foo;
insert into foo select * from scratch;
commit;

You can replace max() with any suitable aggregate you deem gets you the best data out of the record. If you are feeling really clever, you can write a custom aggregate for the record type (it's easier than you think!)

merlin
Re: [GENERAL] Checking if TSearch2 query is valid
Benjamin Arai [EMAIL PROTECTED] writes:
> Is there a way to pass a query to PostgreSQL to check if the TSearch2 search text is valid? For example, SELECT to_tsquery('default', '!'); returns an error. I want to know if there is a way to get true/false for the '!' portion of the query?

The generic solution to this type of problem is to write a function that tries to do whatever-it-is-that-throws-an-error inside a plpgsql BEGIN/EXCEPTION block, and catch the errors you are expecting.

regards, tom lane
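A minimal sketch of what Tom describes, catching any error raised by to_tsquery (the function name tsquery_is_valid is made up for the example; 'default' is the TSearch2 configuration used earlier in the thread):

CREATE OR REPLACE FUNCTION tsquery_is_valid(q text) RETURNS boolean AS $$
BEGIN
    -- PERFORM evaluates the expression and discards the result;
    -- a malformed query string makes to_tsquery() raise an error.
    PERFORM to_tsquery('default', q);
    RETURN true;
EXCEPTION WHEN OTHERS THEN
    RETURN false;
END;
$$ LANGUAGE plpgsql;

-- SELECT tsquery_is_valid('!');            -- false
-- SELECT tsquery_is_valid('moose & frog'); -- true

Catching WHEN OTHERS is the blunt form; a production version would catch only the specific error condition once identified.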
Re: [GENERAL] Scalability Design Questions
OK, this has been very informative and I'd like to thank the three of you. Asynchronous replication to read-only slaves is something I will look into. I've never touched postgres replication; and Scott mentioned that he was not familiar with PGCluster, so there must be some other replication system he's referencing, maybe Slony-I?

Trevor Talbot-2 wrote:
> On 9/8/07, novnov [EMAIL PROTECTED] wrote:
>> But basically, it seems that the answer to one of my questions is that there is currently no way with postgres to spread a single database over multiple servers, a la a load-balanced apache cluster, where requests are forwarded to different boxes.
>
> Actually, that's essentially the same thing. Whether it's the front end or middleware, something splits the requests apart before they're processed. The asynchronous replication to read-only slaves Scott mentioned earlier would be roughly equivalent to having several identical apache boxes that have their own local copies of files that you periodically rsync/ftp/whatever to them from a single place. Partitioning data would be roughly equivalent to having one apache box for images, one for ads, etc.
>
> From what I've seen people mention of RAC, it provides strong guarantees about server consistency -- all of them have the changes or none of them do -- but you need to go to great effort to achieve the same thing on a set of apache boxes too. I mean, you don't have each box accepting file uploads via the web and assume the others will magically see the same file at exactly the same time, right? Unless, of course, you're using them purely for CPU reasons and have a single shared storage pool.
>
> Whatever is splitting the requests may do it on a session level too, which makes it easier for the backend clusters. E.g. if a given user always hits a given apache box, that file upload situation isn't a problem as long as you can rsync faster than the sessions time out. Often you need to load balance this way anyway if you have a web app using an internal notion of sessions -- session data isn't replicated to other apache boxes. (If you need it to be replicated, you're already in special design territory, not just easy load balancing.)
>
> It all varies depending on the details of what you're doing. Even that seemingly straightforward question isn't specific enough :(
Re: [GENERAL] Query with like is really slow
Gregory Stark wrote:
> Christian Schröder [EMAIL PROTECTED] writes:
>> ...
>> -> Seq Scan on table2 (cost=0.00..186.64 rows=2 width=4) (actual time=0.052..2.259 rows=42 loops=1)
>>      Filter: (c ~~ '1131%'::text)
>> ...
>> -> Seq Scan on table2 (cost=0.00..200.89 rows=14 width=4) (actual time=0.084..3.419 rows=42 loops=1)
>>      Filter: ("substring"((c)::text, 1, 4) = '1131'::text)
>> ...
>> My question is: Why do I have to optimize my query (use "substring" instead of "like") instead of having the database do this for me? Or is there a difference between both queries which I cannot see?
>
> The only difference is that the optimizer understands LIKE better than it does substring, and so it tries harder to come up with a good estimate of how many rows will match. In this case it seems its estimate is actually better (by pure luck) with the substring() call. But it's still not very good. Have these tables been analyzed recently? If so, try raising the statistics target on the "c" column. If the number of rows estimated goes up from 2 to the 14 it's estimating with substring(), then you'll get the better plan. Hopefully it would be even better than that, though.

Yes, all tables are "vacuum analyzed" twice per day. (I did not have time to configure the auto-vacuum feature.) But after increasing the statistics target of the column to 20 and re-analyzing the table, the query planner chose the better plan and the query got sped up dramatically. You seem to have found the problem! I have now increased the default statistics target from 10 to 20 and the statistics target of this column to 500. We have about 190 distinct values in this column, so with a statistics target of 500 the statistics should be as exact as possible. (At least if I have understood well what this parameter means.) Since we have many queries that rely on this column, it seems to me a good idea to have the best statistics about it. I cannot see any disadvantage to this approach, at least if I do it only for one single column. Or do I overlook anything?

And a last question: I do not really understand the first query plan. The actual time for the outer nested loop is 532673.631 ms. As far as I have understood the docs, this includes the child nodes. But I cannot find the time-consuming child node. I only see two child nodes: the inner nested loop (which took 31.692 ms) and the index scan (which took 243.643 ms). Or do I have to multiply the 243.643 ms by 1627 (the number of loops)? But even then I get 396407.161 ms, which is still far away from the 532673.631 ms in the parent node.

> The nested loop still has to do some work. Actually it's quite possible that that extra overhead in the nested loop is largely gettimeofday() calls for the explain analyze. Does the query take less time run without explain analyze than it does run with it?

You seem to be right with your assumption that most of the extra time is spent in the gettimeofday() calls: without "explain analyze" the query took about 6 minutes, which is close to the 380 seconds that I get from multiplying the number of loops (1627) by the actual time per loop (234.643 ms).

Many thanks for your very helpful explanations!

Regards,
Christian

--
Deriva GmbH                      Tel.: +49 551 489500-42
Financial IT and Consulting      Fax:  +49 551 489500-91
Hans-Böckler-Straße 2            http://www.deriva.de
D-37079 Göttingen
Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer
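For reference, the per-column statistics change described above looks like this (table and column names are the placeholders used in the thread):

-- Raise the statistics target for the filtered column, then re-analyze
-- so the planner sees the new, more detailed histogram:
ALTER TABLE table2 ALTER COLUMN c SET STATISTICS 500;
ANALYZE table2;

-- The instance-wide default is the default_statistics_target
-- setting in postgresql.conf (10 in the 8.x releases).

The per-column setting only affects rows sampled during ANALYZE, so the cost is slightly slower analyzes and a larger pg_statistic entry, not slower queries.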
Re: [GENERAL] Getting result from EXECUTE
On 9/8/07, Robert Fitzpatrick [EMAIL PROTECTED] wrote:
> I have a trigger function that I want to apply to several tables, hence my use of TG_RELNAME. I just want the record to get inserted, if an UPDATE comes from my view rule, when the record for the client doesn't already exist. This is what I have, but I'm finding that FOUND is not set by EXECUTE. How can I accomplish what I need?
>
> CREATE OR REPLACE FUNCTION public.create_fldclientnumber_trigg_func()
> RETURNS trigger AS '
> begin
>     EXECUTE ''SELECT * FROM '' || TG_RELNAME || '' WHERE fldclientnumber = '' || NEW.fldclientnumber;
>     IF NOT FOUND THEN
>         EXECUTE ''INSERT INTO '' || TG_RELNAME || '' (fldclientnumber) VALUES ('' || NEW.fldclientnumber || '')'';
>     END IF;
>     RETURN NEW;
> end;
> ' LANGUAGE 'plpgsql' VOLATILE CALLED ON NULL INPUT SECURITY INVOKER;
>
> Thanks for the help.
> --
> Robert

AFAIK the FOUND variable does not get updated as part of an EXECUTE command. Consider using a STRICT INTO clause along with EXECUTE. This way a NO_DATA_FOUND exception will be generated if your query didn't return any data. Something like this:

DECLARE
    v_rec record;
BEGIN
    EXECUTE ''SELECT * FROM '' || TG_RELNAME
         || '' WHERE fldclientnumber = '' || NEW.fldclientnumber
        INTO STRICT v_rec;
EXCEPTION
    when no_data_found then
        /* do something */

regards,
--
Sibte Abbas
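One way to combine the two messages into a complete trigger function - a sketch only, keeping the poster's table and column names, using dollar quoting to avoid the doubled quotes, and testing the INTO target for NULL instead of relying on FOUND:

CREATE OR REPLACE FUNCTION public.create_fldclientnumber_trigg_func()
RETURNS trigger AS $$
DECLARE
    v_rec record;
BEGIN
    -- EXECUTE ... INTO (without STRICT) leaves v_rec NULL when no row matches.
    EXECUTE 'SELECT 1 FROM ' || TG_RELNAME
         || ' WHERE fldclientnumber = ' || NEW.fldclientnumber
        INTO v_rec;
    IF v_rec IS NULL THEN
        EXECUTE 'INSERT INTO ' || TG_RELNAME
             || ' (fldclientnumber) VALUES (' || NEW.fldclientnumber || ')';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

This assumes fldclientnumber is numeric, as the original concatenation implies; a text column would need quote_literal() around the value.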
Re: [GENERAL] SQL for Deleting all duplicate entries
Thanx Merlin, have a nice one (vacation)! It turns out I haven't described the problem accurately =( Data may actually differ in two of the columns (the varchar columns). I still want to remove rows which share the same data in those two columns and have the date column set to NULL. I.e.:

rows 1, 2, 3 have: column1 = 'foo', column2 = 'hey' and the date column = NULL
rows 4, 5, 6 have: column1 = 'brat', column2 = 'yo' and the date column = NULL

I want to keep just one of rows 1 - 3 and one of rows 4 - 6. I will try Merlin's and Scott's solutions tomorrow. Anyone know if I need to modify Merlin's and/or Scott's solutions to solve this new situation?

/best regards, Håkan

Original message
From: [EMAIL PROTECTED]
Date: 09-09-2007 15:42
To: Håkan Jacobsson [EMAIL PROTECTED]
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] SQL for Deleting all duplicate entries

On 9/9/07, Håkan Jacobsson [EMAIL PROTECTED] wrote:
> Merlin, it's just about three columns - not any column. Two columns are varchars and the third is a date. The date column value is NULL for the rows for which I want to delete the duplicates.

Getting ready to go on vacation :). The idea is you want to write a query that pulls out the data you want to keep. If you have a table with 6 fields, f1 through f6, and you only want one record per set of identical values of f1, f2, f3, you might do:

begin;
create temp table scratch as
    select f1, f2, f3, max(f4), max(f5), max(f6)
    from foo
    group by f1, f2, f3;
truncate foo;
insert into foo select * from scratch;
commit;

You can replace max() with any suitable aggregate you deem gets you the best data out of the record. If you are feeling really clever, you can write a custom aggregate for the record type (it's easier than you think!)

merlin
Re: [GENERAL] replacing Access/Approach etc
Run it as a service. This is the best way.

--
___
Andrew J. P. Maclean
Centre for Autonomous Systems
The Rose Street Building J04
The University of Sydney 2006 NSW AUSTRALIA
Ph: +61 2 9351 3283
Fax: +61 2 9351 7474
URL: http://www.acfr.usyd.edu.au/
___
Re: [GENERAL] SQL for Deleting all duplicate entries
Håkan Jacobsson wrote:
> Thanx Merlin, have a nice one (vacation)! It turns out I haven't described the problem accurately =( Data may actually differ in two of the columns (the varchar columns). I still want to remove rows which share the same data in those two columns and have the date column set to NULL. I.e.:
>
> rows 1, 2, 3 have: column1 = 'foo', column2 = 'hey' and the date column = NULL
> rows 4, 5, 6 have: column1 = 'brat', column2 = 'yo' and the date column = NULL
>
> I want to keep just one of rows 1 - 3 and one of rows 4 - 6. I will try Merlin's and Scott's solutions tomorrow. Anyone know if I need to modify Merlin's and/or Scott's solutions to solve this new situation?

If I understand correctly, this should give you the records you want to keep:

SELECT DISTINCT ON (t.one, t.two) t.one, t.two, t.three, [t.n] FROM foo AS t;

Put those into a tmp table, truncate the original, then put the saved rows back in.

brian
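Putting this suggestion together with the NULL-date condition from the question, a sketch (table and column names here are invented placeholders; DISTINCT ON needs an ORDER BY on the same leading columns):

BEGIN;
-- Keep one row per (column1, column2) pair among the NULL-date rows.
CREATE TEMP TABLE scratch AS
    SELECT DISTINCT ON (column1, column2) *
    FROM foo
    WHERE datecol IS NULL
    ORDER BY column1, column2;
-- Remove all NULL-date rows, then restore the survivors.
-- Rows with a non-NULL date are never touched.
DELETE FROM foo WHERE datecol IS NULL;
INSERT INTO foo SELECT * FROM scratch;
COMMIT;

Unlike the truncate-based variant, the DELETE restricted to datecol IS NULL leaves the non-duplicate rows in place, which matches the narrowed problem statement.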
[GENERAL] Time Zone design issues
Time zones are a new issue for me. I have read around a bit and learned some. I still have a bunch of questions, because I've not found a really good overview of how all of the factors tie in together.

At this time my app will be hosted on a server at a single location. Users will be updating and viewing from various time zones. I would like to present the datetime of the last update to each user according to the time zone that they've entered into their profile (stored in the same app db). I'm going to lay out my notions; some may be 'wrong' or too simplistic. It'd be excellent if anyone knows of a writeup on this.

I see that if a field is of type timestamptz, the update is tagged with the server's tz (pst), for example 2007-09-09 20:00:17.906-07. Very cool. I expect the simplest route is to record all updates according to a tz that does not vary, UTC or GMT. But it doesn't particularly make sense to set the server's clock to UTC. It'd seem to skew a lot of server functionality which I'd think should normally be geared around local time. So I'd guess that the route to take is to keep the server pegged to local time, and use a function that derives UTC from now() and tags the 'last modified' fields with that value. On the application level, when a user views the 'last updated' info, the app could adjust the UTC timestamp so that the time data is adjusted from UTC. So far I make sense to myself, but would like feedback; I may have too simple a view or just missed something.

What I am not so sure of is how to coordinate the timezones. It seems like a complex subject. Most of the timezone data lists include all timezones, with all dst variations. When one of my app's users is setting the time zone that applies to them, are they supposed to pick the current time (PDT) or the base zone they live in (Pacific USA or somesuch)? I'd have thought the second choice. So, they're choosing their physical location w/time zone, region or whatever it's called.

The details of this are daunting... postgresql comes with a lot of timezone data, in windows for example ...\postgresql\share\timezonesets\America.txt, and some binary files that I don't get at all, like ...\postgresql\share\timezone\US\Pacific. OK, maybe I can work out what to do with those. But it'd be some real work, and it must have been done many times before. So I'm hoping someone has put a logical, standard way of handling all of this with postgres up on the web that I can pore over and adjust if needed.

I am only going to need North America for the first year or so, I expect, and could fake my way through this pretty well. I'd much rather start with a fairly mature scheme in place so I don't need to redo it later on. I will understand if this topic is too big of a headache to get into here... just hoping I get lucky <g>.
Re: [GENERAL] Time Zone design issues
novnov [EMAIL PROTECTED] writes:
> At this time my app will be hosted on a server at a single location. Users will be updating and viewing from various time zones. I would like to present the datetime of the last update to each user according to the time zone that they've entered into their profile (stored in the same app db).

This is trivial. The column should be timestamptz, and you set the PG timezone parameter to the user's preferred zone within each user's session. Under the hood, the data stored in the timestamptz column is really UTC, and conversion to and from the user's preferred zone happens automatically when the value is input or output. You should absolutely not try to fake this by skewing the server's clock, nor by doing manual timezone conversions --- any such hack will lead to great pain.

> What I am not so sure of is how to coordinate the timezones. It seems like a complex subject. Most of the timezone data lists include all timezones, with all dst variations.

Yeah, there are a lot of 'em :-( Feel free to filter the set of timezone names you make visible to your users, if you have a good idea which ones are really likely to be interesting to them. There was some discussion of that a couple weeks ago IIRC.

regards, tom lane
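A small illustration of the session-level behaviour described above (the zone names and timestamp are example values):

CREATE TABLE doc (last_updated timestamptz);
INSERT INTO doc VALUES ('2007-09-10 03:00:00+00');

SET timezone = 'Australia/Sydney';
SELECT last_updated FROM doc;   -- 2007-09-10 13:00:00+10

SET timezone = 'US/Pacific';
SELECT last_updated FROM doc;   -- same instant: 2007-09-09 20:00:00-07

The stored value never changes; only the rendering follows the session's timezone setting, which the application would issue once per connection based on the user's profile.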
Re: [GENERAL] Time Zone design issues
On 09/09/07 22:29, novnov wrote: [snip]
> But it doesn't particularly make sense to set the server's clock to UTC. It'd seem to skew a lot of server functionality which I'd think should normally be geared around local time. So I'd guess that the route to take is to keep the server pegged to local time, and use a function that derives UTC from now() and tags the 'last modified' fields with that value.

Your single-user Windows mindset is shining brightly. Unix servers have had their internal clocks set to UTC for a decade or more, and there have been no noticeable ill effects, since apps all know to adjust for TZ.

--
Ron Johnson, Jr.
Jefferson LA USA

Give a man a fish, and he eats for a day. Hit him with a fish, and he goes away for good!
Re: [GENERAL] Checking if TSearch2 query is valid
Ok, this appears to have worked, but I had to check for the exception code OTHERS because I could not figure out what the actual code being thrown was. Is there a specific exception code for:

ERROR: no operand in tsearch query: (

Thanks for the help!

Benjamin

On Sep 9, 2007, at 7:54 AM, Tom Lane wrote:

> Benjamin Arai [EMAIL PROTECTED] writes:
>> Is there a way to pass a query to PostgreSQL to check if the TSearch2 search text is valid? For example, SELECT to_tsquery('default', '!'); returns an error. I want to know if there is a way to get true/false for the '!' portion of the query?
>
> The generic solution to this type of problem is to write a function that tries to do whatever-it-is-that-throws-an-error inside a plpgsql BEGIN/EXCEPTION block, and catch the errors you are expecting.
>
> regards, tom lane
Re: [GENERAL] Connection pooling
On Sat, 08 Sep 2007 19:28:52 +0400, Scott Marlowe [EMAIL PROTECTED] wrote:
> On 9/7/07, Max Zorloff [EMAIL PROTECTED] wrote:
>> On Fri, 07 Sep 2007 10:58:36 +0400, Marko Kreen [EMAIL PROTECTED] wrote:
>>>> The pgpool (I tried 3.1, 3.4 and pgpool-II 1.2) works fine but has the following problem - after some time it just hangs, and if I try to connect to it with psql it just hangs indefinitely. After a restart it works fine again. I turned off persistent connections in php so it's not that. Does anybody have the same problem?
>>>
>>> All symptoms point to the same problem - your app fails to release server connections for reuse. If the problem is that PHP fails to disconnect the connection, although the transaction is finished, you could run pgbouncer in a more relaxed mode - pool_mode=transaction. Also, setting client_idle_timeout to something may help debugging. If the problem is uncommitted transactions, you could set query_timeout to some small number (1-3) to see where the errors appear. Both timeouts are not something I would put into a production config, so the code should still be fixed...
>>
>> pgbouncer does not have this problem, only pgpool does. pgbouncer has the problem of being very slow. I thought PHP released connections at the end of the script? And also, if I had this problem, pgpool would hang in a few seconds, because the server has some load.
>
> It does if you're running without pg_pconnect (i.e. using regular pg_connects) and if the script doesn't crash the apache/php backend it's running in. Are you using pg_pconnect and / or having crashing apache backends?

I specifically turn off pconnects, and I don't think I crash backends; I never saw errors of that type.