[GENERAL] Multithreaded queue in PgSQL
Hi all. This may be trivial, but I cannot find good references for it. The problem is this: suppose we have one table in PgSQL which is a job queue; each row represents one job with several status flags, IDs, and so on. Several processes will attempt to access the queue and take their batch of jobs, the batch having some parameterizable size. So, the simple idea is: select the N lowest IDs that do not have the "in process" flag set, set the flag, then proceed with whatever it is that should be done.

The trouble is, with MVCC I don't see a way to prevent overlapping and race conditions. Oh, sure, if I issue SELECT FOR UPDATE it will lock rows, but, if I understand correctly, the change may not be instantaneous and atomic, so I might get the transaction to roll back, and then there is error handling that will lead to the ugliest serialization I can think of. Let me clarify this, so somebody can tell me if I got it wrong.

Imagine a Queue table with 20 rows, id = 1,...,20, status = 'new'. Imagine 2 processes/threads (P1, P2) attempting to get 10 jobs each. How to do that?

P1: UPDATE job_queue SET process_id=$1, status='in process'
    WHERE id IN (SELECT id FROM job_queue
                  WHERE status='new'
                    AND id IN (SELECT id FROM job_queue
                                WHERE status='new'
                                ORDER BY id LIMIT 10
                                  FOR UPDATE));
P2: the same
P1: SELECT * FROM job_queue WHERE process_id=$1
P2: SELECT * FROM job_queue WHERE process_id=$1

The reason for the 2 selects is that if 2 or more processes contend for the same set of jobs, the first one will set the status; the second will, after P1 has released the rows, get rows that are already taken. Of course, this will most likely return 0 rows for P2, since all 10 will be taken. If I leave out the LIMIT 10 in the inner select, I am effectively locking the entire table. Is that the way to go - lock the whole table for the UPDATE and rely on COMMIT to release it (there being no UNLOCK TABLE)?

LOCK TABLE job_queue IN EXCLUSIVE MODE;
UPDATE ...;
COMMIT;

Nix.
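The claim-a-batch idea above can be sketched in one statement. This is only a sketch using the table and column names from the example; the FOR UPDATE SKIP LOCKED clause is an assumption about a newer server (PostgreSQL 9.5 or later) than the one discussed here:

```sql
-- Claim up to 10 jobs atomically. SKIP LOCKED makes concurrent
-- workers pass over rows another worker has already locked instead
-- of blocking on them, so each worker gets a disjoint batch.
-- RETURNING replaces the follow-up SELECT by process_id.
UPDATE job_queue
   SET process_id = $1, status = 'in process'
 WHERE id IN (SELECT id
                FROM job_queue
               WHERE status = 'new'
               ORDER BY id
               LIMIT 10
                 FOR UPDATE SKIP LOCKED)
RETURNING *;
```

Without SKIP LOCKED, the two-level select shown above plus a retry loop (or a short exclusive lock on the table) is the usual workaround.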
Re: [GENERAL] Multithreaded queue in PgSQL
You may find that the PGQ component of Skytools is what you want: http://pgfoundry.org/projects/skytools http://skytools.projects.postgresql.org/doc/ http://skytools.projects.postgresql.org/doc/pgq-sql.html Thanks, we will look into it. Still, I am surprised to learn that SQL as such cannot handle it. I do realize that the question is not trivial. Would setting the transaction isolation level to SERIALIZABLE help in any way? Would locking the entire table help in any way? Nix.
Re: [GENERAL] New MS patent: sounds like PG db rules
Still, this sounds dangerous. It should be, even legally, WRONG to patent something that already exists and was not invented by the patentee. I know we can laugh off MS in court, but what about new DBs, or projects even built on PG, that have this functionality? Software patents are a menace, I'm afraid. And this is still just one portion. IBM is also into this line of work. Nix. - Original Message From: Dave Page [EMAIL PROTECTED] To: Justin Clift [EMAIL PROTECTED] Cc: Jonathan Bond-Caron [EMAIL PROTECTED]; A. Kretschmer [EMAIL PROTECTED]; pgsql-general@postgresql.org Sent: Tuesday, May 27, 2008 3:18:31 PM Subject: Re: [GENERAL] New MS patent: sounds like PG db rules Hi Justin On Tue, May 27, 2008 at 2:06 PM, Justin Clift [EMAIL PROTECTED] wrote: I'm trying to point out that - PG is a database system - and MS may have just been granted a patent for a fundamental part of it. Thinking it might need looking in to, and trying to bring it to the attention of some that can (or even cares?). ; I don't think it's a major issue. Even if MS do think we infringe on the patent, it would be laughable for them to try to do anything about it, given that our rules implementation has provably existed in a leading FOSS project for a decade or more. -- Dave Page EnterpriseDB UK: http://www.enterprisedb.com
Re: [GENERAL] slow speeds after 2 million rows inserted
1. There is no difference (speed-wise) between committing every 1K or every 250K rows.

It was some time ago that I experimented with this; my last experiment was on PG 7.2 or 7.3. I was inserting ca. 800,000 rows. Inserting without transactions took 25 hrs; inserting with 10,000 rows per transaction took about 2.5 hrs. So, the speedup was 10x. I have not experimented with the transaction batch size beyond that, but I suspect that 1,000 would not show much additional speedup.

2. Vacuuming also makes no difference for a heavy insert-only table, only slows it down.

Makes sense. Since my application was dumping all records each month and inserting new ones, vacuum was really needed, but it gave no speedup.

3. Table size plays no real factor.

The speedup I saw must have to do with the fact that without explicit transactions, each insert was its own transaction. That was eating resources. Nix.
Re: [GENERAL] slow speeds after 2 million rows inserted
The fastest way will be COPY. The second fastest will be multi-value inserts in batches, e.g.: INSERT INTO data_archive VALUES (...), (...), (...) (I don't know what the max is), but commit every 1000 inserts or so.

Is this some empirical value? Can someone give heuristics as to how to calculate the optimal number of inserts per transaction before committing? Or at least guidelines. Nix.
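The batching pattern described above can be sketched as follows. The table name data_archive comes from the post; the column list and values are illustrative assumptions, and 1,000 rows per commit is only a starting point to measure from, not a derived optimum:

```sql
-- One transaction per batch: the per-transaction commit cost
-- (an fsync of the WAL) is paid once per N rows instead of once
-- per row. Multi-row VALUES lists additionally cut per-statement
-- parse/plan overhead.
BEGIN;
INSERT INTO data_archive (id, payload) VALUES
    (1, 'a'),
    (2, 'b'),
    (3, 'c');   -- ... continue up to roughly 1,000 rows
COMMIT;
```

The returns diminish quickly: once the batch is large enough that commit overhead is a small fraction of the work, making it larger mostly just grows memory use and the cost of a rollback, which matches the observation above that 1K and 250K per commit perform alike.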
Re: [GENERAL] How to secure PostgreSQL Data for distribute?
Premsun Choltanwanich wrote: Dear All, I need to distribute my application, which uses PostgreSQL as its database, to my customers. But I still have some questions on database security. I understand that anybody who gets my application's database will have full control over it, given that PostgreSQL is already installed on their computer and they are a PostgreSQL administrator. So the data, structure and any ideas contained in the database will not be secure. Is my understanding correct? What is a good way to make it all secure? Please advise.

If you want to keep your data/schema secure, you have to keep it at your own site, and that is just the first step in making it secure. Espionage, and that is what basically bothers you, is a complex field of enterprise, and there is no one-size-fits-all solution. Your first step is keeping the DB at your site and letting your customers connect to it. This may present a huge problem, but there is no other way. No encryption would work, trust me. Nix.
[GENERAL] PostgreSQL slow after VACUUM
Hi all. I have the following: - Mandrake Linux 9.1 - PostgreSQL 7.3.2 MDK5. There is one DB and one DB user. The DB is cleared and loaded with data of the same volume each month (monthly report). The volume is not small and it usually takes 3 hours to load. Loading is done with SQL files which use transactions, 10,000 SQL statements per transaction. A couple of days ago the disk became full, since we were not doing VACUUM on the DB at all. So, I deleted all records from the 3 tables the DB has and performed VACUUM FULL ANALYZE. This reclaimed the space. My problem is that the load now takes (by my estimate) 20 times longer! Anything I could do to find out what's going on? There is nothing in the logs that I can see. Nix.
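One common cause of this on releases of that era is index bloat: a mass DELETE followed by VACUUM FULL shrinks the table but leaves the indexes bloated, and every subsequent insert pays for it. A sketch of a monthly reload that avoids the problem, assuming the tables can be emptied outright (the table name monthly_report is illustrative, not from the post):

```sql
-- TRUNCATE reclaims table and index space immediately and, unlike
-- DELETE + VACUUM FULL, does not leave bloated indexes behind.
BEGIN;
TRUNCATE monthly_report;
-- ... bulk load here (COPY or batched INSERTs) ...
COMMIT;
ANALYZE monthly_report;

-- If DELETE + VACUUM FULL has already been run, rebuilding the
-- indexes restores insert performance:
REINDEX TABLE monthly_report;
```

Comparing the table's and indexes' on-disk sizes before and after REINDEX would confirm or rule out this explanation.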
[GENERAL] PgSQL 8.0.0 - contributed: problem compiling
Hi all. I've managed to get 8.0.0 up and running. Now I'm compiling some contributed software and I hit a show stopper.

Contrib - dbsize:

Papa-Legba:/home/r/root/5.1/postgresql-8.0.0beta1/contrib/dbsize# gmake ; gmake install
sed 's,MODULE_PATHNAME,$libdir/dbsize,g' dbsize.sql.in dbsize.sql
cc -std -ieee -pthread -O3 -c99 -I. -I../../src/include -pthread -c -o dbsize.o dbsize.c
cc: Error: dbsize.c, line 58: In this statement, GetDatabasePath expects 2 arguments, but 1 are supplied. (toofewargs)
dbpath = GetDatabasePath(dbid);
gmake: *** [dbsize.o] Error 1

Inspecting ./src/include/catalog/catalog.h I can see that it takes two arguments:

extern char *GetDatabasePath(Oid dbNode, Oid spcNode);

The problematic code segment is:

Datum
database_size(PG_FUNCTION_ARGS)
{
    Name        dbname = PG_GETARG_NAME(0);
    Oid         dbid;
    char       *dbpath;
    DIR        *dirdesc;
    struct dirent *direntry;
    int64       totalsize;

    dbid = get_database_oid(NameStr(*dbname));
    if (!OidIsValid(dbid))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_DATABASE),
                 errmsg("database \"%s\" does not exist", NameStr(*dbname))));

    dbpath = GetDatabasePath(dbid);

What should be the change in this code segment? The libpq docs are not informative... Nix.
[GENERAL] PgSQL 8.0.0 beta1 compile problem + patch
Hi all. Ran into a minor showstopper. Sometimes my CC is just too much of a nitpicker, but on this matter I agree with it. ./src/timezone/zic.c has a mismatch between the declaration and the definition of the memcheck(...) function. My CC belched on it (I was running it in strict ANSI mode). The difference is this:

static char *memcheck(char *tocheck);
...
/*
 * Memory allocation.
 */
static char *
memcheck(ptr)
char *const ptr;
{
	if (ptr == NULL)
	{
		const char *e = strerror(errno);

		(void) fprintf(stderr, _("%s: Memory exhausted: %s\n"), progname, e);
		(void) exit(EXIT_FAILURE);
	}
	return ptr;
}

Notice "char *const ptr". Trusting that the lower definition is correct, being more strict and closer to the actual code, I have changed the upper declaration to match. I have attached a patch. Nix.

*** src/timezone/zic.c.orig	Thu Aug 12 12:44:44 2004
--- src/timezone/zic.c	Thu Aug 12 12:45:05 2004
***************
*** 134,140 ****
  static int	itsabbr(const char *abbr, const char *word);
  static int	itsdir(const char *name);
  static int	lowerit(int c);
! static char *memcheck(char *tocheck);
  static int	mkdirs(char *filename);
  static void	newabbr(const char *abbr);
  static long	oadd(long t1, long t2);
--- 134,140 ----
  static int	itsabbr(const char *abbr, const char *word);
  static int	itsdir(const char *name);
  static int	lowerit(int c);
! static char *memcheck(char *const tocheck);
  static int	mkdirs(char *filename);
  static void	newabbr(const char *abbr);
  static long	oadd(long t1, long t2);
Re: [GENERAL] PgSQL 7.4.2 - NaN on Tru64 UNIX - CORRECTION!!!
Tom Lane wrote: Nikola Milutinovic [EMAIL PROTECTED] writes: [ about NaN on Tru64 ] This compiles on Tru64 4.0D (the compiler swallows it), but fails on Tru64 UNIX 5.1B. Both the basic CC and the DTK Compaq CC break on that file, complaining about that constant evaluation. The best way to solve it is to use the system definition of the infinity constants: ... + #define NAN DBL_INFINITY

Current CVS tip will probably fail with this, because we expect the platform to distinguish between NaN and Infinity. Could you retry your experiments with a recent snapshot and let us know what seems best now?

Apologies to all, I have been blind! Disregard my proposed patches. One of the users on the Tru64 mailing list pointed me to the correct solution - add -ieee to the CC flags. I took a look at the templates for OSF1 in PgSQL and they read:

CFLAGS=-O -ieee

Since I was putting in my own definition of CFLAGS, I left out -ieee. I have reverted ./src/include/port/osf1.h to the original and changed CFLAGS in ./src/Makefile.global to include -ieee. Then I rebuilt ./src/backend/utils/adt/float.o (I'll rebuild the whole of PostgreSQL in a moment) and ran the regression tests. All 93 tests passed. So, again, my apologies to all - no change necessary. Nix.
Re: [GENERAL] PgSQL 7.4.2 - NaN on Tru64 UNIX
Nikola Milutinovic wrote: + #define NAN DBL_INFINITY The compilation went smoothly, I'll try to run the regression tests. Will let you know if something fails.

Hate to reply to myself, but here goes. With NAN defined as DBL_INFINITY I get 3 failed regression tests, most notably the float8 test. With NAN defined as DBL_QNAN (double float quiet NaN) I get just one, float8. With NAN defined as DBL_SNAN (double float signalling NaN) I get one error, float8. The problem is the same in all three cases: an expression that is supposed to cause an overflow does so, but the output error text is slightly different. Here is the DIFF file:

*** ./expected/float8.out	Thu Sep 25 08:58:06 2003
--- ./results/float8.out	Mon May 17 08:37:51 2004
***************
*** 247,253 ****
  SET f1 = FLOAT8_TBL.f1 * '-1' WHERE FLOAT8_TBL.f1 > '0.0';
  SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
! ERROR:  type double precision value out of range: overflow
  SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
  ERROR:  result is out of range
  SELECT '' AS bad, ln(f.f1) from FLOAT8_TBL f where f.f1 = '0.0' ;
--- 247,254 ----
  SET f1 = FLOAT8_TBL.f1 * '-1' WHERE FLOAT8_TBL.f1 > '0.0';
  SELECT '' AS bad, f.f1 * '1e200' from FLOAT8_TBL f;
! ERROR:  floating-point exception
! DETAIL:  An invalid floating-point operation was signaled. This probably means an out-of-range result or an invalid operation, such as division by zero.
  SELECT '' AS bad, f.f1 ^ '1e200' from FLOAT8_TBL f;
  ERROR:  result is out of range
  SELECT '' AS bad, ln(f.f1) from FLOAT8_TBL f where f.f1 = '0.0' ;
***************
*** 270,276 ****
  -- test for over- and underflow
  INSERT INTO FLOAT8_TBL(f1) VALUES ('10e400');
! ERROR:  10e400 is out of range for type double precision
  INSERT INTO FLOAT8_TBL(f1) VALUES ('-10e400');
  ERROR:  -10e400 is out of range for type double precision
  INSERT INTO FLOAT8_TBL(f1) VALUES ('10e-400');
--- 271,277 ----
  -- test for over- and underflow
  INSERT INTO FLOAT8_TBL(f1) VALUES ('10e400');
! ERROR:  invalid input syntax for type double precision: "10e400"
  INSERT INTO FLOAT8_TBL(f1) VALUES ('-10e400');
  ERROR:  -10e400 is out of range for type double precision
  INSERT INTO FLOAT8_TBL(f1) VALUES ('10e-400');

Any comments? Nix.
[GENERAL] BUG: compiling PL/Python - Semi-SOLVED
Hi all. OS: Tru64 UNIX 4.0d, PG: PostgreSQL v7.4.1, PY: Python v2.3.3.

I just ran into a minor bug while compiling the PL/Python module. It bombed out on lines 153 and 160 of ./src/pl/plpython/plpython.c, complaining about the incomplete type "PyObject_HEAD". The source of the problem were these two typedefs:

typedef struct PLyPlanObject
{
	PyObject_HEAD;
	void	   *plan;		/* return of an SPI_saveplan */
	int			nargs;
	Oid		   *types;
	Datum	   *values;
	PLyTypeInfo *args;
} PLyPlanObject;

typedef struct PLyResultObject
{
	PyObject_HEAD;
	/* HeapTuple *tuples; */
	PyObject   *nrows;		/* number of rows returned by query */
	PyObject   *rows;		/* data rows, or None if no data returned */
	PyObject   *status;		/* query status, SPI_OK_*, or SPI_ERR_* */
} PLyResultObject;

The problem is the ";" following "PyObject_HEAD". "PyObject_HEAD" is a macro defined as:

#define PyObject_HEAD \
	_PyObject_HEAD_EXTRA \
	int ob_refcnt; \
	struct _typeobject *ob_type;

So, that in turn expanded to this:

typedef struct PLyResultObject
{
	int ob_refcnt;
	struct _typeobject *ob_type;
	;
	PyObject *nrows;
	PyObject *rows;
	PyObject *status;
} PLyResultObject;

The problem is that extra semicolon - it is excess, and when it is removed everything compiles OK. Using this pattern in a regular local variable declaration would not cause problems, since the extra ";" would just be swallowed as a no-op. Nix.
[GENERAL] OT: HEADS-UP: viral storm out there
Hi all. This is off topic and is a cross-post, so I'll be brief. There is a very nasty virus out there and I urge everybody to get their AV in order. The virus is known as "W32.Gibe-F" or "W32.Swen-A". Yesterday, I got ca. 200 viral messages. Today, it's about 800 viral messages! I suspect that a lot of the viral traffic directed at me is coming from users on one of the lists I'm cross-posting to. Check yourselves. Nix.
Re: [GENERAL] Dreamweaver
This may have been better posted to the advocacy list, but I am not a subscriber. We recently purchased Dreamweaver MX and I was a little surprised to find that one of its pre-defined scripting systems is PHP + MySQL. I haven't done much exploring of what is actually offered, but wondered whether anyone had tried to convince Macromedia to provide PHP + PostgreSQL, or at least PHP + any DB.

PHP has the shortcoming of NOT having a unified interface to any DB: MySQL uses one set of function calls, while PgSQL uses another. Yes, of course they could add support for another (very popular) DB. But I'd rather feel out their support for JSP/JDBC, which should be in :-)

If the support for MySQL is in the form of an extension, is there a PostgreSQL equivalent? I have been contacted by one of our clients who asked, "Do we provide MySQL?" - we don't. "Oh! I am teaching myself PHP + MySQL using Dreamweaver and I wanted you to host the site" :-(

The only unified interface offered by PHP, that I know of, is ODBC. I know there are build options to fuse some Java support in, but that is, IMHO, a waste of time and effort - not to mention resources. Nix.
[GENERAL] PostgreSQL problem with functions
Hi all. Maybe this is not such a novice question, but I'm having problems subscribing to some of the more professional PG lists. Here goes...

I'm trying to write a function in PostgreSQL v7.0. Right now I'm wrestling with PL/pgSQL and SQL functions. What I want to achieve is: insert a new row into a table with the possibility of concurrent use. Concurrent use means that several processes (Apache PHP4) can call this function simultaneously. The logical steps, as I see them, are:

1. TRANSACTION start
2. LOCK table
3. GET max(id)+1
4. INSERT new row with the primary key from step 3
5. TRANSACTION commit

For this I would like the functionality of PL/pgSQL, and I would like it to return the new_id of the inserted row. This is what I had in mind:

CREATE FUNCTION start_session_pl( int4, VARCHAR(40), VARCHAR(50) ) RETURNS int4 AS '
DECLARE
    a_id ALIAS FOR $1;
    a_ss ALIAS FOR $2;
    a_ip ALIAS FOR $3;
    curr_time datetime;
    new_id    int4;
BEGIN
    curr_time := ''now'';
    IF (SELECT id FROM a_user WHERE id=a_id) ISNULL THEN
        RAISE EXCEPTION ''No such ID in admins'';
    END IF;
    BEGIN TRANSACTION;
    LOCK TABLE admin_session IN EXCLUSIVE MODE;
    new_id := (SELECT max(id)+1 FROM admin_session);
    IF new_id ISNULL THEN
        new_id := 1;
    END IF;
    INSERT INTO admin_session VALUES (new_id, a_ss, curr_time, a_id, a_ip);
    COMMIT TRANSACTION;
    RETURN new_id;
END;
' LANGUAGE 'plpgsql';

PROBLEM 1 -- According to the docs, PL/pgSQL has no support for transactions! And, yes, it belches on BEGIN TRANSACTION or any such. However, it doesn't complain about LOCK TABLE. Am I locking the table or not? And what is the lifetime of that lock?

OK, so I thought: let's write a wrapper function in ordinary SQL, lock the table and call the real function.

CREATE FUNCTION start_session( int4, VARCHAR(40), VARCHAR(50) ) RETURNS int4 AS '
    BEGIN TRANSACTION;
    LOCK TABLE admin_session IN EXCLUSIVE MODE;
    SELECT start_session_pl( $1, $2, $3 );
    COMMIT TRANSACTION;
' LANGUAGE 'sql';

PROBLEM 2 -- I'm having problems creating this SQL function. psql complains that the return type is a mismatch. More precisely:

ERROR: return type mismatch in function decl: final query is a catalog utility

When I put "SELECT 1;" at the end, the function can be created. So a more general SELECT is treated as a catalog utility, while a SELECT with a determined type is treated as that type. I have tried explicit conversion to int4, but no go. What can I do? Nix.
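The max(id)+1 step above can be replaced by a sequence, which hands out unique IDs without any table lock, so neither the LOCK TABLE nor the explicit transaction is needed (a PL/pgSQL function already runs inside the caller's transaction). A sketch, reusing the names from the function above; the sequence name and the use of now() are assumptions:

```sql
-- A sequence is concurrency-safe by design: two simultaneous
-- nextval() calls never return the same value.
CREATE SEQUENCE admin_session_id_seq;

CREATE FUNCTION start_session_pl( int4, VARCHAR(40), VARCHAR(50) ) RETURNS int4 AS '
DECLARE
    a_id   ALIAS FOR $1;
    a_ss   ALIAS FOR $2;
    a_ip   ALIAS FOR $3;
    new_id int4;
BEGIN
    PERFORM id FROM a_user WHERE id = a_id;
    IF NOT FOUND THEN
        RAISE EXCEPTION ''No such ID in admins'';
    END IF;
    -- No LOCK TABLE, no BEGIN/COMMIT: the sequence guarantees
    -- a unique id even under concurrent calls.
    new_id := nextval(''admin_session_id_seq'');
    INSERT INTO admin_session VALUES (new_id, a_ss, now(), a_id, a_ip);
    RETURN new_id;
END;
' LANGUAGE 'plpgsql';
```

Unlike max(id)+1, sequence values are not rolled back on abort, so gaps in the IDs can appear; for a session table that is normally harmless.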