Re: [GENERAL] Roles inherited from a role which is the owner of a database can drop it?
Hello,

On 30 October 2017 at 22:10, David G. Johnston <david.g.johns...@gmail.com> wrote:
> On Mon, Oct 30, 2017 at 12:25 PM, Ivan Voras <ivo...@gmail.com> wrote:
>>
>> 3. But they do log in with "developer" roles which are inherited from the
>> owner role.
>>
>> [...]
>>
>> I've tried it on a dummy database and it apparently works as described
>> here. Is this by design?
>
> Not quite following but ownership is an inheritable permission;

Basically, I'm asking if "ownership" can be revoked from the set of inherited permissions? If there is a role G which is granted to role A, and G is the owner of a database, can A be made unable to do what only owners can (specifically in this case, drop databases)?

> and even if it was not SET ROLE is all that would be required. Any owner
> can drop an object that it owns.

It's kind of the reverse: I'm wondering if ownership can be made un-inheritable.

>> What are the best practices for this sort of scenario where there is a
>> single owner of all the schema (which is large), where developers need
>> access to everything but cannot do something as drastic as dropping the dbs
>> (and possibly tables)?
>
> Don't let developers into production databases...
>
> Trusted people (and/or software) should be provided membership into
> ownership groups. Developers should provide these people/programs with
> vetted scripts to execute against production. Developers can do whatever
> they want on their local database instance with full schema-modifying
> privileges.
>
> "developers need access to everything" - there is a lot of nuance and
> detail behind that fragment that is needed if one is going to develop a
> data access and change management policy.

Just considering the case of dropping databases for now, i.e. let the developers do everything except that. It's a start.
[GENERAL] Roles inherited from a role which is the owner of a database can drop it?
Hello,

I just want to verify that what I'm observing is true, and if it is, I'd like to know how to avoid it:

1. There are databases owned by a certain role which is a superuser
2. Nobody logs in with the superuser role unless necessary
3. But they do log in with "developer" roles which inherit from the owner role. These developer roles are not superusers themselves, but have the CREATEDB flag
4. The developer roles can still drop the databases.

I've tried it on a dummy database and it apparently works as described here. Is this by design? If it is, is there a way to prevent the developer roles from dropping the databases?

What are the best practices for this sort of scenario where there is a single owner of all the schema (which is large), where developers need access to everything but cannot do something as drastic as dropping the dbs (and possibly tables)?
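For reference, a partial mitigation that came up while experimenting (a sketch with hypothetical role names; note it only removes the *implicit* inheritance, not the ability to escalate):

```sql
-- Sketch: with NOINHERIT, the developer role no longer exercises the
-- owner's privileges automatically, so a plain DROP DATABASE fails...
ALTER ROLE developer NOINHERIT;

-- ...but membership in the owner role still permits explicit escalation:
SET ROLE db_owner;   -- after this, DROP DATABASE would succeed again
```

So NOINHERIT raises the bar against accidental drops, but a developer who deliberately issues SET ROLE can still act as the owner.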
[GENERAL] Writing on replicas?
Hello, I have a possibly unusual case, I've asked about it before on this list. There is a bunch of reporting being done regularly on some large databases, which interfere with daily operations performance-wise. So one option is to have hot async replication in place to a secondary server where the reports could be run and not impact the main server. The reporting code is *almost* read-only, with the major exception being that it creates temp tables for intermediate results. The main tables are not written to, just the temp tables. Some time ago when I've asked what to do in this situation, the answer was that maybe a future version could allow this setup to work. So, I'm asking again: was something changed in 10.0 which would allow it?
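To illustrate the problem: this is roughly what the reporting code runs into on a hot standby today (a session sketch; the exact error wording may vary by version):

```sql
-- On a streaming-replication standby (hot_standby = on):
CREATE TEMP TABLE report_scratch (id integer, total numeric);
-- ERROR:  cannot execute CREATE TABLE in a read-only transaction
```

Temp tables still require catalog inserts and XID assignment, which is why the standby rejects them even though no shared table is touched.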
[GENERAL] DROP INDEX CASCADE doesn't want to drop unique constraints?
Hello,

On trying to drop an index named "employer_employerid_key" which supports a unique constraint:

    "employer_employerid_key" UNIQUE CONSTRAINT, btree (employerid)

I get this error:

    ERROR:  cannot drop index employer_employerid_key because constraint employer_employerid_key on table employer requires it
    HINT:  You can drop constraint employer_employerid_key on table employer instead.

I'm using the CASCADE and IF EXISTS arguments and the docs say nothing about any special cases (https://www.postgresql.org/docs/9.3/static/sql-dropindex.html). This is with PostgreSQL 9.3.15. The actual command is:

    drop index if exists employer_employerid_key cascade;

Any ideas if this is normal or why it happens?
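Following the HINT, the working equivalent seems to be dropping the constraint rather than the index; the supporting index goes away with it (a sketch using the names from the error message):

```sql
-- Dropping the constraint removes its supporting index as a side effect:
ALTER TABLE employer DROP CONSTRAINT IF EXISTS employer_employerid_key;
```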
Re: [GENERAL] Making a unique constraint deferrable?
On 28 February 2017 at 18:03, David G. Johnston <david.g.johns...@gmail.com> wrote:
> On Tue, Feb 28, 2017 at 9:50 AM, Ivan Voras <ivo...@gmail.com> wrote:
>>
>> ivoras=# alter table foo alter constraint "foo_a_b_key" deferrable;
>> ERROR:  constraint "foo_a_b_key" of relation "foo" is not a foreign key
>> constraint
>
> A more clear error message would be:
>
> ERROR: cannot alter non-foreign key constraint "foo_a_b_key" of relation
> "foo"
>
> Though I'm not sure how that meshes with the error message style guides...

Any idea what underlying technical reason prohibits marking non-FK constraints as deferrable?
[GENERAL] Making a unique constraint deferrable?
Hello,

If I'm interpreting the manual correctly, this should work:

    ivoras=# create table foo(a integer, b integer, unique(a,b));
    CREATE TABLE
    ivoras=# \d foo
          Table "public.foo"
     Column |  Type   | Modifiers
    --------+---------+-----------
     a      | integer |
     b      | integer |
    Indexes:
        "foo_a_b_key" UNIQUE CONSTRAINT, btree (a, b)
    ivoras=# insert into foo(a,b) values(1,2);
    INSERT 0 1
    ivoras=# insert into foo(a,b) values(1,2);
    ERROR:  duplicate key value violates unique constraint "foo_a_b_key"
    DETAIL:  Key (a, b)=(1, 2) already exists.
    ivoras=# alter table foo alter constraint "foo_a_b_key" deferrable;
    ERROR:  constraint "foo_a_b_key" of relation "foo" is not a foreign key constraint

The manual says this for SET CONSTRAINTS:

    Currently, only UNIQUE, PRIMARY KEY, REFERENCES (foreign key), and
    EXCLUDE constraints are affected by this setting. NOT NULL and CHECK
    constraints are always checked immediately when a row is inserted or
    modified (not at the end of the statement). Uniqueness and exclusion
    constraints that have not been declared DEFERRABLE are also checked
    immediately.

I'm puzzled by the "...is not a foreign key constraint" error message. Doesn't "deferrable" also work on unique constraints?
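For what it's worth, the workaround I've seen is that unique constraints can be made deferrable, but only at creation time; ALTER TABLE ... ALTER CONSTRAINT handles foreign keys only. A sketch of recreating the constraint as deferrable:

```sql
-- Drop and re-add the unique constraint, this time declared DEFERRABLE:
ALTER TABLE foo
    DROP CONSTRAINT foo_a_b_key,
    ADD CONSTRAINT foo_a_b_key UNIQUE (a, b)
        DEFERRABLE INITIALLY IMMEDIATE;
```

After that, SET CONSTRAINTS foo_a_b_key DEFERRED should work within a transaction. (Note the drop/re-add rebuilds the underlying index.)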
[GENERAL] Foreign key references a unique index instead of a primary key
Hello,

I've inherited a situation where:

- a table has both a primary key and a unique index on the same field.
- at some time, a foreign key was added which references this table (actually, I'm not sure about the sequence of events), which has ended up referencing the unique index instead of the primary key.

Now, when I've tried dropping the unique index, I get an error that the foreign key references this index (with a hint that I use DROP...CASCADE). This drop index is a part of an automated plpgsql script which deletes duplicate indexes, so I'm interested in two things:

1. How to detect if a foreign key depends on an index I'm about to drop, so I can skip it
2. Is there a way to get around this situation, maybe modify the pg_constraint table or other tables to reference the index / primary key I want?

This is on PostgreSQL 9.3.
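For point 1, something along these lines might work (a sketch; it assumes the dependency is recorded in pg_depend as a pg_constraint entry referencing the index's pg_class entry, and the index name is hypothetical):

```sql
-- List constraints that depend on a given index, so the drop script
-- can skip it when this returns any rows:
SELECT c.conname, c.conrelid::regclass AS on_table, c.contype
FROM pg_depend d
JOIN pg_constraint c ON c.oid = d.objid
WHERE d.classid = 'pg_constraint'::regclass
  AND d.refclassid = 'pg_class'::regclass
  AND d.refobjid = 'the_duplicate_index'::regclass;
```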
[GENERAL] Enhancement proposal for psql: add a column to "\di+" to show index type
Hi,

As it says in the subject: if any developer is interested in doing so, I think it would be useful to see the index type in "\di+" output. The new column could be named "using" to reflect the SQL statement. It would contain entries such as "btree", "brin", "gin", etc.
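Until something like this lands in psql, the same information can be pulled from the catalogs directly (a sketch; joins the index's pg_class row to its access method):

```sql
-- Show each index together with its access method ("using" clause):
SELECT c.relname AS index_name, a.amname AS "using"
FROM pg_class c
JOIN pg_am a ON a.oid = c.relam
WHERE c.relkind = 'i'
ORDER BY c.relname;
```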
[GENERAL] Streaming replication and slave-local temp tables
Hello, Is it possible (or will it be possible) to issue CREATE TEMP TABLE statements on the read-only slave nodes in master-slave streaming replication in recent version of PostgreSQL (9.4+)?
Re: [GENERAL] Catalog bloat (again)
As a follow-up, here's a portion of the nightly vacuum's logs; I just want to confirm whether my conclusions are right:

    INFO:  vacuuming "pg_catalog.pg_attribute"
    INFO:  scanned index "pg_attribute_relid_attnam_index" to remove 3014172 row versions
    DETAIL:  CPU 0.20s/1.08u sec elapsed 3.72 sec.
    INFO:  scanned index "pg_attribute_relid_attnum_index" to remove 3014172 row versions
    DETAIL:  CPU 0.14s/0.89u sec elapsed 1.70 sec.
    INFO:  "pg_attribute": removed 3014172 row versions in 52768 pages
    DETAIL:  CPU 0.31s/0.30u sec elapsed 1.15 sec.
    INFO:  index "pg_attribute_relid_attnam_index" now contains 19578 row versions in 45817 pages
    DETAIL:  3013689 index row versions were removed.
    45668 index pages have been deleted, 34116 are currently reusable.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO:  index "pg_attribute_relid_attnum_index" now contains 19578 row versions in 32554 pages
    DETAIL:  3010630 index row versions were removed.
    32462 index pages have been deleted, 24239 are currently reusable.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO:  "pg_attribute": found 2278389 removable, 17319 nonremovable row versions in 52856 out of 57409 pages
    DETAIL:  298 dead row versions cannot be removed yet.
    There were 641330 unused item pointers.
    0 pages are entirely empty.
    CPU 1.44s/2.88u sec elapsed 10.55 sec.
    INFO:  "pg_attribute": stopping truncate due to conflicting lock request
    INFO:  analyzing "pg_catalog.pg_attribute"
    INFO:  "pg_attribute": scanned 3 of 57409 pages, containing 10301 live rows and 0 dead rows; 10301 rows in sample, 24472 estimated total rows

So:

- About 3 million rows churned in the table and its two indexes (i.e. dead rows which vacuum found and marked) - that's per day, since this is a nightly operation.
- After the vacuum, the first index is left with 19578 rows in 45817 pages. That's a lot of empty pages, which should be reused the next day, together with free space in partially filled tables, right?
- Since pages are 8 KiB, 46,000 pages is about 360 MiB of space - that's how large just one of the pg_attribute table's indexes is after the vacuum. Altogether, it's more than 1 GB.
- The "stopping truncate..." message is not really significant; it would only have shortened the data files from the end if there were empty pages at the end, which in this case there isn't a significant number of. The truncation can probably never happen on system tables like these since they are always in use...?

The real question is: why is the total size (i.e. the number of pages) growing at an alarming rate? On one of the db's, we're seeing almost a doubling in size each week. Is the internal fragmentation of the data files so significant?

Ok, a couple more questions:

1. How come "0 pages are entirely empty" if there are 17319 rows spread around in 52856 pages?
2. What are "unused item pointers"?

(I agree with your previous suggestions, will see if they can be implemented.)

On 28 January 2016 at 00:13, Bill Moran <wmo...@potentialtech.com> wrote:
> On Wed, 27 Jan 2016 23:54:37 +0100
> Ivan Voras <ivo...@gmail.com> wrote:
>
> > I've done my Googling, and it looks like this is a fairly common problem.
> > In my case, there's a collection of hundreds of databases (10 GB+) with
> > apps which are pretty much designed (a long time ago) with heavy use of
> > temp tables - so a non-trivial system.
> >
> > The databases are vacuumed (not-full) daily, from cron
>
> Vacuuming once a day is seldom often enough, except on very quiet
> databases.
>
> > (autovacuum was
> > turned off some time ago for performance reasons), and still their size
> > increases unexpectedly. By using some of the queries floating around on the
> > wiki and stackoverflow[*], I've discovered that the bloat is not, as was
> > assumed, in the user tables, but in the system tables, mostly in
> > pg_attribute and pg_class.
>
> The size increase isn't really unexpected. If you're only vacuuming once
> per day, it's very easy for activity to cause active tables to bloat quite
> a bit.
>
> > This is becoming a serious problem, as I've seen instances of these tables
> > grow to 6 GB+ (on a 15 GB total database), while still effectively
> > containing on the order of 10,000 records or so. This is quite abnormal.
> >
> > For blocking reasons, we'd like to avoid vacuum fulls on these tables (as
> > it seems like touching them will lock up everything else).
>
> It will. But to get them back down to a reasonable size, you're going to
> have to do a VACUUM FULL at least _once_. If you retune things correctly,
> you shouldn't need any more FULLs after that 1 time.
>
> > So
[GENERAL] Catalog bloat (again)
Hi,

I've done my Googling, and it looks like this is a fairly common problem. In my case, there's a collection of hundreds of databases (10 GB+) with apps which are pretty much designed (a long time ago) with heavy use of temp tables - so a non-trivial system.

The databases are vacuumed (not-full) daily, from cron (autovacuum was turned off some time ago for performance reasons), and still their size increases unexpectedly. By using some of the queries floating around on the wiki and stackoverflow[*], I've discovered that the bloat is not, as was assumed, in the user tables, but in the system tables, mostly in pg_attribute and pg_class.

This is becoming a serious problem, as I've seen instances of these tables grow to 6 GB+ (on a 15 GB total database), while still effectively containing on the order of 10,000 records or so. This is quite abnormal. For blocking reasons, we'd like to avoid vacuum fulls on these tables (as it seems like touching them will lock up everything else).

So, question #1: WTF? How could this happen on a regularly vacuumed system? Shouldn't the space be reused, at least after a VACUUM? The issue here is not the absolute existence of the bloat space, it's that it's constantly growing for *system* tables.

Question #2: What can be done about it?

This is PostgreSQL 9.3, migrating soon to 9.4.

[*] https://wiki.postgresql.org/wiki/Show_database_bloat , http://stackoverflow.com/questions/13931989/postgresql-automating-vacuum-full-for-bloated-tables
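In case it helps anyone reproducing this, a quick way to watch the catalog tables' growth over time (a sketch; run per database):

```sql
-- Report size and estimated row count for the bloating catalog tables:
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
       c.reltuples::bigint AS estimated_rows
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pg_catalog'
  AND c.relname IN ('pg_attribute', 'pg_class')
ORDER BY pg_total_relation_size(c.oid) DESC;
```

A multi-GB total_size next to a reltuples count in the tens of thousands is the disparity being described above.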
Re: [GENERAL] Catalog bloat (again)
On 28 January 2016 at 00:13, Bill Moran <wmo...@potentialtech.com> wrote:
> On Wed, 27 Jan 2016 23:54:37 +0100
> Ivan Voras <ivo...@gmail.com> wrote:
>
> > So, question #1: WTF? How could this happen, on a regularly vacuumed
> > system? Shouldn't the space be reused, at least after a VACUUM? The issue
> > here is not the absolute existence of the bloat space, it's that it's
> > constantly growing for *system* tables.
>
> With a lot of activity, once a day probably isn't regular enough.

I sort of see what you are saying. I'm curious, though, what goes wrong with the following list of expectations:

1. Day-to-day load is approximately the same
2. So, at the end of the first day there will be some amount of bloat
3. Vacuum will mark that space re-usable
4. Within the next day, this space will actually be re-used
5. ... so the bloat won't grow.

Basically, I'm wondering why it is growing after vacuums, not why it exists in the first place.
Re: [GENERAL] CLOB BLOB limitations in PostgreSQL
On 11/04/2014 16:45, Jack.O'sulli...@tessella.com wrote: With point two, does this mean that any table with a bytea datatype is limited to 4 billion rows (which would seem in conflict with the unlimited rows shown by http://www.postgresql.org/about)? If we had rows where the bytea was a null entry would they contribute towards this total or is it 4 billion non-null entries? This seems strange. A core developer should confirm this but it doesn't make much sense - bytea fields are stored the same as text fields (including varchar etc), i.e. the varlena internal representation, so having the limit you are talking about would mean that any non-trivial table with long-ish text fields would be limited to 2^32 entries... signature.asc Description: OpenPGP digital signature
Re: [GENERAL] encrypting data stored in PostgreSQL
On 09/04/2014 22:40, CS_DBA wrote:
> Hi All;
> We have a client with this requirement:
> At rest, data must be encrypted with a unique client key
> Any thoughts on how to pull this off for PostgreSQL stored data?

Some time ago I did this, mostly as an experiment, but IIRC it works decently: https://bitbucket.org/ivoras/pgenctypes
Re: [GENERAL] how is text-equality handled in postgresql?
On 15/01/2014 10:10, Gábor Farkas wrote:
> hi,
> when i create a unique constraint on a varchar field, how exactly does
> postgresql compare the texts? i'm asking because in UNICODE there are a
> lot of complexities about this.. or in other words, when are two
> varchars equal in postgres? when their bytes are? or some algorithm is
> applied?

By default, it is whatever the operating system thinks is right. PostgreSQL doesn't have its own collation code; it uses the OS's locale support for this. (Which breaks on certain systems which don't have complete UTF-8 support - I'm in favour of importing ICU at least as an optional dependency, similar to what the FreeBSD patch does: http://people.freebsd.org/~girgen/postgresql-icu/.)
Re: [GENERAL] how is text-equality handled in postgresql?
On 15/01/2014 12:36, Amit Langote wrote:
> On Wed, Jan 15, 2014 at 7:39 PM, Ivan Voras <ivo...@freebsd.org> wrote:
>> On 15/01/2014 10:10, Gábor Farkas wrote:
>>> hi, when i create a unique constraint on a varchar field, how exactly
>>> does postgresql compare the texts? i'm asking because in UNICODE there
>>> are a lot of complexities about this.. or in other words, when are two
>>> varchars equal in postgres? when their bytes are? or some algorithm is
>>> applied?
>>
>> By default, it is whatever the operating system thinks is right.
>> PostgreSQL doesn't have its own collation code, it uses the OS's locale
>> support for this.
>
> Just to add to this, whenever strcoll() (a locale-aware comparator) says
> two strings are equal, postgres re-compares them using strcmp(). See the
> following code snippet from src/backend/utils/adt/varlena.c:varstr_cmp():
>
>     /*
>      * In some locales strcoll() can claim that nonidentical strings are
>      * equal. Believing that would be bad news for a number of reasons,
>      * so we follow Perl's lead and sort equal strings according to
>      * strcmp().
>      */
>     if (result == 0)
>         result = strcmp(a1p, a2p);

That seems odd and inefficient. Why would it be necessary? I would think indexing (and other collation-sensitive operations) don't care what the actual collation result is for arbitrary blobs of strings, as long as they are stable?
Re: [GENERAL] how is text-equality handled in postgresql?
On 15/01/2014 13:29, Amit Langote wrote:
> On Wed, Jan 15, 2014 at 9:02 PM, Ivan Voras <ivo...@freebsd.org> wrote:
>> That seems odd and inefficient. Why would it be necessary? I would think
>> indexing (and other collation-sensitive operations) don't care what the
>> actual collation result is for arbitrary blobs of strings, as long as
>> they are stable?
>
> This is the behavior since quite some time, introduced by this commit:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=656beff59033ccc5261a615802e1a85da68e8fad

Ok, the commit comment is:

    Adjust string comparison so that only bitwise-equal strings are
    considered equal: if strcoll claims two strings are equal, check it
    with strcmp, and sort according to strcmp if not identical. This fixes
    inconsistent behavior under glibc's hu_HU locale, and probably under
    some other locales as well. Also, take advantage of the
    now-well-defined behavior to speed up texteq, textne, bpchareq,
    bpcharne: they may as well just do a bitwise comparison and not bother
    with strcoll at all.

... so it's just another workaround for OS-specific locale issues - to me it looks like just another reason to use ICU.
Re: [GENERAL] Consistent file-level backup of pg data directory
On 08/01/2014 16:09, gator...@yahoo.de wrote:
> For machines running database systems, this means that I need some way
> to get a consistent state of some point in time. It does not
> particularly matter which time exactly (in
> [...]
> Unfortunately, it does not look like there is any direct way to
> accomplish this with postgres except shutting down the whole database
> system while the backup is running. The systems will

You could use a file system which supports snapshots (I don't know which Linux FSs do; I've read that LVM can be used to simulate those) to get an exact point-in-time backup, which will rely on the database's usual resilience to be stable enough for restoring.
[GENERAL] Recheck conditions on indexes
Hi,

I'm just wondering: in an execution plan such as this one, is the Recheck Cond phase what it apparently looks like - an additional check on the data returned by indexes - and why is it necessary? I would have thought that indexes are accurate enough?

    cms=> explain analyze select * from users where other_ids->'OIB'='70328909364' or code='0023017009';
                                                                 QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------------------
     Bitmap Heap Scan on users  (cost=8.52..39.21 rows=10 width=330) (actual time=0.042..0.044 rows=2 loops=1)
       Recheck Cond: (((other_ids -> 'OIB'::text) = '70328909364'::text) OR ((code)::text = '0023017009'::text))
       ->  BitmapOr  (cost=8.52..8.52 rows=10 width=0) (actual time=0.035..0.035 rows=0 loops=1)
             ->  Bitmap Index Scan on users_other_ids_oib  (cost=0.00..4.26 rows=9 width=0) (actual time=0.023..0.023 rows=1 loops=1)
                   Index Cond: ((other_ids -> 'OIB'::text) = '70328909364'::text)
             ->  Bitmap Index Scan on users_code  (cost=0.00..4.26 rows=1 width=0) (actual time=0.012..0.012 rows=1 loops=1)
                   Index Cond: ((code)::text = '0023017009'::text)
     Total runtime: 0.082 ms
    (8 rows)

Both indexes are plain btrees; the first one is on an expression on the hstore field (other_ids->'OIB') and the second one is on a plain text field.

Also, why is it using the Bitmap Index Scan in both cases? A plain query for code='foo' uses a plain index scan.

This is PostgreSQL 9.1.
Re: [GENERAL] Recheck conditions on indexes
On 25/10/2013 11:06, Albe Laurenz wrote:
> Just because there is an entry in the index does not imply that the
> corresponding table entry is visible for this transaction. To ascertain
> that, the table row itself has to be checked.

Understood.

> PostgreSQL 9.2 introduced index only scan which avoids that additional
> step if it is safe to do so.

It doesn't help in this case - the plan for the same query on a copy of the database on 9.3 is exactly the same.
Re: [GENERAL] Major upgrade of PostgreSQL and MySQL
On 12/09/2013 18:16, Karl Denninger wrote:
> On 9/12/2013 11:11 AM, Patrick Dung wrote:
>> While reading some manuals of PostgreSQL and MySQL (eg.
>> http://dev.mysql.com/doc/refman/5.1/en/upgrading-from-previous-series.html),
>> I have found that MySQL has stated many incompatibilities and known
>> issues (a long list) in MySQL versions 5.0, 5.1, 5.5, 5.6 and 5.7. For
>> PostgreSQL, it seems I can't find the list (it just says see Appendix E
>> / the release notes). I think it is a plus for PostgreSQL if it has few
>> incompatibilities between major versions. By the way, for in-place major
>> version upgrades (not dumping the DB and importing again), MySQL is
>> doing a better job here. Please share your thoughts, thanks.
>
> pg_upgrade will do an in-place upgrade if you wish. It is somewhat risky
> if not done using a COPY (it can either copy or not, as you wish) but
> it's considerably faster than a dump/restore and is in-place. I use it
> regularly.

If I read the documentation correctly (http://www.postgresql.org/docs/9.3/static/pgupgrade.html), it needs oldbindir and newbindir arguments pointing to the directories of PostgreSQL executables for the old and new versions, making it basically unusable for upgrading systems which are maintained with packages instead of individually compiling and installing custom versions of PostgreSQL, right? (Except possibly Debian, which may allow multiple pg versions to be installed; I haven't tried it.)
Re: [GENERAL] Major upgrade of PostgreSQL and MySQL
On 13 September 2013 21:44, Patrick Dung <patrick_...@yahoo.com.hk> wrote:
> Ivan Voras has replied that the link method works fine in Windows on
> another thread.

That would be very surprising, since I don't run Windows servers :)
[GENERAL] Trigram (pg_trgm) GIN index not used
Hello,

I have a table with the following structure:

    nn=> \d documents
                                    Table "public.documents"
        Column     |   Type   |                       Modifiers
    ---------------+----------+--------------------------------------------------------
     id            | integer  | not null default nextval('documents_id_seq'::regclass)
     ctime         | integer  | not null default unix_ts(now())
     dtime         | integer  | not null
     title         | citext   | not null
     html_filename | text     | not null
     raw_data      | citext   | not null
     fts_data      | tsvector | not null
     tags          | text[]   |
     flags         | integer  | not null default 0
     dtype         | integer  | not null default 0
    Indexes:
        "documents_pkey" PRIMARY KEY, btree (id)
        "documents_html_filename" UNIQUE, btree (html_filename)
        "documents_raw_data_trgm" gin (raw_data gin_trgm_ops)
        "documents_title_trgm" gin (title gin_trgm_ops)

I'd like to use pg_trgm for matching substrings case-insensitively, but it doesn't seem to use the index:

    nn=> explain select id,title from documents where raw_data ilike '%zagreb%';
                               QUERY PLAN
    ---------------------------------------------------------------
     Seq Scan on documents  (cost=0.00..6648.73 rows=180 width=98)
       Filter: (raw_data ~~* '%zagreb%'::citext)
    (2 rows)

    nn=> explain select id,title from documents where raw_data like '%zagreb%';
                               QUERY PLAN
    ---------------------------------------------------------------
     Seq Scan on documents  (cost=0.00..6692.71 rows=181 width=98)
       Filter: (raw_data ~~ '%zagreb%'::citext)
    (2 rows)

When I try to create a GiST index as advised by the comment at http://www.postgresonline.com/journal/archives/212-PostgreSQL-9.1-Trigrams-teaching-LIKE-and-ILIKE-new-tricks.html, I get the following error:

    ERROR:  index row requires 10488 bytes, maximum size is 8191

What am I doing wrong?
Re: [GENERAL] Trigram (pg_trgm) GIN index not used
On 21/02/2013 12:52, Ivan Voras wrote:
> I'd like to use pg_trgm for matching substrings case-insensitively, but
> it doesn't seem to use the index:

As a sort-of follow-up, the '%' operator kind of works, but takes an incredibly long time, and the selectivity estimates are completely wrong:

    nn=> vacuum analyze documents;
    VACUUM
    nn=> explain select id,title from documents where raw_data % 'zagreb';
                                       QUERY PLAN
    -------------------------------------------------------------------------------------
     Bitmap Heap Scan on documents  (cost=128.42..330.87 rows=54 width=108)
       Recheck Cond: ((raw_data)::text % 'zagreb'::text)
       ->  Bitmap Index Scan on documents_raw_data_trgm  (cost=0.00..128.40 rows=54 width=0)
             Index Cond: ((raw_data)::text % 'zagreb'::text)
    (4 rows)

    nn=> explain analyze select id,title from documents where raw_data % 'zagreb';
                                       QUERY PLAN
    -------------------------------------------------------------------------------------
     Bitmap Heap Scan on documents  (cost=128.42..330.87 rows=54 width=108) (actual time=98750.283..98750.283 rows=0 loops=1)
       Recheck Cond: ((raw_data)::text % 'zagreb'::text)
       ->  Bitmap Index Scan on documents_raw_data_trgm  (cost=0.00..128.40 rows=54 width=0) (actual time=26.748..26.748 rows=51874 loops=1)
             Index Cond: ((raw_data)::text % 'zagreb'::text)
     Total runtime: 98750.623 ms
    (5 rows)

There is no IO load during this query.
[GENERAL] PostgreSQL and a clustered file system
Hello,

Is anyone running PostgreSQL on a clustered file system on Linux? By "clustered" I actually mean shared, such that the same storage is mounted by different servers at the same time (of course, only one instance of PostgreSQL on only one server can be running on such a setup, and there are a lot of other precautions that need to be satisfied).
Re: [GENERAL] Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?
On 5 October 2012 04:53, Moshe Jacobson <mo...@neadwerx.com> wrote:
> On Thu, Oct 4, 2012 at 6:12 AM, Ivan Voras <ivo...@freebsd.org> wrote:
>> On 01/10/2012 15:36, Moshe Jacobson wrote:
>>> I am working on an audit logging trigger that gets called for every row
>>> inserted, updated or deleted on any table. For this, I need to store a
>>> couple of temporary session variables such as the ID of the user
>>> performing the change, which can be set at the start of the session.
>>
>> Do you know about session variables? The major benefit here is that it
>> doesn't touch the table engines, temporary or not.
>
> This sounds incredibly useful. Why have I not heard of this until today??
> In your example you still had to use a BEGIN...EXCEPTION block. Is that
> faster than a create temp table?

I think I can make a fairly educated guess that catching exceptions while dealing with session variables should be much, much faster than creating any kind of a table :)

Besides, from what you said, you will ensure on the app level that the session variable is set sometime close to when you open a connection to Pg, so the "catch" part of the exception block will probably not run at all.
Re: [GENERAL] Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?
On 5 October 2012 15:55, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Fri, Oct 5, 2012 at 3:09 AM, Ivan Voras <ivo...@freebsd.org> wrote:
>> I think I can make a fairly educated guess that catching exceptions
>> while dealing with session variables should be much, much faster than
>> creating any kind of a table :)
>
> On the other hand, if the temp table can be completely ditched for a
> session variable or two, then yeah, that would be much better since
> you'd avoid the overhead of creating the table completely.

Yes, this is what I was aiming at, based on the OP mentioning he only has a limited amount of data to manage in this way.
[GENERAL] Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?
On 01/10/2012 15:36, Moshe Jacobson wrote:
> I am working on an audit logging trigger that gets called for every row
> inserted, updated or deleted on any table. For this, I need to store a
> couple of temporary session variables such as the ID of the user
> performing the change, which can be set at the start of the session.
> Until now I have been using a permanent table to store the session
> variables, but it has been difficult to wipe the data properly at the
> end of the session.

Do you know about session variables? I did something similar to what you are describing and it ended up much simpler than using tables, temporary or not. You need to configure them in postgresql.conf, e.g.:

    custom_variable_classes = 'myapp'

Then in the application code:

    SET myapp.uid = 42;

And in the pl/pgsql function:

    CREATE OR REPLACE FUNCTION dblog() RETURNS TRIGGER AS $$
    DECLARE
        uid INTEGER;
    BEGIN
        BEGIN
            SELECT current_setting('myapp.uid') INTO uid;
        EXCEPTION
            WHEN undefined_object THEN uid = null;
            WHEN data_exception THEN uid = null;
        END;
        ...
    END;
    $$ LANGUAGE plpgsql;

The major benefit here is that it doesn't touch the table engines, temporary or not.
[GENERAL] pg_dump, send/recv
Hello,

Actually I have sort of two questions rolled into one: when creating custom data types, there's the option to implement *_send() and *_recv() functions in addition to *_in() and *_out(). Does pg_dump use them for binary dumps, and, if not, what uses them? Are they only an optional optimization for storing binary data in the database?
[GENERAL] Custom integer-like type
Hello, I'm creating a new data type in C, and everything is proceeding well, except that the data type should be parsed on input like an integer. Maybe it's best if I explain it with an example. Currently, I can do this:

    INSERT INTO t(my_data_type) VALUES ('1')

but I cannot do this:

    INSERT INTO t(my_data_type) VALUES (1)

My type is defined as:

    CREATE TYPE myint (
        internallength = variable,
        input = encbigint_in,
        output = encbigint_out,
        alignment = int4,
        storage = external
    );

(The variable length is correct in this case.)
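The usual way to make a bare integer literal acceptable is an implicit cast from integer to the new type. A sketch, assuming a conversion function exists (the name int4_to_myint and the library name are hypothetical):

```sql
-- Hypothetical C-level conversion function: takes an int4, returns myint.
CREATE FUNCTION int4_to_myint(integer) RETURNS myint
    AS 'myext', 'int4_to_myint' LANGUAGE C IMMUTABLE STRICT;

-- AS IMPLICIT lets the parser apply the cast automatically, so
-- INSERT INTO t(my_data_type) VALUES (1) starts working.
CREATE CAST (integer AS myint)
    WITH FUNCTION int4_to_myint(integer)
    AS IMPLICIT;
```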
Re: [GENERAL] Versioned, chunked documents
On 02/04/2012 01:52, Martin Gregorie wrote:
> BTW, why use document_chunks when a text field can hold megabytes,
> especially if they will be concatenated to form a complete document
> which is then edited as a whole item before being split into chunks and
> saved back to the database?

... because none of your assumptions are true :) They will not hold megabytes, they will not often be concatenated, the document is never edited in whole, and I do *not* need version numbers on the whole document :)
[GENERAL] Versioned, chunked documents
Hi, I have documents which are divided into chunks, so that the (ordered) concatenation of chunks make the whole document. Each of the chunks may be edited separately and past versions of the chunks need to be kept. The structure looks fairly simple:

    CREATE TABLE documents (
        id SERIAL PRIMARY KEY,
        title TEXT NOT NULL,
        ...
    );

    CREATE TABLE documents_chunks (
        id SERIAL PRIMARY KEY,
        ctime TIMESTAMP NOT NULL,
        documents_id INTEGER REFERENCES documents(id),
        seq INTEGER NOT NULL, -- sequence within the document
        content TEXT,
        ...
    );

The first goal is to retrieve the latest version of the whole document, made from the latest versions of all chunks, but later the goal will also be to fetch the whole version at some point in time (i.e. with chunks created before a point in time). I did the first goal by creating two helper views:

    CREATE VIEW documents_chunks_last_version_chunk_ids AS
        SELECT documents_id, max(id), seq
        FROM documents_chunks
        GROUP BY documents_id, seq;

    CREATE VIEW documents_chunks_last_version_content AS
        SELECT documents_chunks.documents_id, content
        FROM documents_chunks
        JOIN documents_chunks_last_version_chunk_ids
          ON documents_chunks.id = documents_chunks_last_version_chunk_ids.max
        ORDER BY documents_chunks_last_version_chunk_ids.seq;

There are indexes on the document_chunks fields seq and documents_id. Everything looked fine until I examined the output of EXPLAIN ANALYZE and saw this:

    db= set enable_seqscan to off;
    SET
    db= explain analyze select * from documents_chunks_last_version_content where documents_id=1;

EXPLAIN output given in: http://explain.depesz.com/s/mpY

The query output seems correct on this test case:

    db= select * from documents_chunks_last_version_content where documents_id=1;
     documents_id | content
    --------------+---------
                1 | C1, v2
                1 | C2, v3
                1 | C3, v1
    (3 rows)

This huge cost of 100 which appeared out of nowhere in the EXPLAIN output and the seq scan worry me - where did that come from?
There are absolutely no unindexed fields in the query, and the result set of the aggregate (max) is very small. Of course, I might be doing the whole structure wrong - any ideas?
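One structural alternative worth noting (not from the thread itself, just a common PostgreSQL idiom): DISTINCT ON can pick the newest row per chunk directly, without the GROUP BY view and the self-join, which gives the planner a simpler shape to work with:

```sql
-- Latest version of each chunk of document 1: for every seq value,
-- the row with the highest id (the newest version) wins.
SELECT DISTINCT ON (seq) seq, content
FROM documents_chunks
WHERE documents_id = 1
ORDER BY seq, id DESC;
```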
Re: [GENERAL] Versioned, chunked documents
2012/4/2 Ondrej Ivanič ondrej.iva...@gmail.com:
> Hi,
> On 2 April 2012 08:38, Ivan Voras ivo...@freebsd.org wrote:
>> db= set enable_seqscan to off;
>> [snip]
>> This huge cost of 100 which appeared out of nowhere in the EXPLAIN
>> output and the seq scan worry me - where did that come from?
> It is not possible to disable seq scans completely. Setting
> enable_seqscan to off just sets the cost of this operation extremely
> high (10 000 000 000), thus the planner is forced to look for a better
> plan. In your case the planner wasn't able to find anything better,
> hence cost=109.55..109.56.

Hi, Thanks, I suspected something hard-coded might be adding to the cost, but this leaves the more important issue: why is a seq scan happening at all with an indexed field (the id)? If I read the EXPLAIN output (at http://explain.depesz.com/s/mpY) correctly, this is where it happens:

    Hash Cond: ((max(public.documents_chunks.id)) = public.documents_chunks.id)

The left hand side is a result of the MAX aggregate with a GROUP BY, but the number of records is really small so I think the index should be used on the right hand side of the hash cond.
Re: [GENERAL] what Linux to run
On 28/02/2012 18:17, Rich Shepard wrote:
> On Tue, 28 Feb 2012, mgo...@isstrucksoftware.net wrote:
>> If we move to Linux, what is the preferred Linux for running Postgres
>> on. This machine would be dedicated to the database only.
> Michael, There is no 'preferred' linux distribution; the flame wars on
> this topic died out a decade or so ago. From what you write, I would
> suggest that you look at one of the Ubuntus http://www.ubuntu.org/.
> Either the KDE or Gnome versions will appear Microsoft-like; the Xfce
> version appears more like CDE. Download a bootable .iso (a.k.a. 'live
> disk') and burn it to a cdrom and you can try it without installing it.
> If you do like it, install it from the same disk. The Ubuntus boot
> directly into the GUI and that tends to be more comfortable for newly
> defenestrated users. If you like that, but want the more open and
> readily-available equivalent, install Debian. The ubuntus are
> derivatives of debian.

One interesting thing I've discovered recently is that there is a HUGE difference in performance between CentOS 6.0 and Ubuntu Server 10.04 (LTS) in at least the memory allocator and possibly also the multithreading libraries (in favour of CentOS). PostgreSQL shouldn't be particularly sensitive to either of these, but it makes me wonder what else is suboptimal in Ubuntu.
Re: [GENERAL] what Linux to run
On 28/02/2012 17:57, mgo...@isstrucksoftware.net wrote:
> Our application runs on Windows, however we have been told that we can
> pick any OS to run our server on. I'm thinking Linux because from
> everything I've read, it appears to be better on performance and there
> are other features like tablespaces which we could take advantage of.
> On our hosted solution, the application runs in a Software as a Service
> model and being able to keep each company's tables in their own
> tablespace would be nice. Additionally it appears that there are a lot
> more ways to tune the engine if we need to than under Windows, plus the
> capability to hold more connections. If we move to Linux, what is the
> preferred Linux for running Postgres on? This machine would be
> dedicated to the database only. I'd like a recommendation for both a
> GUI hosted version and a non-GUI version. I haven't used Linux in the
> past but did spend several years in a mixed Unix and IBM mainframe
> environment at the console level.

Hi, PostgreSQL administration would not benefit much from a GUI, as it is basically centered around editing and tuning configuration files (either its or the OS's). For Linux, if you want stability and decent performance, you should probably choose either CentOS, or if you want commercial support, Red Hat Enterprise Linux (which is basically the same thing, only commercial). Personally, I'd recommend FreeBSD (it's not a Linux, it's more Unix-like) but I'm probably biased ;)
[GENERAL] PostgreSQL poster
I was looking for some PostgreSQL promotional material and found this: http://imgur.com/4VUUw I would very much like to get a high-res version of this image - does anyone here have it?
[GENERAL] Session variables and C functions
I'm writing a custom C function and one of the things it needs to do is to be configured from the SQL-land, per user session (different users have different configurations in different sessions). I have found (and have used) the SET SESSION command and the current_setting() function for use with custom_variable_classes configuration in postgresql.conf, but I'm not sure how to achieve the same effect from C. Ideally, the C module would create its own custom variable class, named e.g. module, then define some setting, e.g. module.setting. The users would then execute an SQL command such as SET SESSION module.setting='something', and the module would need to pick this up in the C function. Any pointers to how this would be done?
Re: [GENERAL] Session variables and C functions
On 17 November 2011 19:02, Tom Lane t...@sss.pgh.pa.us wrote:
> Ivan Voras ivo...@freebsd.org writes:
>> Ideally, the C module would create its own custom variable class, named
>> e.g. module, then define some setting, e.g. module.setting. The users
>> would then execute an SQL command such as SET SESSION
>> module.setting='something', and the module would need to pick this up
>> in the C function.
> Plenty of examples of that in contrib/ ...

Ok, I found DefineCustom*() and _PG_init() but there's a small problem: the functions I'm writing are for a user defined type and apparently _PG_init() is not called until some operation with the type is done (e.g. anything with the input/output functions). This means that DefineCustom*() is not called and PG doesn't know about the variable until it's too late - I need the session variable to be settable by the user before input/output - relying on the default value is not enough. Is there any way to make _PG_init() called earlier, e.g. as soon as the session is established or at database connection time, something like that?
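For reference, the DefineCustom*() pattern Tom points at looks roughly like this. This is only a sketch: the module.setting name follows the thread, and the hook arguments shown are the 9.1-era signature (9.0 and earlier lack the check_hook parameter):

```c
#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

/* Backing storage for the custom GUC; PostgreSQL updates it on SET. */
static char *module_setting = NULL;

void _PG_init(void);

void
_PG_init(void)
{
    /*
     * Registers "module.setting" so that
     *   SET SESSION module.setting = '...';
     * becomes valid, and the value is readable from C via module_setting.
     */
    DefineCustomStringVariable("module.setting",
                               "Per-session setting for this module",
                               NULL,             /* long description */
                               &module_setting,
                               "",               /* boot/default value */
                               PGC_USERSET,
                               0,                /* flags */
                               NULL, NULL, NULL  /* check/assign/show hooks */);
}
```

As the follow-up in this thread notes, _PG_init() only runs when the library is loaded, so for type support functions the library must be preloaded (shared_preload_libraries or local_preload_libraries) for the GUC to exist before the type is first used.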
Re: [GENERAL] Session variables and C functions
On 18 November 2011 01:20, Tom Lane t...@sss.pgh.pa.us wrote:
> Ivan Voras ivo...@freebsd.org writes:
>> Is there any way to make _PG_init() called earlier, e.g. as soon as the
>> session is established or at database connection time, something like
>> that?
> Preload the library --- see the shared/local_preload_libraries
> configuration settings.

Ok, thanks!
[GENERAL] Bulk processing deletion
Hello, I have a table with a large number of records (millions), on which the following should be performed:

1. Retrieve a set of records by a SELECT query with a WHERE condition
2. Process these in the application
3. Delete them from the table

Now, in the default read-committed transaction isolation, I can't just use the same WHERE condition with a DELETE in step 3 as it might delete more records than are processed in step 1 (i.e. phantom read). I've thought of several ways around it and would like some feedback on which would be the most efficient:

#1: Create a giant DELETE WHERE ... IN (...) SQL command for step 3 with primary keys of records from step 1 - but will it hit a SQL string length limitation in the database? Is there such a limit (and what is it)?

#2: Same as #1 but with batching the records to e.g. 1000 at a time, all in one transaction

#3: Use a higher isolation level, probably Repeatable Read (PG 9.0) - but then the question is: will this block other clients from inserting new data into the table? Also, is Repeatable Read enough?

Any other ideas?
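Option #2 above can be sketched in application code. The chunking itself is generic Python; `conn` (a DB-API connection), the table name and the column name are hypothetical stand-ins:

```python
def batches(ids, size=1000):
    """Yield successive slices of `ids`, each of at most `size` elements."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def delete_in_batches(conn, ids, size=1000):
    """Delete rows by primary key in batches, all in one transaction.

    Avoids building one giant IN (...) list while still deleting exactly
    the keys that were processed (so no phantom-read problem).
    """
    with conn.cursor() as cur:
        for chunk in batches(ids, size):
            # Parameterized IN-list; placeholders generated per chunk.
            placeholders = ", ".join(["%s"] * len(chunk))
            cur.execute(
                f"DELETE FROM big_table WHERE id IN ({placeholders})",
                chunk,
            )
    conn.commit()
```

The batch size is a tradeoff between round-trips and statement size; 1000, as in the post, is a reasonable starting point.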
Re: [GENERAL] Bulk processing deletion
On 13/10/2011 14:34, Alban Hertroys wrote:
>> Any other ideas?
>
>     CREATE TABLE to_delete (
>         job_created timestamp NOT NULL DEFAULT now(),
>         fk_id int NOT NULL
>     );
>
>     -- Mark for deletion
>     INSERT INTO to_delete (fk_id) SELECT id FROM table WHERE condition = true;
>
>     -- Process in app
>     SELECT table.* FROM table INNER JOIN to_delete ON (table.id = to_delete.fk_id);
>
>     -- Delete them
>     DELETE FROM table WHERE id IN (SELECT fk_id FROM to_delete);

Good point. I can even use a temp table for this and make use of UNLOGGED temp tables when we upgrade to 9.1!
Re: [GENERAL] Bulk processing deletion
On 13 October 2011 20:08, Steve Crawford scrawf...@pinpointresearch.com wrote:
> On 10/13/2011 05:20 AM, Ivan Voras wrote:
>> Hello, I have a table with a large number of records (millions), on
>> which the following should be performed: 1. Retrieve a set of records
>> by a SELECT query with a WHERE condition 2. Process these in the
>> application 3. Delete them from the table
> Without knowing a bit more, it is difficult to say. A couple questions:
> 1. Are there conflicting processes - i.e. could multiple applications
> be in contention to process the same set of records?

No, only one bulk processor.

> 2. Is the processing all or none or could individual records fail? If
> so, how do you deal with reprocessing or returning those to the main
> table.

All or none; the nature of the thing is that there can be no fatal failures.

> Depending on the nature of your app, it might be feasible to reorder
> the actions to move the records to be processed into a processing table
> and delete them from that table as the records are processed by the
> application. You could move the records into the processing table with:
>
>     with foo as (delete from main_table
>                  where your_where_clause
>                  returning a, b, c)
>     insert into processing_table (a, b, c)
>     select a, b, c from foo;
>
> In this case I would not recommend temporary or unlogged tables for the
> processing table as that becomes the only source of the data once
> deleted from the master table.

Ok, thanks (to everyone)!
Re: [GENERAL] PostgreSQL benchmarked on XFS vs ZFS vs btrfs vs ext4
On 14/09/2011 09:30, Toby Corkindale wrote:
> On 14/09/11 12:56, Andy Colson wrote:
>> On 09/13/2011 08:15 PM, Toby Corkindale wrote:
>>> Hi, Some months ago, I ran some (probably naive) benchmarks looking
>>> at how pgbench performed on an identical system with differing
>>> filesystems. (on Linux). [snip]
>> Did you test unplugging the power cable in the middle of a test to see
>> which would come back up?
> Heh, no, but it'd be interesting to see.. I wonder if turning a virtual
> machine off has the same effect?

No, never the exact same effect. There are two things to consider: if/when/how the OS flushes the data to the hardware and if/when/how the hardware flushes the data to physical storage. You can simulate only the failure of the first part with a virtual machine, but not the second (unless you bring down the VM host...).
[GENERAL] Re: Seg Fault in backend after beginning to use xpath (PG 9.0, FreeBSD 8.1)
On 03/05/2011 07:12, alan bryan wrote:
> Our developers started to use some xpath features and upon deployment
> we now have an issue where PostgreSQL is seg faulting periodically. Any
> ideas on what to look at next would be much appreciated.
>
> FreeBSD 8.1
> PostgreSQL 9.0.3 (also tried upgrading to 9.0.4) built from ports
> Libxml2 2.7.6 (also tried upgrading to 2.7.8) built from ports
>
> pgsql logs show:
> May 1 17:51:13 192.168.20.100 postgres[11862]: [94-1] LOG: server process (PID 62112) was terminated by signal 11: Segmentation fault
>
> syslog shows:
> May 2 20:29:16 db3 kernel: pid 49956 (postgres), uid 70: exited on signal 11 (core dumped)
> May 2 21:06:37 db3 kernel: pid 39086 (postgres), uid 70: exited on signal 10 (core dumped)
>
> Checking out postgres.core and we see:
> (gdb) bt
> #0 0x0008f5f19afd in pthread_mutex_lock () from /lib/libthr.so.3
> #1 0x000800d22965 in xmlRMutexLock () from /usr/local/lib/libxml2.so.5

This is unusual. There isn't any need to use pthreads here. As far as I can see, the normal build of libxml2 doesn't import it explicitly:

    ldd /usr/local/lib/libxml2.so
    /usr/local/lib/libxml2.so:
        libz.so.5 => /lib/libz.so.5 (0x800889000)
        libiconv.so.3 => /usr/local/lib/libiconv.so.3 (0x800e5)
        libm.so.5 => /lib/libm.so.5 (0x80104b000)
        libc.so.7 => /lib/libc.so.7 (0x800647000)

Judging by the mix of SIGBUS and SIGSEGV, I'd say it is likely this is causing you problems. To make sure, you may want to rebuild libxml2 with WITHOUT_THREADS defined. You may also need to rebuild postgresql afterwards.
Re: [GENERAL] multi-tenant vs. multi-cluster
On 18/03/2011 19:17, Ben Chobot wrote:
> if we're talking an extra 50MB of memory per cluster, that will start
> to add up.

Consider this: each such cluster will have:

a) its own database files on the drives (WAL, data - increasing IO)
b) its own postgresql processes (many of them) running in memory
c) its own shared_buffers in memory.

It is highly unlikely that you will manage anything decent with this type of configuration with a non-trivial number of clusters.
[GENERAL] Copying databases with extensions - pg_dump question
A fairly frequent operation I do is copying a database between servers, for which I use pg_dump. Since the database contains some extensions - most notably hstore and tsearch2, which need superuser privileges to install, I have a sort of a chicken-and-egg problem: the owner of the database (and all its objects) should be a non-superuser account so I can't simply use the output from pg_dump and expect everything to be correct after restoring it.

So far, I've used this workaround: install all superuser-requiring extensions in template1 on the destination server and then restore from pg_dump, ignoring the occasional duplicate object errors. This would work out of the box but pg_dump's create database commands (outputted with -C) include the WITH TEMPLATE=template0 clause, so I made a small utility which modifies these dumps to change the one byte so the template becomes template1. (-C is useful so I can do "psql template1 pgsql < my_dump.sql" and get it all done.)

Anyway, this works well enough but I wonder now if there is a better solution for this? As a feature request, I'd like a template database argument to use with -C so I don't have to modify the dumps, but there could be a better solution which side-steps this. Is there a canonical way to deal with this problem?
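The "change one byte" step described above can be done with a one-line sed filter over the dump. A sketch; the dump contents below are a stand-in for real pg_dump -C output, and the file names are illustrative:

```shell
# Rewrite the CREATE DATABASE line that pg_dump -C emits so the restore
# clones template1 (with the extensions pre-installed) instead of template0.
echo "CREATE DATABASE mydb WITH TEMPLATE = template0 ENCODING = 'UTF8';" > my_dump.sql

sed 's/TEMPLATE = template0/TEMPLATE = template1/' my_dump.sql > my_dump_t1.sql
cat my_dump_t1.sql
```

Note this assumes the exact spacing pg_dump uses for the clause; a real dump should be inspected first.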
Re: [GENERAL] Copying databases with extensions - pg_dump question
On 21/01/2011 14:39, Bill Moran wrote:
> In response to Ivan Voras ivo...@freebsd.org:
>> A fairly frequent operation I do is copying a database between servers,
>> for which I use pg_dump. Since the database contains some extensions -
>> most notably hstore and tsearch2, which need superuser privileges to
>> install, I have a sort of a chicken-and-egg problem: the owner of the
>> database (and all its objects) should be a non-superuser account so I
>> can't simply use the output from pg_dump and expect everything to be
>> correct after restoring it.
> Why not? If the ownership on the original database is non-superuser,
> then that will be faithfully preserved when the database is restored.
> What are you doing to cause it to behave differently?

I have reviewed my operations and it looks like these are the important differences:

* The database copy might be from a development machine to production so I use pg_dump -O to remove any accidentally entered unwanted user ownership data
* The database restore on the target machine is done as a nonprivileged user (the target owner of the database)

Are there better ways to do this?
Re: [GENERAL] Copying databases with extensions - pg_dump question
On 21/01/2011 15:55, Bill Moran wrote:
> On the no side, doing this kind of thing is always complex. We have a
> slew of other, very specialized scripts that do things like convert a
> production database to a development database by sanitizing sensitive
> data, or automatically deploy new databases or upgrades to either our
> production, development, or lab environments. Build tools like phing,
> make, ant, etc make this kind of thing easier to create, but it's still
> a lot of work because each situation is special.

Yes, I've also made a set of migration and sanitizing scripts so that aspect is covered. I was only wondering if there is some way I'm missing that would solve this one step more elegantly.

> So, my overall answer is that you're probably on the right track, you
> probably don't realize how much work is involved to get this really
> working well, and I would change just a few things about your approach
> based on the information you've provided so far. The requirement to use
> an unprivileged user to restore is going to make the tools you use to
> prepare the input files more important, but that's not insurmountable
> if you keep control over them (i.e. use schema files and a tool for
> normalizing them and do the data dump separately)

Thanks, I'll consider that approach.
Re: [GENERAL] Optimal settings for embedded system running PostgreSQL
On 13/01/2011 17:31, Christian Walter wrote:
> From: pgsql-general-ow...@postgresql.org on behalf of Ivan Voras
>> - Average read = 15,6Mb/s
>> - 4Kbyte reads = 3,5Mb/s
>> - 1Kbyte read = 1Mb/s
>> This is very slow. Have you considered something more light-weight
>> like SQLite?
> This is comparable to a standard USB2.0 flash drive. Reads are faster
> on very large block sizes but I thought postgres by default uses
> 8Kbyte. At 8KByte we have about 5000kbyte/s. For example if you compare
> a standard flash disk, e.g.
> http://www.innodisk.com/production.jsp?flashid=34 you can see that they
> specify 26mb/sec as a max. The same is specified here but I think it is
> only for very large block sizes. Using a different database is not an
> option for us since our software depends on PostgreSQL and on its JDBC
> drivers. We are working with the vendor to find a solution for this but
> of course this is only a long term option.

Of course, it depends on your constraints, but my suggestion was not only related to block IO speed but more to the nature of the application: there is a large difference between the number and the complexity of operations between postgresql and sqlite. SQLite might need much less IO bandwidth independently of block sizes. As others said, switching to a non-Windows OS will probably also increase performance, independently of database type or block sizes. My point is: you may need to scale everything optimally for your application, not only PostgreSQL. Your original post doesn't really mention if your database activity is write-mostly or read-mostly, so you can do additional tuning. For example, if you are write-heavy and can survive the last couple of transactions being lost, you could configure PostgreSQL with synchronous_commit=off without losing complete database integrity on power failure.
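The synchronous_commit suggestion is a one-line postgresql.conf change; a minimal fragment (comments are mine, not from the thread):

```ini
synchronous_commit = off   # commits return before the WAL flush; on power
                           # loss the last few transactions may be lost,
                           # but the database stays consistent
fsync = on                 # keep on: turning this off risks corruption
```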
Re: [GENERAL] Optimal settings for embedded system running PostgreSQL
On 13/01/2011 14:30, Christian Walter wrote:
> Dear Members, We are currently using PostgreSQL 7.3 on an Embedded
> System (Based on
> http://www.msc-ge.com/de/produkte/com/qseven/q7-us15w.html) running
> Windows XP Embedded / SP3. The onboard flash shows the following
> performance figures:
> - Average read = 15,6Mb/s
> - 4Kbyte reads = 3,5Mb/s
> - 1Kbyte read = 1Mb/s

This is very slow. Have you considered something more light-weight like SQLite?

> We are sometimes facing problems that the system seems to hang due to
> database activity and now we are looking to improve the situation. If
> the database application is running and we run the benchmark again,
> figures drop down to zero, so we assume that the database (and
> application) is responsible for the high load. Since the embedded flash
> is quite hard to change we are looking for a solution on improving
> this. For me it seems that blocksize is a big issue and I would like to
> know if a recompile of Postgres with a larger block size would resolve
> this?

You can, see the --with-blocksize and --with-wal-blocksize arguments to ./configure, but flash memory is notorious for having really large block sizes, on the order of 256 KiB - I suspect it would be very inefficient if the database tried to do everything in such large blocks.

> Are there any other options which should be set for flash based disks,
> since latency is lower than on harddisks?

First, you need to confirm or find where the real problem is. Windows is hard for debugging but you may get some information from the perfmon applet.
Re: [GENERAL] Bytea error in PostgreSQL 9.0
On 14/12/2010 14:51, tuanhoanganh wrote:
> Thanks for your help. Is there any .Net or VB tutorial for the new 9.0
> bytea?

You do not need to change your code if you add

    bytea_output = 'escape'   # hex, escape

into postgresql.conf.
Re: [GENERAL] Best practice to get performance
On 11/19/10 15:49, Andy Colson wrote:
> unlogged will only help insert/update performance. Lookup tables sound
> readonly for a majority of time. (I'm assuming lots of reads and every
> once and a while updates). I doubt that unlogged tables would speed up
> lookup tables.

Are FreeBSD's temp tables still storage-backed? (i.e. are there memory-backed temp tables)?
Re: [GENERAL] Best practice to get performance
On 11/19/10 23:14, Andy Colson wrote:
> On 11/19/2010 4:04 PM, Ivan Voras wrote:
>> On 11/19/10 15:49, Andy Colson wrote:
>>> unlogged will only help insert/update performance. Lookup tables
>>> sound readonly for a majority of time. (I'm assuming lots of reads
>>> and every once and a while updates). I doubt that unlogged tables
>>> would speed up lookup tables.
>> Are FreeBSD's temp tables still storage-backed? (i.e. are there
>> memory-backed temp tables)?
> Sorry, I'm not sure what you are asking. Not sure what this has to do
> with FreeBSD, or its temp tables. Unless: s/freebsd/postgres/i

Yes, sorry, switched between mailing lists too fast :)

> Ah, yes, Postgres temp tables still make it to disk eventually. They
> are not WAL logged (but if they get really big they spill to disk).
> There are no memory only tables, no. But PG will cache the heck out of
> all tables, so you get the best of both worlds.

Thanks!
[GENERAL] Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
On 11/17/10 01:23, Scott Ribe wrote:
> On Nov 16, 2010, at 3:46 PM, Josh Berkus wrote:
>> ...and will be truncated (emptied) on database restart.
> I think that's key. Anything that won't survive a database restart, I
> sure don't expect to survive backup restore.

FWIW, I agree with this reasoning. Iff the automatic truncate on clean restart could be turned off, I'd also expect the data to be in the backup - in that case only.
[GENERAL] Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
On 11/17/10 02:55, Josh Berkus wrote:
>> If you do wish to have the data tossed out for no good reason every so
>> often, then there ought to be a separate attribute to control that.
>> I'm really having trouble seeing how such behavior would be desirable
>> enough to ever have the server do it for you, on its terms rather than
>> yours.
> I don't quite follow you. The purpose of unlogged tables is for data
> which is disposable in the event of downtime; the classic example is a
> user_session_status table. In the event of a restart, all user sessions
> are going to be invalid anyway.

Depends on what you mean by session. Typical web application session data, e.g. for PHP applications which are deployed in *huge* numbers, resides directly on file systems and is not guarded by anything (not even fsyncs). On operating system crash (and I do mean when the whole machine and the OS go down), the most that can happen is that some of those session files get garbled or go missing - all the others work perfectly fine when the server is brought back up again and the users can continue to work within their sessions. *That* is useful session behaviour and it is also useful for logs. The definition of unlogged tables which are deliberately being emptied for no good reason does not seem very useful to me. I'd rather support an (optional) mode (if it can be implemented) in which PostgreSQL scans through these unlogged tables on startup and discards any pages whose checksums don't match, but accepts all others as good enough. Even better: maybe not all pages need to be scanned, only the last few, if there is a chance for any kind of mechanism which can act as checkpoints for data validity.
[GENERAL] Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
On 11/17/10 17:43, A.M. wrote:
> On Nov 17, 2010, at 11:32 AM, Ivan Voras wrote:
>> [snip - same text as the previous message]
> This is not really a fair feature comparison. With the file-based
> sessions, the webserver will continue to deal with potentially
> corrupted sessions, which is worse than dealing with no sessions.
I guess it depends on specific use case, but in the common case (i.e. non-mission critical massive deployments) I'd say it's definitely *not* worse than no sessions. Dealing with potential corruption in this case usually means the web application will attempt to deserialize the session data and fail if it's corrupted, leading to a new session being created. I also plan on using this feature for materialized views to replace memcached. Just how large a performance gain is expected from this thing? :) I don't see a mention that fsync will be disabled on unlogged tables (though it makes sense so it probably will be). Having materialized views this way will mean that something - either an application or an external script triggered by database startup - will have to calculate and create this materialized view, which will probably involve massive table scanning all around - I suspect that performance gains from unlogged tables could be hidden by this scanning. Anyway, I'm not arguing against them, I'm arguing for making them more powerful. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
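The recovery behaviour described above (try to deserialize the session, and on corruption fall back to a fresh one) can be sketched as follows; this is a hypothetical Python analogue of PHP's file-based session handling, with made-up paths and field names:

```python
import json
import os
import uuid

SESSION_DIR = "/tmp/sessions"  # hypothetical session file location

def load_session(session_id):
    """Return the stored session dict, or a fresh session if the
    file is missing or corrupted (e.g. after an OS crash)."""
    path = os.path.join(SESSION_DIR, session_id)
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, ValueError):
        # Garbled or missing session file: quietly start a new
        # session instead of failing the request.
        return {"id": str(uuid.uuid4()), "fresh": True}
```

This is exactly the "not worse than no sessions" behaviour argued for: a corrupted file costs one user one login, not an error.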
Re: [GENERAL] postgresql scalability issue
On 11/08/10 16:33, umut orhan wrote: Hi all, I've collected some interesting results during my experiments which I couldn't figure out the reason behind them and need your assistance. I'm running PostgreSQL 9.0 on a quad-core machine having two level on-chip cache hierarchy. Based on this information, you are most likely running on Intel Xeon 5000-5400 series CPU, right? It probably doesn't matter much since you apparently have only a single socket populated but is a bit old architecture known for its FSB bottleneck to the memory. You should write some details about your hardware: at least CPU model/speed and memory speed, and software (which OS? 32-bit or 64-bit?) PostgreSQL has a large and warmed-up buffer cache thus, no disk I/O is observed during experiments (i.e. for each query buffer cache hit rate is 100%). I'm pinning each query/process to an individual core. Queries are simple read-only queries (only selects). Nested loop (without materialize) is used for the join operator. When I pin a single query to an individual core, its execution time is observed as 111 seconds. This result is my base case. Then, I fire two instances of the same query concurrently and pin them to two different cores separately. However, each execution time becomes 132 seconds in this case. In a similar trend, execution times are increasing for three instances (164 seconds) and four instances (201 seconds) cases too. What I was expecting is a linear improvement in throughput (at least). I tried several different queries and got the same trend at each time. Are you measuring wall-clock execution time for queries in parallel? I.e. start measuring when the first query is started (asynchronously?) and stop when the last one is finished? Did you try the same measurement without pinning? I wonder why execution times of individual queries are increasing when I increase the number of their instances. 
Btw, I don't think on-chip cache hit/miss rates make a difference since L2 cache misses are decreasing as expected. I'm not an expert in PostgreSQL internals. Maybe there is a lock-contention (spinlocks?) occurring even if the queries are read-only. Anyways, all ideas are welcome. As others said, memory bandwidth is the most likely suspect here. CPUs are unfortunately so much faster than memory and memory buses that they frequently have to wait. Unless PostgreSQL uses the exclusive lock model instead of shared-exclusive, there shouldn't be much contention for the shared buffers. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
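The measurement question raised above, wall-clock time from when the first query starts to when the last one finishes, can be sketched with a memory-bound stand-in for the query (the workload here is hypothetical, not PostgreSQL itself):

```python
import time
from concurrent.futures import ProcessPoolExecutor

def run_query(_):
    # Stand-in for a read-only, memory-bound query: building and
    # summing a large list stresses memory bandwidth, not disk.
    return sum(list(range(1_000_000)))

def wall_clock(n_parallel):
    """Wall-clock time from starting n_parallel identical 'queries'
    until the last one finishes."""
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=n_parallel) as ex:
        list(ex.map(run_query, range(n_parallel)))
    return time.perf_counter() - start
```

With linear scaling, wall_clock(4) would stay close to wall_clock(1); growth like the poster's 111 s to 201 s points at a shared resource such as the memory bus rather than lack of CPU cores.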
Re: [GENERAL] Table update problem works on MySQL but not Postgres
On 09/01/10 16:13, Igor Neyman wrote: -Original Message- From: Raymond C. Rodgers [mailto:sinful...@gmail.com] Sent: Tuesday, August 31, 2010 7:56 PM To: pgsql-general@postgresql.org Subject: Table update problem works on MySQL but not Postgres update mydemo set cat_order = cat_order + 1 where client_id = 1 and cat_order >= 0 in order to insert categories at the top of the sorted list, for example. As you can probably guess, this query doesn't work very well. On both MySQL and PostgreSQL I get a constraint violation. That makes sense; I screwed up. What you need for your update to work is deferred unique constraints. I think this feature appeared in 9.0. Yes: http://www.postgresql.org/docs/9.0/static/sql-set-constraints.html Currently, only UNIQUE, PRIMARY KEY, REFERENCES (foreign key), and EXCLUDE constraints are affected by this setting. NOT NULL and CHECK constraints are always checked immediately when a row is inserted or modified (not at the end of the statement). Uniqueness and exclusion constraints that have not been declared DEFERRABLE are also checked immediately. In 8.4 it says: Currently, only foreign key constraints are affected by this setting. Check and unique constraints are always effectively not deferrable. Triggers that are declared as constraint triggers are also affected.
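A minimal sketch of the deferred-constraint approach on 9.0, with a hypothetical constraint name for the poster's table:

```sql
-- Declare the unique constraint as deferrable:
ALTER TABLE mydemo
    ADD CONSTRAINT mydemo_order_uniq
    UNIQUE (client_id, cat_order) DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
-- Defer the check until COMMIT, for this transaction only:
SET CONSTRAINTS mydemo_order_uniq DEFERRED;
UPDATE mydemo SET cat_order = cat_order + 1
    WHERE client_id = 1 AND cat_order >= 0;
COMMIT;  -- uniqueness is verified here, after all rows have shifted
```

Without DEFERRABLE, each individual row update is checked immediately and collides with a not-yet-updated neighbour, which is exactly the constraint violation described.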
Re: [GENERAL] Massively Parallel transactioning?
On 08/19/10 20:19, Benjamin Smith wrote: On Wednesday, August 18, 2010 04:58:08 pm Joshua D. Drake wrote: Well if you are just using it for updates to the schema etc... you should only need to launch a single connection to each database to make those changes. And that's exactly the problem. On each server, we have at least dozens of active databases - one for each client. Opening a connection for each database starts to become problematic. Hmm, let's talk about this problem. If I understand you correctly, you have an arrangement where you have one server which will push updates, and other, database servers, with dozens of databases per server? Let's try to convert this to numbers and assume that "dozens of databases per server" means 50 and that you have 50 more servers. This means that the server which pushes the updates needs to connect to 2500 databases. This is way too small a number of connections (sockets) from a single client to create problems on the client side. On the other hand, if you have, say, 50 databases per server, this means that the update client connects 50 times to the same server, which is only a problem if the server has a small total number of connections configured (max_connections) - smaller than 2x the number of databases. Unless you are very careful not to actually exceed this number of connections during normal database use (and depending on what you do this may or may not be possible), it would actually be more beneficial to raise max_connections to a sensible value - e.g. 500 (which would probably need a bump in SEMMNI).
[GENERAL] Re: What to do if space for database decrease and no additional hard drive is possible?
On 26.7.2010 12:43, AlannY wrote: Hi there. I have a huge database with several tables. Some tables have statistics information. And it's very huge. I don't want to lose any of this data. But hard drives on my single server are not eternal. Very soon, there will be no space left. And the most awful thing is that it's a 1U server, and I can't install a new hard drive. What can I do to enlarge space, without losing data and performance? Absolutely nothing quick and easy. In fact, about the only thing you can do which won't cause a (long term) data loss and performance degradation is a full backup, installing bigger drives to replace the old ones, and a full restore. Some other ideas which might help you if you don't want to swap drives, but generally require a lot of work and you *will* lose either data or performance: * use a file system which supports compression (NTFS on Windows, ZFS on FreeBSD or Solaris; I don't know of any on Linux) * move unneeded data out from the database and into a separate, compressed data storage format (e.g. move statistical data into gzipped CSV or text files or something to that effect) * buy external storage (a NAS, or even an external USB drive) and move the database to it * use an external data storage service like Amazon S3 (actually, this is a bad idea since you will need to completely rewrite your database and application) * decide that you really don't need some of the data and just delete it.
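The "move old statistics into compressed text files" option above can be quite simple; a minimal sketch with hypothetical data (in practice you would feed this from `COPY ... TO STDOUT CSV` output rather than Python tuples):

```python
import csv
import gzip

def archive_rows(rows, path):
    """Write an iterable of row tuples to a gzipped CSV file."""
    with gzip.open(path, "wt", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(rows)

def restore_rows(path):
    """Read archived rows back as lists of strings."""
    with gzip.open(path, "rt", newline="") as f:
        return list(csv.reader(f))
```

Statistics tables full of repetitive numeric text typically compress very well this way, at the cost of no longer being queryable in place.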
Re: [GENERAL] Full Text Search dictionary issues
On 07/16/10 02:23, Tom Lane wrote: Howard Rogers h...@diznix.com writes: I have 10 million rows in a table, with full text index created on one of the columns. I submit this query: ims=# select count(*) from search_rm ims-# where to_tsvector('english', textsearch) ims-# @@ to_tsquery('english', 'woman beach ball'); count --- 646 (1 row) Time: 107.570 ms ...and those are excellent times. But if I alter the query to read: ims=# select count(*) from search_rm where to_tsvector('english', textsearch) @@ to_tsquery('english', 'woman beach ftx1'); count --- 38343 (1 row) Time: 640.985 ms ...then, as you see, it slows the query down by a factor of about 6, ... um, but it increased the number of matching rows by a factor of almost 60. I think your complaint of poor scaling is misplaced. This is basically the same question I asked a few days ago and I think the reason for this (mis)expectation of performance comes from expecting tsearch2 to behave like external specialized indexers. In such products, the search result can be returned simply from the index, which can scale fairly well, but PostgreSQL actually has to lookup all the records returned and this is where most time is spent. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Disk performance
On 06/15/10 14:59, Janning wrote: Hi all, as we encountered some limitations of our cheap disk setup, I really would like to see how cheap they are compared to expensive disk setups. We have a 12 GB RAM machine with an Intel i7-975, using 3 Seagate Barracuda 7200.11 disks, ST31500341AS (1.5 TB). One disk for the system and WAL etc. and one SW RAID-0 with two disks for postgresql data. Now I ran a few tests as described in http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm # time sh -c "dd if=/dev/zero of=bigfile bs=8k count=3000000 && sync" 3000000+0 records in 3000000+0 records out 24576000000 bytes (25 GB) copied, 276.03 s, 89.0 MB/s real 4m48.658s user 0m0.580s sys 0m51.579s # time dd if=bigfile of=/dev/null bs=8k 3000000+0 records in 3000000+0 records out 24576000000 bytes (25 GB) copied, 222.841 s, 110 MB/s real 3m42.879s user 0m0.468s sys 0m18.721s The figures are ok if the tests were done on a single drive (i.e. not your RAID-0 array). IMHO it is looking quite fast compared to the values mentioned in the article. What values do you expect with a very expensive setup like many spindles, scsi, raid controller, battery cache etc. How much faster will it be? For a start, you are attempting to use RAID-0 with two disks here. This means you have twice as much risk that a drive failure will cause total data loss. In any kind of serious setup this would be the first thing to replace. Of course, you can't give me exact results, but I would just like to get an idea about how much faster an expensive disk setup could be. Would it be like 10% faster, 100% or 1000% faster? If you can give me any hints, I would greatly appreciate it. There is no magic here - scalability of drives can be approximated linearly: a) faster drives: 15,000 RPM drives will be almost exactly 15000/7200 times faster at random access b) more drives: depending on your RAID schema, each parallel drive or drive combination will grow your speed linearly.
For example, a 3-drive RAID-0 will be 3/2 times faster than a 2-drive RAID-0. Of course, you would not use RAID-0 anywhere serious. But an 8-drive RAID-10 array will be 8/4=2 times faster than a 4-drive RAID-10 array. Finally, it all depends on your expected load vs budget. If you are unsure of what you want and what you need, but don't expect serious write loads, make a 4-drive RAID-10 array of your cheap 7200 RPM drives, invest in more RAM and don't worry about it. Drive controllers are another issue and there is somewhat more magic here. If the above paragraph describes you well, you probably don't need a RAID controller. There are many different kinds of these with extremely different prices, and many different configuration options, so nowadays it isn't practical to think about those until you really need to.
Re: [GENERAL] Disk performance
On 15 June 2010 18:22, Janning m...@planwerk6.de wrote: The figures are ok if the tests were done on a single drive (i.e. not your RAID-0 array). Ahh, I meant raid-1, of course. Sorry for this. I tested my raid 1 too and it looks quite the same. Not much difference. This is expected: a RAID-1 array (mirroring) will have the performance of the slowest drive (or a single drive if they are equal). There is no magic here - scalability of drives can be approximated linearly: a) faster drives: 15,000 RPM drives will be almost exactly 15000/7200 times faster at random access ok. (or if you are looking at raw numbers: a 15,000 RPM drive will sustain 15000/60=250 random IOs per second (IOPS); but now you are entering magic territory - depending on the exact type of your load you can get much better results, but not significantly worse). b) more drives: depending on your RAID schema, each parallel drive or drive combination will grow your speed linearly. For example, a 3-drive RAID-0 will be 3/2 times faster than a 2-drive RAID-0. Of course, you would not use RAID-0 anywhere serious. But an 8-drive RAID-10 array will be 8/4=2 times faster than a 4-drive RAID-10 array. So RAID-10 with 4 disks is 2 times faster than a RAID-1, I got it. So as I need much more power I should look for a RAID-10 with 8 or more 15k RPM disks. Yes, if you expect serious write or random IO load. To illustrate: if you are trying to power a generic web site, for example a blog, you can expect that most of your load will be read-only (mostly pageviews) and except if you plan on having a really large site (many authors for example), that your database will largely fit into RAM, so you don't have to invest in disk drives as it will be served from cache. On the other hand, a financial application will do a lot of transactions and you will almost certainly need good storage infrastructure - this is where the 250 IOPS for a 15000 RPM drive estimates come into play. thanks very much for your help. 
It gave me a good idea of what to do. If you have further recommendations, I would be glad to hear them. I can point you to a dedicated mailing list: pgsql-performance @ postgresql.org for questions about performance such as yours.
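The rules of thumb from this thread condense into a couple of one-liners (same rough estimates as above: one random IO per platter revolution, and linear RAID-10 scaling):

```python
def random_iops(rpm):
    # One random IO per revolution: a 15,000 RPM drive sustains
    # roughly 15000/60 = 250 random IOs per second.
    return rpm / 60

def raid10_relative_speed(n_drives, baseline_drives):
    # Linear scaling: an 8-drive RAID-10 array is ~2x as fast
    # as a 4-drive one.
    return n_drives / baseline_drives
```

These are only first-order approximations; caching, command queueing and the access pattern can move real numbers considerably in either direction.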
[GENERAL] tsearch2 dictionaries - possible problem
hello, I think I have a problem with a tsearch2 configuration I'm trying to use. I have created a text search configuration as:

--
CREATE TEXT SEARCH DICTIONARY hr_ispell (
    TEMPLATE = ispell,
    DictFile = 'hr',
    AffFile = 'hr',
    StopWords = 'hr'
);
CREATE TEXT SEARCH CONFIGURATION public.ts2hr (COPY=pg_catalog.english);
ALTER TEXT SEARCH CONFIGURATION ts2hr
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
    WITH hr_ispell;
SET default_text_search_config = 'public.ts2hr';
--

and here are some queries:

--
cms=# select to_tsvector('voras vorasom');
 to_tsvector
-------------

(1 row)

cms=# SET default_text_search_config = 'simple';
SET
cms=# select to_tsvector('voras vorasom');
      to_tsvector
-----------------------
 'voras':1 'vorasom':2
(1 row)

cms=# SET default_text_search_config = 'ts2hr';
SET
cms=# select to_tsvector('voras vorasom');
 to_tsvector
-------------

(1 row)

cms=# select to_tsvector('kiša kiši');
 to_tsvector
-------------
 'kiša':1,2
(1 row)
--

The good news is that the text search configuration is actually used (the 'kiša kiši' example) but apparently, on an uncommon word, to_tsvector() returns nothing (the 'voras vorasom' example). Is there something wrong in the configuration? I would definitely not want unknown words to be ignored.
Re: [GENERAL] Full text search on a complex schema - a classic problem?
On 05/23/10 18:03, Andy Colson wrote: On 05/22/2010 09:40 PM, Ivan Voras wrote: Hello, I have a schema which tracks various pieces of information which would need to be globally searchable. One approach I came up with to make all of the data searchable is to create a view made of UNION ALL queries that would integrate different tables into a common structure which could be uniformly queried by using tsearch2 functions. This would work, up to the point where it would be practically unavoidable (for performance reasons) to create indexes on this view, which cannot be done. I would like to avoid using a hand-made materialized view (via triggers, etc.) because of administrative overhead and because it would duplicate data, of which there is potentially a lot. I think this looks like a fairly common problem with full text searches on a large-ish schemas, so I'm wondering what are the best practices here, specifically with using tsearch2? I have something like this, but with PostGIS layers. When a person clicks I search all the different layers (each a table) for information. I use a stored proc. Each table has its own index so each table is fast. It also lets me abstract out differences between the layers (I can search each a little differently). If each of your tables had its own full text fields and indexes, then write a stored proc to search them all individually, it should be pretty quick. This looks like an interesting solution. And it could be done generically in our case by having a separate table describing which tables need to be searched and by what fields. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
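The per-table stored procedure approach could look roughly like this; the table and column names are hypothetical, sketched for two searchable tables:

```sql
-- One function searching each table with its own index,
-- instead of one big (non-indexable) UNION ALL view:
CREATE FUNCTION global_search(q text)
RETURNS TABLE(source text, id integer, snippet text) AS $$
    SELECT 'articles', a.id, substr(a.body, 1, 80)
      FROM articles a
     WHERE to_tsvector('english', a.body) @@ to_tsquery('english', q)
    UNION ALL
    SELECT 'comments', c.id, substr(c.body, 1, 80)
      FROM comments c
     WHERE to_tsvector('english', c.body) @@ to_tsquery('english', q);
$$ LANGUAGE sql STABLE;

-- Each branch can then use its own expression index, e.g.:
CREATE INDEX articles_fts ON articles
    USING gin(to_tsvector('english', body));
```

Because each SELECT hits its own table directly, the planner can use each table's GIN index, which is exactly what the view-based approach could not offer.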
Re: [GENERAL] Full text search on a complex schema - a classic problem?
On 05/23/10 07:17, Craig Ringer wrote: On 23/05/10 10:40, Ivan Voras wrote: Hello, I have a schema which tracks various pieces of information which would need to be globally searchable. If systems that exist outside the database itself are acceptable, check out Apache Lucene, and tools that use it like Hibernate Search. We are currently using such an external system and while it works fast enough it has two problems: 1) the need to periodically refresh it (via cron) 2) it complicates deployment a bit by adding dependencies so we're moving away from it.
[GENERAL] Full text search on a complex schema - a classic problem?
Hello, I have a schema which tracks various pieces of information which would need to be globally searchable. One approach I came up with to make all of the data searchable is to create a view made of UNION ALL queries that would integrate different tables into a common structure which could be uniformly queried by using tsearch2 functions. This would work, up to the point where it would be practically unavoidable (for performance reasons) to create indexes on this view, which cannot be done. I would like to avoid using a hand-made materialized view (via triggers, etc.) because of administrative overhead and because it would duplicate data, of which there is potentially a lot. I think this looks like a fairly common problem with full text searches on a large-ish schemas, so I'm wondering what are the best practices here, specifically with using tsearch2? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Authentication method for web app
On 14 May 2010 09:08, Leonardo F m_li...@yahoo.it wrote: Personally I would lean toward making the bulk of security within the application so to simplify everything - the database would do what it does best - store and manipulate data - and the application would be the single point of entry. Protect the servers - keep the applications (like ssh, php, apache, your application) updated and make sure you have good and frequent backups. Thank you for your reply. Anyone else? How do you secure your db used by webapps? Basically what I've said: 1) find all points of entry to the db (i.e. the application) and secure them 2) keep the server itself secure (applications patched, firewall enabled with custom rules to protect the db if necessary, only trusted local users etc.) If the app and the db are on different servers, consider a direct (patch cable) link between them or, if the structure is more complex, a switch; remove them from the Internet, possibly make a DMZ, etc.
Re: [GENERAL] Authentication method for web app
On 05/13/10 09:21, Leonardo F wrote: Hi all, we're going to deploy a web app that manages users/roles for another application. We want the database to be safe from changes made by malicious users. I guess our options are: 1) have the db listen only on local connections; basically when the machine is accessed the db could be compromised. Hardening the server access is the only true security defense we have. 2) Use, as user/password, the same user/password used to enter the web app. Basically there would be a 1 to 1 matching between our app users (and password...) and the db users (with proper permissions...) I'm not a great expert on these things (as you've already guessed...). Can someone help me? As you already figured out, the key is protecting both the application and the db server from intrusions. If anyone gets unauthorized access to either of them, especially if it's root access, any and all security you build on top of that will in all likelihood collapse. Personally I would lean toward making the bulk of security within the application so to simplify everything - the database would do what it does best - store and manipulate data - and the application would be the single point of entry. Protect the servers - keep the applications (like ssh, php, apache, your application) updated and make sure you have good and frequent backups. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] hstore problem with UNION?
I've encountered the following problem: ivoras=# create table htest2(id integer, t hstore); CREATE TABLE ivoras=# create table htest3(id integer, t2 hstore); CREATE TABLE ivoras=# select id, t from htest2 union select id,t2 as t from htest3; ERROR: could not identify an ordering operator for type hstore HINT: Use an explicit ordering operator or modify the query. I think it has something to do with UNION being the type of operation that could, as a variation, include DISTINCT, which would require sorting, but still... UNION by itself doesn't. How to get around this? I really don't care how hstores get sorted and, what's more, would like to avoid sorting them at all as they could get big.
Re: [GENERAL] hstore problem with UNION?
On 05/10/10 14:10, Jayadevan M wrote: When we do a union, the database has to get rid of duplicates and get distinct values. To achieve this, probably it does a sort. Just guesswork You are right, it looks like I have inverted the logic of UNION and UNION ALL - I actually needed UNION ALL here, thanks!
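With the tables from the original message, the working form is simply:

```sql
-- UNION ALL keeps duplicates, so no equality/ordering operator
-- is required for the hstore column:
select id, t from htest2
union all
select id, t2 as t from htest3;
```

Plain UNION implies DISTINCT, which needs to compare (and therefore sort or hash) hstore values; UNION ALL skips that step entirely.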
Re: [GENERAL] File compression in WinXP
On 2.5.2010 16:48, pasman pasmański wrote: Hello. I installed PostgreSQL 8.4.3 on WinXP SP3. Because of a small disk I created a tablespace pointing to a compressed folder and moved some tables to it. Compression is good: 10 GB down to 3-4 GB, speed acceptable (small activity, 10 users). But is this safe? Microsoft guarantees your safety here; there should be no difference in behaviour. Of course, memory and other resource requirements are still dependent on the uncompressed size. Your performance will probably be less than without compression.
[GENERAL] Upcoming hot standby replication question
Hello, I'd like to ask about the asynchronous nature of the upcoming replication implementation in 9.0 - what guarantees does it give with regards to delays and latency? E.g. do COMMITs finish and return to the caller before or after the data is sent to the slave? (being asynchronous, they probably don't wait for the other side's confirmation, right?).
Re: [GENERAL] Upcoming hot standby replication question
On 9 April 2010 18:21, Greg Smith g...@2ndquadrant.com wrote: Ivan Voras wrote: I'd like to ask about the asynchronous nature of upcoming replication implementation in 9.0 - what guarantees does it give with regards to delays and latency? E.g. do COMMITs finish and return to the caller before or after the data is sent to the slave? (being asynchronous, they probably don't wait for the other side's confirmation, right?). Exactly--synchronous replication, the only way to enforce that data is on the slave before completing the COMMIT, was postponed from this release. It should make it into 9.1 as an option, but it will always be expensive to turn on. What is in 9.0 is eventual consistency. If your slave is keeping up with traffic being sent by the master, it should receive each incremental commit shortly after it's made. In practice, slaves should only lag some number of seconds behind the master. But there are zero guarantees that will be the case, or that latency will be bounded at all. Recommended practice is to carefully monitor how much lag there is on the standby and trigger alerts if it exceeds your expectations. Ok, but I imagine there should be a difference between COMMITs returning before or after the actual data is sent over the network (though admittedly socket buffering could make it hard to distinguish). In addition to that, how about a compromise: COMMITs returning when the remote side ACKs acceptance but before it passes data to storage? Just thinking out loud.
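A sketch of the lag monitoring recommended in this thread, using the 9.0 functions (run on the standby):

```sql
-- How far WAL receipt and replay have progressed on the standby;
-- comparing these against pg_current_xlog_location() on the master
-- gives the lag in terms of WAL position.
SELECT pg_last_xlog_receive_location(),
       pg_last_xlog_replay_location();
```

Polling this periodically and alerting when the gap grows beyond a threshold is the "carefully monitor" practice described above.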
[GENERAL] Re: What locking mechanism is used for database backup and restore and Master-Slave Replication?
On 01/21/10 16:09, John Mitchell wrote: So am I to presume that the current stable version of postgres (before 8.5) does require extra locking? There is currently (before 8.5) no official replication mechanism in PostgreSQL. There are some 3rd party implementations, for which information can be gathered here: http://www.postgresql.org/docs/current/interactive/high-availability.html http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling 2010/1/21 Grzegorz Jaśkiewicz gryz...@gmail.com On Thu, Jan 21, 2010 at 1:12 PM, John Mitchell mitchellj...@gmail.com wrote: Hi, In reading the documentation it states that the SQL dump backup does not block other operations on the database while it is working. yes, pg_dump opens a serializable transaction, thus guaranteeing the data to be an exact snapshot (as opposed to the default isolation level, which is called 'read committed' not without reason). I presume that while a restore is occurring that no reads or updates are allowed against the restored database. nope, what restoring does is just run all the commands in the pg_dump output (whether it is binary or textual). So as soon as the database is created, it is treated just as any connection, thus allowing you to connect and use it. What locking mechanism is used for Master-Slave Replication? master-slave replication, as introduced in what's to be 9.0 (aka 8.5), uses WAL shipping, so it doesn't require any extra locking. -- GJ -- John J. Mitchell
Re: [GENERAL] Backup strategies with significant bytea data
Leigh Dyer wrote: Hi, For years now I've simply backed up my databases by doing a nightly pg_dump, but since we added the ability for users to import binary files in to our application, which are stored in bytea fields, the dump sizes have gone through the roof — even with gzip compression, they're significantly larger than the on-disk size of the database. My guess is that this is due to the way that the binary data from the bytea fields is encoded in the dump file when it's produced. Have you tried another dump format? E.g. the -F c argument to pg_dump?
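For comparison, a sketch of the two routes (the database name is hypothetical; flags as in the 8.x pg_dump manual):

```shell
# Custom format: compresses each table's data individually and
# supports selective restore via pg_restore
pg_dump -F c -f mydb.dump mydb

# The plain-text route described above, for comparison
pg_dump mydb | gzip > mydb.sql.gz
```

The custom format also avoids the text escaping of bytea content in plain dumps, which is a large part of the size blow-up being described.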
Re: [GENERAL] Something like Oracle Forms, but Open Source to use with PostgreSQL?
John R Pierce wrote: Andre Lopes wrote: Hi, I need to know if there is something like Oracle Forms in the Open Source world that works with PostgreSQL. If you know of something, please let me know. perhaps OpenOffice Data could do what you need. I'm not real familiar with Oracle Forms, but I know OOo Data can connect to most any database including postgres, and it can be used to build 'forms' based database applications with queries and reports... it's somewhat akin to Microsoft Access in basic functionality. Only in the sense that stone tablets are akin to e-ink. OOo Base is buggy and unusable.
Re: [GENERAL] SELECTing every Nth record for better performance
A. Kretschmer wrote: In response to Tom : I have a big table that is used for datalogging. I'm designing a graphing interface that will visualise the data. When the user is looking at a small daterange I want the database to be queried for all records, but when the user is 'zoomed out', looking at an overview, I want to run a query that skips every nth record and returns a manageable dataset that still gives a correct overview of the data without slowing the programme down. Is there an easy way to do this that I have overlooked? I looked at: Do you have 8.4? If yes: test=# create table data as select s as s from generate_Series(1,1000) s; SELECT test=*# select s from (select *, row_number() over (order by s) from data) foo where row_number % 3 = 0 limit 10; Won't this still read in the entire table and only then filter the records out?
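Yes, the window function numbers every row before the filter runs. For graphing overviews, one commonly suggested alternative (sketched here on the example table; the bucket arithmetic is hypothetical) is to aggregate fixed-size buckets instead of skipping rows:

```sql
-- Downsample 1000 rows into ~100 points by averaging buckets of 10:
SELECT s / 10 AS bucket, avg(s) AS value
  FROM data
 GROUP BY s / 10
 ORDER BY bucket;
```

This still scans the selected date range, but it avoids the window-function sort and, unlike skipping every nth record, each plotted point summarizes all the data in its bucket.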
[GENERAL] PREPARE query with IN?
Is it possible to prepare a query with the IN clause in a meaningful way? I could probably do it with a hard-coded number of arguments, like

    SELECT x FROM t WHERE y IN ($1, $2, $3)

but that kind of misses the point of using IN for my needs. In any case, it would probably be a good idea to add a sentence about it on http://www.postgresql.org/docs/8.3/interactive/sql-prepare.html since I'm probably not the first one to ask about it :)
Re: [GENERAL] PREPARE query with IN?
Filip Rembiałkowski wrote:
> fi...@filip=# prepare sth(int[]) as select * from ids where id = ANY($1);
> PREPARE
> fi...@filip=# execute sth('{1,2,3}');
>  id |   t
> ----+-------
>   1 | eenie
>   2 | menie
>   3 | moe
> (3 rows)

Thanks!

-- Filip Rembiałkowski http://filip.rembialkowski.net/
Re: [GENERAL] Are there plans to add data compression feature to postgresql?
Peter Eisentraut wrote:
> Craig Ringer wrote:
>> So - it's potentially even worth compressing the wire protocol for use
>> on a 100 megabit LAN if a lightweight scheme like LZO can be used.
> LZO is under the GPL though.

But liblzf is BSD-style. http://www.goof.com/pcg/marc/liblzf.html
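The CPU-versus-bandwidth tradeoff behind the LAN argument is easy to demonstrate. liblzf itself isn't in any standard library, so this sketch uses zlib at its fastest setting as a stand-in for a lightweight scheme, and its heaviest setting for comparison; the payload is an invented stand-in for repetitive wire-protocol traffic:

```python
import time
import zlib

# Invented payload standing in for chatty, repetitive protocol traffic.
payload = b"INSERT INTO t (id, name) VALUES (42, 'example row');\n" * 10000

for level, label in [(1, "light, LZF-like cost"), (9, "heavy")]:
    t0 = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed_ms = (time.perf_counter() - t0) * 1000
    assert zlib.decompress(compressed) == payload  # lossless round-trip
    print(f"level {level} ({label}): {len(payload)} -> {len(compressed)} bytes "
          f"in {elapsed_ms:.2f} ms")
```

Even the fast setting shrinks repetitive traffic dramatically, which is why a lightweight compressor can pay off even on a fast LAN.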
Re: [GENERAL] PostgreSQL vs FreeBSD 7.0 as regular user
Greg Smith wrote:
> On Sat, 26 Jul 2008, Zoltan Boszormenyi wrote:
>> These three settings were also set with sysctl -w ... to take effect
>> immediately. Rebooting FreeBSD solved it. Huh? Is it really like W#@$#?
> Looks like the PostgreSQL documentation here
> (http://www.postgresql.org/docs/current/static/kernel-resources.html)
> is now outdated. From http://www.manpages.info/freebsd/sysctl.8.html:
> "The -w option has been deprecated and is silently ignored."

It's not ignored as in "the whole command will have no effect" -- only the switch itself is ignored. So, "sysctl -w a.b=c" is 100% equivalent to "sysctl a.b=c".

> Looks like the correct thing to do here now is to edit the
> /etc/sysctl.conf file, then issue "/etc/rc.d/sysctl reload" to use those
> values without needing to reboot. (I don't actually have such a machine
> to check for sure.)

Yes, that will work. sysctl.conf will not reload itself :)
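Put together, the reboot-free workflow looks something like this (kern.ipc.shmmax and its value are only an example of a PostgreSQL-relevant setting; use whatever your shared memory configuration requires):

```shell
# Takes effect immediately; -w is accepted but silently ignored:
sysctl kern.ipc.shmmax=1073741824

# Persist the value across reboots:
echo 'kern.ipc.shmmax=1073741824' >> /etc/sysctl.conf

# Re-apply everything in /etc/sysctl.conf without rebooting:
/etc/rc.d/sysctl reload
```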
[GENERAL] Simple row serialization?
Hi, I'd like to implement some simple data logging via triggers on a small number of infrequently updated tables, and I'm wondering if there are some helpful functions, plugins or idioms that would serialize a row (received for example in an AFTER INSERT trigger) into a string that I'd store in the log table. There's a limited number of field types involved: varchars, integers and booleans. I'm not looking for anything fancy; a comma-separated string result would be just fine. Even better, something like a dictionary (field_name:field_value, ...) would be nice. The reason for trying to do it this way is that I want to have a single log table covering many tables (again, they are infrequently updated). I need this for PostgreSQL 8.1. I got suggestions to try composite types but I don't think they could be useful for this. What I need is possibly a generic row type ("any" and "record" generate syntax errors in CREATE TABLE) - any ideas on where to start looking?
Re: [GENERAL] Simple row serialization?
Adam Rich wrote:
>> I'd like to implement some simple data logging via triggers on a small
>> number of infrequently updated tables and I'm wondering if there are
>> some helpful functions, plugins or idioms that would serialize a row
> If you're familiar with perl, you can try PL/Perl.

Thanks, but another solution has been suggested to me, much simpler:

create or replace function data_log() returns trigger as $$
declare
    sdata text;
begin
    sdata := new;
    insert into log(data) values (sdata);
    return NULL;
end;
$$ language plpgsql;

create trigger data_insert after insert on data
    for each row execute procedure data_log();

(from an idea by Tom Lane)
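For completeness, a sketch of the supporting schema the trigger above assumes; everything beyond the "log" table and its "data" column is illustrative, not from the thread:

```sql
-- Assumed log table; the trigger coerces the whole NEW row to text.
CREATE TABLE log (
    logged_at  timestamp NOT NULL DEFAULT now(),
    data       text NOT NULL
);

-- Example monitored table:
CREATE TABLE data (
    id    integer,
    name  varchar,
    flag  boolean
);

-- After: INSERT INTO data VALUES (1, 'hello', true);
-- log.data holds the row's text form, e.g. (1,hello,t)
```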
[GENERAL] Recovering / undoing transactions?
Hi, About a month or so ago I read a blog entry or an article which seems to have described a method, using dirty hackery with pg_resetxlog and possibly other tools, to forcibly roll the database back to a previous state. The problem described was that some employee had executed a DELETE or UPDATE without a WHERE clause (or something like it) in autocommit mode, and the goal was to undo it. I can't find the article now, so can someone describe the technique here or point to the article? (I'm possibly misremembering important details about the article, so the correct answer to my question could be no, it can't be done.)

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Recovering / undoing transactions?
On 07/11/2007, Tom Lane [EMAIL PROTECTED] wrote:
> It's not really possible to do that. The blogger might've thought he'd
> accomplished something but I seriously doubt that his database was
> consistent afterward. You can go back in time using PITR, if you had the
> foresight and resources to set up continuous archiving, but just
> whacking pg_xlog around is far from sufficient.

Ok, just to verify I'm thinking about it in the right way: in the abstract, with PITR, I would need a known-good starting point (e.g. a full base backup) plus the WAL files archived since that starting point; then I'd restore the base backup and replay the archived WAL up to the point I want?
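For reference, the continuous archiving Tom refers to boils down to a couple of configuration lines; the paths and the target time below are illustrative, and exact option names vary a little between PostgreSQL versions:

```
# postgresql.conf -- ship each completed WAL segment to an archive:
archive_command = 'cp %p /mnt/wal_archive/%f'

# recovery.conf -- placed in the data directory restored from the base backup:
restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2007-11-07 12:00:00'   # stop just before the bad statement
```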