Re: [GENERAL] Roles inherited from a role which is the owner of a database can drop it?
Hello,

On 30 October 2017 at 22:10, David G. Johnston <david.g.johns...@gmail.com> wrote:
> On Mon, Oct 30, 2017 at 12:25 PM, Ivan Voras <ivo...@gmail.com> wrote:
>>
>> 3. But they do log in with "developer" roles which are inherited from the
>> owner role.
>>
>> [...]
>>
>> I've tried it on a dummy database and it apparently works as described
>> here. Is this by design?
>
> Not quite following but ownership is an inheritable permission;

Basically, I'm asking if "ownership" can be revoked from the set of inherited permissions? If there is a role G which is granted to role A, and G is the owner of a database, can A be made unable to do what only owners can (specifically in this case, drop databases)?

> and even if it was not SET ROLE is all that would be required. Any owner
> can drop an object that it owns.

It's kind of the reverse: I'm wondering if ownership can be made un-inheritable.

>> What are the best practices for this sort of scenario where there is a
>> single owner of all the schema (which is large), where developers need
>> access to everything but cannot do something as drastic as dropping the dbs
>> (and possibly tables)?
>
> Don't let developers into production databases...
>
> Trusted people (and/or software) should be provided membership into
> ownership groups. Developers should provide these people/programs with
> vetted scripts to execute against production. Developers can do whatever
> they want on their local database instance with full schema-modifying
> privileges.
>
> "developers need access to everything" - there is a lot of nuance and
> detail behind that fragment that is needed if one is going to develop a
> data access and change management policy.

Just considering the case of dropping databases for now, i.e. let the developers do everything except that. It's a start.
[GENERAL] Roles inherited from a role which is the owner of a database can drop it?
Hello,

I just want to verify that what I'm observing is true, and if it is, I'd like to know how to avoid it:

1. There are databases owned by a certain role which is a superuser
2. Nobody logs in with the superuser role unless necessary
3. But they do log in with "developer" roles which inherit from the owner role. These developer roles are not superusers themselves, but have the CREATEDB flag
4. The developer roles can still drop the databases.

I've tried it on a dummy database and it apparently works as described here. Is this by design? If it is, is there a way to prevent the developer roles from dropping the databases?

What are the best practices for this sort of scenario where there is a single owner of all the schema (which is large), where developers need access to everything but cannot do something as drastic as dropping the dbs (and possibly tables)?
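For reference, a partial mitigation that came up while experimenting (a sketch with hypothetical role names; note it only removes the *implicit* inheritance, not the ability to escalate):

```sql
-- Sketch: with NOINHERIT, the developer role no longer exercises the
-- owner's privileges automatically, so a plain DROP DATABASE fails...
ALTER ROLE developer NOINHERIT;

-- ...but membership in the owner role still permits explicit escalation:
SET ROLE db_owner;   -- after this, DROP DATABASE would succeed again
```

So NOINHERIT raises the bar against accidental drops, but a developer who deliberately issues SET ROLE can still act as the owner.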
[GENERAL] Writing on replicas?
Hello, I have a possibly unusual case, I've asked about it before on this list. There is a bunch of reporting being done regularly on some large databases, which interfere with daily operations performance-wise. So one option is to have hot async replication in place to a secondary server where the reports could be run and not impact the main server. The reporting code is *almost* read-only, with the major exception being that it creates temp tables for intermediate results. The main tables are not written to, just the temp tables. Some time ago when I've asked what to do in this situation, the answer was that maybe a future version could allow this setup to work. So, I'm asking again: was something changed in 10.0 which would allow it?
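To illustrate the problem: this is roughly what the reporting code runs into on a hot standby today (a session sketch; the exact error wording may vary by version):

```sql
-- On a streaming-replication standby (hot_standby = on):
CREATE TEMP TABLE report_scratch (id integer, total numeric);
-- ERROR:  cannot execute CREATE TABLE in a read-only transaction
```

Temp tables still require catalog inserts and XID assignment, which is why the standby rejects them even though no shared table is touched.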
[GENERAL] DROP INDEX CASCADE doesn't want to drop unique constraints?
Hello,

On trying to drop an index named "employer_employerid_key" which supports a unique constraint:

    "employer_employerid_key" UNIQUE CONSTRAINT, btree (employerid)

I get this error:

    ERROR:  cannot drop index employer_employerid_key because constraint employer_employerid_key on table employer requires it
    HINT:  You can drop constraint employer_employerid_key on table employer instead.

I'm using the CASCADE and IF EXISTS arguments and the docs say nothing about any special cases (https://www.postgresql.org/docs/9.3/static/sql-dropindex.html). This is with PostgreSQL 9.3.15. The actual command is:

    drop index if exists employer_employerid_key cascade;

Any ideas if this is normal or why it happens?
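Following the HINT, the working equivalent seems to be dropping the constraint rather than the index; the supporting index goes away with it (a sketch using the names from the error message):

```sql
-- Dropping the constraint removes its supporting index as a side effect:
ALTER TABLE employer DROP CONSTRAINT IF EXISTS employer_employerid_key;
```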
Re: [GENERAL] Making a unique constraint deferrable?
On 28 February 2017 at 18:03, David G. Johnston <david.g.johns...@gmail.com> wrote:
> On Tue, Feb 28, 2017 at 9:50 AM, Ivan Voras <ivo...@gmail.com> wrote:
>>
>> ivoras=# alter table foo alter constraint "foo_a_b_key" deferrable;
>> ERROR:  constraint "foo_a_b_key" of relation "foo" is not a foreign key
>> constraint
>
> A more clear error message would be:
>
> ERROR: cannot alter non-foreign key constraint "foo_a_b_key" of relation
> "foo"
>
> Though I'm not sure how that meshes with the error message style guides...

Any idea what underlying technical reason prohibits marking non-FK constraints as deferrable?
[GENERAL] Making a unique constraint deferrable?
Hello,

If I'm interpreting the manual correctly, this should work:

    ivoras=# create table foo(a integer, b integer, unique(a,b));
    CREATE TABLE
    ivoras=# \d foo
          Table "public.foo"
     Column |  Type   | Modifiers
    --------+---------+-----------
     a      | integer |
     b      | integer |
    Indexes:
        "foo_a_b_key" UNIQUE CONSTRAINT, btree (a, b)
    ivoras=# insert into foo(a,b) values(1,2);
    INSERT 0 1
    ivoras=# insert into foo(a,b) values(1,2);
    ERROR:  duplicate key value violates unique constraint "foo_a_b_key"
    DETAIL:  Key (a, b)=(1, 2) already exists.
    ivoras=# alter table foo alter constraint "foo_a_b_key" deferrable;
    ERROR:  constraint "foo_a_b_key" of relation "foo" is not a foreign key constraint

The manual says this for SET CONSTRAINTS:

    Currently, only UNIQUE, PRIMARY KEY, REFERENCES (foreign key), and
    EXCLUDE constraints are affected by this setting. NOT NULL and CHECK
    constraints are always checked immediately when a row is inserted or
    modified (not at the end of the statement). Uniqueness and exclusion
    constraints that have not been declared DEFERRABLE are also checked
    immediately.

I'm puzzled by the "...is not a foreign key constraint" error message. Doesn't "deferrable" also work on unique constraints?
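For what it's worth, the workaround I've seen is that unique constraints can be made deferrable, but only at creation time; ALTER TABLE ... ALTER CONSTRAINT handles foreign keys only. A sketch of recreating the constraint as deferrable:

```sql
-- Drop and re-add the unique constraint, this time declared DEFERRABLE:
ALTER TABLE foo
    DROP CONSTRAINT foo_a_b_key,
    ADD CONSTRAINT foo_a_b_key UNIQUE (a, b)
        DEFERRABLE INITIALLY IMMEDIATE;
```

After that, SET CONSTRAINTS foo_a_b_key DEFERRED should work within a transaction. (Note the drop/re-add rebuilds the underlying index.)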
[GENERAL] Foreign key references a unique index instead of a primary key
Hello,

I've inherited a situation where:

- a table has both a primary key and a unique index on the same field.
- at some time, a foreign key was added which references this table (actually, I'm not sure about the sequence of events), which has ended up referencing the unique index instead of the primary key.

Now, when I've tried dropping the unique index, I get an error that the foreign key references this index (with a hint that I use DROP...CASCADE). This drop index is a part of an automated plpgsql script which deletes duplicate indexes, so I'm interested in two things:

1. How to detect if a foreign key depends on an index I'm about to drop, so I can skip it
2. Is there a way to get around this situation, maybe modify the pg_constraint table or other tables to reference the index / primary key I want?

This is on PostgreSQL 9.3.
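For point 1, something along these lines might work (a sketch; it assumes the dependency is recorded in pg_depend as a pg_constraint entry referencing the index's pg_class entry, and the index name is hypothetical):

```sql
-- List constraints that depend on a given index, so the drop script
-- can skip it when this returns any rows:
SELECT c.conname, c.conrelid::regclass AS on_table, c.contype
FROM pg_depend d
JOIN pg_constraint c ON c.oid = d.objid
WHERE d.classid = 'pg_constraint'::regclass
  AND d.refclassid = 'pg_class'::regclass
  AND d.refobjid = 'the_duplicate_index'::regclass;
```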
[GENERAL] Enhancement proposal for psql: add a column to "\di+" to show index type
Hi,

As it says in the subject: if any developer is interested in doing so, I think it would be useful to see the index type in "\di+" output. The new column could be named "using" to reflect the SQL statement. It would contain entries such as "btree", "brin", "gin", etc.
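Until something like this lands in psql, the same information can be pulled from the catalogs directly (a sketch; joins the index's pg_class row to its access method):

```sql
-- Show each index together with its access method ("using" clause):
SELECT c.relname AS index_name, a.amname AS "using"
FROM pg_class c
JOIN pg_am a ON a.oid = c.relam
WHERE c.relkind = 'i'
ORDER BY c.relname;
```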
[GENERAL] Streaming replication and slave-local temp tables
Hello, Is it possible (or will it be possible) to issue CREATE TEMP TABLE statements on the read-only slave nodes in master-slave streaming replication in recent version of PostgreSQL (9.4+)?
Re: [GENERAL] Catalog bloat (again)
As a follow-up, here's a portion of the nightly vacuum's logs; I just want to confirm whether my conclusions are right:

    INFO:  vacuuming "pg_catalog.pg_attribute"
    INFO:  scanned index "pg_attribute_relid_attnam_index" to remove 3014172 row versions
    DETAIL:  CPU 0.20s/1.08u sec elapsed 3.72 sec.
    INFO:  scanned index "pg_attribute_relid_attnum_index" to remove 3014172 row versions
    DETAIL:  CPU 0.14s/0.89u sec elapsed 1.70 sec.
    INFO:  "pg_attribute": removed 3014172 row versions in 52768 pages
    DETAIL:  CPU 0.31s/0.30u sec elapsed 1.15 sec.
    INFO:  index "pg_attribute_relid_attnam_index" now contains 19578 row versions in 45817 pages
    DETAIL:  3013689 index row versions were removed.
    45668 index pages have been deleted, 34116 are currently reusable.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO:  index "pg_attribute_relid_attnum_index" now contains 19578 row versions in 32554 pages
    DETAIL:  3010630 index row versions were removed.
    32462 index pages have been deleted, 24239 are currently reusable.
    CPU 0.00s/0.00u sec elapsed 0.00 sec.
    INFO:  "pg_attribute": found 2278389 removable, 17319 nonremovable row versions in 52856 out of 57409 pages
    DETAIL:  298 dead row versions cannot be removed yet.
    There were 641330 unused item pointers.
    0 pages are entirely empty.
    CPU 1.44s/2.88u sec elapsed 10.55 sec.
    INFO:  "pg_attribute": stopping truncate due to conflicting lock request
    INFO:  analyzing "pg_catalog.pg_attribute"
    INFO:  "pg_attribute": scanned 3 of 57409 pages, containing 10301 live rows and 0 dead rows; 10301 rows in sample, 24472 estimated total rows

So:

- About 3 million rows churned in the table and its two indexes (i.e. dead rows which vacuum found and marked) - that's per day, since this is a nightly operation.
- After the vacuum, the first index is left with 19578 rows in 45817 pages. That's a lot of empty pages, which should be reused the next day, together with free space in partially filled tables, right?
- Since pages are 8 KiB, 46,000 pages is about 360 MiB of space - that's how large just one of the pg_attribute table's indexes is after the vacuum. Altogether, it's more than 1 GB.
- The "stopping truncate..." message is not really significant; it would only have shortened the data files from the end if there were empty pages at the end, which in this case there isn't a significant number of. The truncation can probably never happen on system tables like these since they are always in use...?

The real question is: why is the total size (i.e. the number of pages) growing at an alarming rate? On one of the db's, we're seeing almost a doubling in size each week. Is the internal fragmentation of the data files so significant?

Ok, a couple more questions:

1. How come "0 pages are entirely empty" if there are 17319 rows spread around in 52856 pages?
2. What are "unused item pointers"?

(I agree with your previous suggestions, will see if they can be implemented.)

On 28 January 2016 at 00:13, Bill Moran <wmo...@potentialtech.com> wrote:
> On Wed, 27 Jan 2016 23:54:37 +0100
> Ivan Voras <ivo...@gmail.com> wrote:
>
> > I've done my Googling, and it looks like this is a fairly common problem.
> > In my case, there's a collection of hundreds of databases (10 GB+) with
> > apps which are pretty much designed (a long time ago) with heavy use of
> > temp tables - so a non-trivial system.
> >
> > The databases are vacuumed (not-full) daily, from cron
>
> Vacuuming once a day is seldom often enough, except on very quiet
> databases.
>
> > (autovacuum was
> > turned off some time ago for performance reasons), and still their size
> > increases unexpectedly. By using some of the queries floating around on the
> > wiki and stackoverflow[*], I've discovered that the bloat is not, as was
> > assumed, in the user tables, but in the system tables, mostly in
> > pg_attribute and pg_class.
>
> The size increase isn't really unexpected. If you're only vacuuming once
> per day, it's very easy for activity to cause active tables to bloat quite
> a bit.
>
> > This is becoming a serious problem, as I've seen instances of these tables
> > grow to 6 GB+ (on a 15 GB total database), while still effectively
> > containing on the order of 10,000 records or so. This is quite abnormal.
> >
> > For blocking reasons, we'd like to avoid vacuum fulls on these tables (as
> > it seems like touching them will lock up everything else).
>
> It will. But to get them back down to a reasonable size, you're going to
> have to do a VACUUM FULL at least _once_. If you retune things correctly,
> you shouldn't need any more FULLs after that 1 time.
>
> > So
[GENERAL] Catalog bloat (again)
Hi,

I've done my Googling, and it looks like this is a fairly common problem. In my case, there's a collection of hundreds of databases (10 GB+) with apps which are pretty much designed (a long time ago) with heavy use of temp tables - so a non-trivial system.

The databases are vacuumed (not-full) daily, from cron (autovacuum was turned off some time ago for performance reasons), and still their size increases unexpectedly. By using some of the queries floating around on the wiki and stackoverflow[*], I've discovered that the bloat is not, as was assumed, in the user tables, but in the system tables, mostly in pg_attribute and pg_class.

This is becoming a serious problem, as I've seen instances of these tables grow to 6 GB+ (on a 15 GB total database), while still effectively containing on the order of 10,000 records or so. This is quite abnormal. For blocking reasons, we'd like to avoid vacuum fulls on these tables (as it seems like touching them will lock up everything else).

So, question #1: WTF? How could this happen on a regularly vacuumed system? Shouldn't the space be reused, at least after a VACUUM? The issue here is not the absolute existence of the bloat space, it's that it's constantly growing for *system* tables.

Question #2: What can be done about it?

This is PostgreSQL 9.3, migrating soon to 9.4.

[*] https://wiki.postgresql.org/wiki/Show_database_bloat , http://stackoverflow.com/questions/13931989/postgresql-automating-vacuum-full-for-bloated-tables
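In case it helps anyone reproducing this, a quick way to watch the catalog tables' growth over time (a sketch; run per database):

```sql
-- Report size and estimated row count for the bloating catalog tables:
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
       c.reltuples::bigint AS estimated_rows
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'pg_catalog'
  AND c.relname IN ('pg_attribute', 'pg_class')
ORDER BY pg_total_relation_size(c.oid) DESC;
```

A multi-GB total_size next to a reltuples count in the tens of thousands is the disparity being described above.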
Re: [GENERAL] Catalog bloat (again)
On 28 January 2016 at 00:13, Bill Moran <wmo...@potentialtech.com> wrote:
> On Wed, 27 Jan 2016 23:54:37 +0100
> Ivan Voras <ivo...@gmail.com> wrote:
>
> > So, question #1: WTF? How could this happen, on a regularly vacuumed
> > system? Shouldn't the space be reused, at least after a VACUUM? The issue
> > here is not the absolute existence of the bloat space, it's that it's
> > constantly growing for *system* tables.
>
> With a lot of activity, once a day probably isn't regular enough.

I sort of see what you are saying. I'm curious, though, what goes wrong with the following list of expectations:

1. Day-to-day load is approximately the same
2. So, at the end of the first day there will be some amount of bloat
3. Vacuum will mark that space re-usable
4. Within the next day, this space will actually be re-used
5. ... so the bloat won't grow.

Basically, I'm wondering why it is growing after vacuums, not why it exists in the first place.
Re: [GENERAL] CLOB BLOB limitations in PostgreSQL
On 11/04/2014 16:45, Jack.O'sulli...@tessella.com wrote: With point two, does this mean that any table with a bytea datatype is limited to 4 billion rows (which would seem in conflict with the unlimited rows shown by http://www.postgresql.org/about)? If we had rows where the bytea was a null entry would they contribute towards this total or is it 4 billion non-null entries? This seems strange. A core developer should confirm this but it doesn't make much sense - bytea fields are stored the same as text fields (including varchar etc), i.e. the varlena internal representation, so having the limit you are talking about would mean that any non-trivial table with long-ish text fields would be limited to 2^32 entries... signature.asc Description: OpenPGP digital signature
Re: [GENERAL] encrypting data stored in PostgreSQL
On 09/04/2014 22:40, CS_DBA wrote:
> Hi All;
> We have a client with this requirement:
> At rest, data must be encrypted with a unique client key
> Any thoughts on how to pull this off for PostgreSQL stored data?

Some time ago I did this, mostly as an experiment, but IIRC it works decently: https://bitbucket.org/ivoras/pgenctypes
Re: [GENERAL] how is text-equality handled in postgresql?
On 15/01/2014 10:10, Gábor Farkas wrote:
> hi,
> when i create a unique constraint on a varchar field, how exactly does
> postgresql compare the texts? i'm asking because in UNICODE there are a
> lot of complexities about this.. or in other words, when are two
> varchars equal in postgres? when their bytes are? or some algorithm is
> applied?

By default, it is whatever the operating system thinks is right. PostgreSQL doesn't have its own collation code; it uses the OS's locale support for this. (Which breaks on certain systems which don't have complete UTF-8 support - I'm in favour of importing ICU at least as an optional dependency, similar to what the FreeBSD patch does: http://people.freebsd.org/~girgen/postgresql-icu/.)
Re: [GENERAL] how is text-equality handled in postgresql?
On 15/01/2014 12:36, Amit Langote wrote:
> On Wed, Jan 15, 2014 at 7:39 PM, Ivan Voras <ivo...@freebsd.org> wrote:
>> On 15/01/2014 10:10, Gábor Farkas wrote:
>>> hi, when i create a unique constraint on a varchar field, how exactly
>>> does postgresql compare the texts? i'm asking because in UNICODE there
>>> are a lot of complexities about this.. or in other words, when are two
>>> varchars equal in postgres? when their bytes are? or some algorithm is
>>> applied?
>>
>> By default, it is whatever the operating system thinks is right.
>> PostgreSQL doesn't have its own collation code, it uses the OS's locale
>> support for this.
>
> Just to add to this, whenever strcoll() (a locale-aware comparator) says
> two strings are equal, postgres re-compares them using strcmp(). See the
> following code snippet from src/backend/utils/adt/varlena.c:varstr_cmp():
>
>     /*
>      * In some locales strcoll() can claim that nonidentical strings are
>      * equal. Believing that would be bad news for a number of reasons,
>      * so we follow Perl's lead and sort equal strings according to
>      * strcmp().
>      */
>     if (result == 0)
>         result = strcmp(a1p, a2p);

That seems odd and inefficient. Why would it be necessary? I would think indexing (and other collation-sensitive operations) don't care what the actual collation result is for arbitrary blobs of strings, as long as they are stable?
Re: [GENERAL] how is text-equality handled in postgresql?
On 15/01/2014 13:29, Amit Langote wrote:
> On Wed, Jan 15, 2014 at 9:02 PM, Ivan Voras <ivo...@freebsd.org> wrote:
>> That seems odd and inefficient. Why would it be necessary? I would think
>> indexing (and other collation-sensitive operations) don't care what the
>> actual collation result is for arbitrary blobs of strings, as long as
>> they are stable?
>
> This is the behavior since quite some time, introduced by this commit:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=656beff59033ccc5261a615802e1a85da68e8fad

Ok, the commit comment is:

    Adjust string comparison so that only bitwise-equal strings are
    considered equal: if strcoll claims two strings are equal, check it
    with strcmp, and sort according to strcmp if not identical. This fixes
    inconsistent behavior under glibc's hu_HU locale, and probably under
    some other locales as well. Also, take advantage of the
    now-well-defined behavior to speed up texteq, textne, bpchareq,
    bpcharne: they may as well just do a bitwise comparison and not bother
    with strcoll at all.

... so it's just another workaround for OS-specific locale issues - to me it looks like just another reason to use ICU.
Re: [GENERAL] Consistent file-level backup of pg data directory
On 08/01/2014 16:09, gator...@yahoo.de wrote:
> For machines running database systems, this means that I need some way
> to get a consistent state of some point in time. It does not
> particularly matter which time exactly (in
> [...]
> Unfortunately, it does not look like there is any direct way to
> accomplish this with postgres except shutting down the whole database
> system while the backup is running. The systems will

You could use a file system which supports snapshots (I don't know which Linux FSs do; I've read that LVM can be used to simulate those) to get an exact point-in-time backup, which will rely on the database's usual resilience to be stable enough for restoring.
[GENERAL] Recheck conditions on indexes
Hi,

I'm just wondering: in an execution plan such as this one, is the Recheck Cond phase what it apparently looks like - an additional check on the data returned by indexes - and why is it necessary? I would have thought that indexes are accurate enough?

    cms=> explain analyze select * from users where other_ids->'OIB'='70328909364' or code='0023017009';
                                                                 QUERY PLAN
    ------------------------------------------------------------------------------------------------------------------------------------
     Bitmap Heap Scan on users  (cost=8.52..39.21 rows=10 width=330) (actual time=0.042..0.044 rows=2 loops=1)
       Recheck Cond: (((other_ids -> 'OIB'::text) = '70328909364'::text) OR ((code)::text = '0023017009'::text))
       ->  BitmapOr  (cost=8.52..8.52 rows=10 width=0) (actual time=0.035..0.035 rows=0 loops=1)
             ->  Bitmap Index Scan on users_other_ids_oib  (cost=0.00..4.26 rows=9 width=0) (actual time=0.023..0.023 rows=1 loops=1)
                   Index Cond: ((other_ids -> 'OIB'::text) = '70328909364'::text)
             ->  Bitmap Index Scan on users_code  (cost=0.00..4.26 rows=1 width=0) (actual time=0.012..0.012 rows=1 loops=1)
                   Index Cond: ((code)::text = '0023017009'::text)
     Total runtime: 0.082 ms
    (8 rows)

Both indexes are plain btrees; the first one is on an expression on the hstore field (other_ids->'OIB') and the second one is on a plain text field.

Also, why is it using the Bitmap Index Scan in both cases? A plain query for code='foo' uses a plain index scan.

This is PostgreSQL 9.1.
Re: [GENERAL] Recheck conditions on indexes
On 25/10/2013 11:06, Albe Laurenz wrote:
> Just because there is an entry in the index does not imply that the
> corresponding table entry is visible for this transaction. To ascertain
> that, the table row itself has to be checked.

Understood.

> PostgreSQL 9.2 introduced index only scan which avoids that additional
> step if it is safe to do so.

It doesn't help in this case - the plan for the same query on a copy of the database on 9.3 is exactly the same.
Re: [GENERAL] Major upgrade of PostgreSQL and MySQL
On 12/09/2013 18:16, Karl Denninger wrote:
> On 9/12/2013 11:11 AM, Patrick Dung wrote:
>> While reading some manuals of PostgreSQL and MySQL (eg.
>> http://dev.mysql.com/doc/refman/5.1/en/upgrading-from-previous-series.html),
>> I have found that MySQL has stated many incompatibilities and known
>> issues (a long list) in MySQL versions 5.0, 5.1, 5.5, 5.6 and 5.7. For
>> PostgreSQL, it seems I can't find the list (it just says see Appendix E
>> / the release notes). I think it is a plus for PostgreSQL if it has few
>> incompatibilities between major versions. By the way, for in-place major
>> version upgrades (not dumping the DB and importing again), MySQL is
>> doing a better job here. Please share your thoughts, thanks.
>
> pg_upgrade will do an in-place upgrade if you wish. It is somewhat risky
> if not done using a COPY (it can either copy or not, as you wish) but
> it's considerably faster than a dump/restore and is in-place. I use it
> regularly.

If I read the documentation correctly (http://www.postgresql.org/docs/9.3/static/pgupgrade.html), it needs oldbindir and newbindir arguments pointing to the directories of PostgreSQL executables for the old and new versions, making it basically unusable for upgrading systems which are maintained with packages instead of individually compiling and installing custom versions of PostgreSQL, right? (Except possibly Debian, which may allow multiple pg versions to be installed; I haven't tried it.)
Re: [GENERAL] Major upgrade of PostgreSQL and MySQL
On 13 September 2013 21:44, Patrick Dung <patrick_...@yahoo.com.hk> wrote:
> Ivan Voras has replied that the link method works fine in Windows on
> another thread.

That would be very surprising, since I don't run Windows servers :)
[GENERAL] Trigram (pg_trgm) GIN index not used
Hello,

I have a table with the following structure:

    nn=> \d documents
                                    Table "public.documents"
        Column     |   Type   |                       Modifiers
    ---------------+----------+--------------------------------------------------------
     id            | integer  | not null default nextval('documents_id_seq'::regclass)
     ctime         | integer  | not null default unix_ts(now())
     dtime         | integer  | not null
     title         | citext   | not null
     html_filename | text     | not null
     raw_data      | citext   | not null
     fts_data      | tsvector | not null
     tags          | text[]   |
     flags         | integer  | not null default 0
     dtype         | integer  | not null default 0
    Indexes:
        "documents_pkey" PRIMARY KEY, btree (id)
        "documents_html_filename" UNIQUE, btree (html_filename)
        "documents_raw_data_trgm" gin (raw_data gin_trgm_ops)
        "documents_title_trgm" gin (title gin_trgm_ops)

I'd like to use pg_trgm for matching substrings case-insensitively, but it doesn't seem to use the index:

    nn=> explain select id,title from documents where raw_data ilike '%zagreb%';
                               QUERY PLAN
    ---------------------------------------------------------------
     Seq Scan on documents  (cost=0.00..6648.73 rows=180 width=98)
       Filter: (raw_data ~~* '%zagreb%'::citext)
    (2 rows)

    nn=> explain select id,title from documents where raw_data like '%zagreb%';
                               QUERY PLAN
    ---------------------------------------------------------------
     Seq Scan on documents  (cost=0.00..6692.71 rows=181 width=98)
       Filter: (raw_data ~~ '%zagreb%'::citext)
    (2 rows)

When I try to create a GiST index as advised by the comment at http://www.postgresonline.com/journal/archives/212-PostgreSQL-9.1-Trigrams-teaching-LIKE-and-ILIKE-new-tricks.html, I get the following error:

    ERROR:  index row requires 10488 bytes, maximum size is 8191

What am I doing wrong?
Re: [GENERAL] Trigram (pg_trgm) GIN index not used
On 21/02/2013 12:52, Ivan Voras wrote:
> I'd like to use pg_trgm for matching substrings case-insensitively, but
> it doesn't seem to use the index:

As a sort-of follow-up, the '%' operator kind of works, but takes an incredibly long time, and the selectivity estimates are completely wrong:

    nn=> vacuum analyze documents;
    VACUUM
    nn=> explain select id,title from documents where raw_data % 'zagreb';
                                       QUERY PLAN
    -------------------------------------------------------------------------------------
     Bitmap Heap Scan on documents  (cost=128.42..330.87 rows=54 width=108)
       Recheck Cond: ((raw_data)::text % 'zagreb'::text)
       ->  Bitmap Index Scan on documents_raw_data_trgm  (cost=0.00..128.40 rows=54 width=0)
             Index Cond: ((raw_data)::text % 'zagreb'::text)
    (4 rows)

    nn=> explain analyze select id,title from documents where raw_data % 'zagreb';
                                       QUERY PLAN
    -------------------------------------------------------------------------------------
     Bitmap Heap Scan on documents  (cost=128.42..330.87 rows=54 width=108) (actual time=98750.283..98750.283 rows=0 loops=1)
       Recheck Cond: ((raw_data)::text % 'zagreb'::text)
       ->  Bitmap Index Scan on documents_raw_data_trgm  (cost=0.00..128.40 rows=54 width=0) (actual time=26.748..26.748 rows=51874 loops=1)
             Index Cond: ((raw_data)::text % 'zagreb'::text)
     Total runtime: 98750.623 ms
    (5 rows)

There is no IO load during this query.
[GENERAL] PostgreSQL and a clustered file system
Hello,

Is anyone running PostgreSQL on a clustered file system on Linux? By "clustered" I actually mean shared, such that the same storage is mounted by different servers at the same time (of course, only one instance of PostgreSQL on only one server can be running on such a setup, and there are a lot of other precautions that need to be satisfied).
Re: [GENERAL] Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?
On 5 October 2012 04:53, Moshe Jacobson <mo...@neadwerx.com> wrote:
> On Thu, Oct 4, 2012 at 6:12 AM, Ivan Voras <ivo...@freebsd.org> wrote:
>> On 01/10/2012 15:36, Moshe Jacobson wrote:
>>> I am working on an audit logging trigger that gets called for every row
>>> inserted, updated or deleted on any table. For this, I need to store a
>>> couple of temporary session variables such as the ID of the user
>>> performing the change, which can be set at the start of the session.
>>
>> Do you know about session variables? The major benefit here is that it
>> doesn't touch the table engines, temporary or not.
>
> This sounds incredibly useful. Why have I not heard of this until today??
> In your example you still had to use a BEGIN...EXCEPTION block. Is that
> faster than a create temp table?

I think I can make a fairly educated guess that catching exceptions while dealing with session variables should be much, much faster than creating any kind of a table :)

Besides, from what you said, you will ensure on the app level that the session variable is set sometime close to when you open a connection to Pg, so the "catch" part of the exception block will probably not run at all.
Re: [GENERAL] Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?
On 5 October 2012 15:55, Merlin Moncure <mmonc...@gmail.com> wrote:
> On Fri, Oct 5, 2012 at 3:09 AM, Ivan Voras <ivo...@freebsd.org> wrote:
>> I think I can make a fairly educated guess that catching exceptions
>> while dealing with session variables should be much, much faster than
>> creating any kind of a table :)
>
> On the other hand, if the temp table can be completely ditched for a
> session variable or two, then yeah, that would be much better since
> you'd avoid the overhead of creating the table completely.

Yes, this is what I was aiming at, based on the OP mentioning he only has a limited amount of data to manage in this way.
[GENERAL] Re: What's faster? BEGIN ... EXCEPTION or CREATE TEMP TABLE IF NOT EXISTS?
On 01/10/2012 15:36, Moshe Jacobson wrote:
> I am working on an audit logging trigger that gets called for every row
> inserted, updated or deleted on any table. For this, I need to store a
> couple of temporary session variables such as the ID of the user
> performing the change, which can be set at the start of the session.
> Until now I have been using a permanent table to store the session
> variables, but it has been difficult to wipe the data properly at the
> end of the session.

Do you know about session variables? I did something similar to what you are describing and it ended up much simpler than using tables, temporary or not. You need to configure them in postgresql.conf, e.g.:

    custom_variable_classes = 'myapp'

Then in the application code:

    SET myapp.uid = 42;

And in the pl/pgsql function:

    CREATE OR REPLACE FUNCTION dblog() RETURNS TRIGGER AS $$
    DECLARE
        uid INTEGER;
    BEGIN
        BEGIN
            SELECT current_setting('myapp.uid') INTO uid;
        EXCEPTION
            WHEN undefined_object THEN uid = null;
            WHEN data_exception THEN uid = null;
        END;
        ...
    END;
    $$ LANGUAGE plpgsql;

The major benefit here is that it doesn't touch the table engines, temporary or not.
[GENERAL] pg_dump, send/recv
Hello,

Actually I have sort of two questions rolled into one: when creating custom data types, there's the option to implement *_send() and *_recv() functions in addition to *_in() and *_out(). Does pg_dump use them for binary dumps, and, if not, what uses them? Are they only an optional optimization for storing binary data in the database?
[GENERAL] Custom integer-like type
Hello, I'm creating a new data type in C, and everything is proceeding well, except that the data type should be parsed on input like an integer. Maybe it's best if I explain it with an example. Currently, I can do this:

    INSERT INTO t(my_data_type) VALUES ('1')

but I cannot do this:

    INSERT INTO t(my_data_type) VALUES (1)

My type is defined as:

    CREATE TYPE myint (
        internallength = variable,
        input = encbigint_in,
        output = encbigint_out,
        alignment = int4,
        storage = external
    );

(The variable length is correct in this case.)
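The usual way to make a bare integer literal acceptable is an implicit cast from integer to the new type. A sketch, assuming a conversion function exists (the name int4_to_myint and the library name are hypothetical):

```sql
-- Hypothetical C-level conversion function: takes an int4, returns myint.
CREATE FUNCTION int4_to_myint(integer) RETURNS myint
    AS 'myext', 'int4_to_myint' LANGUAGE C IMMUTABLE STRICT;

-- AS IMPLICIT lets the parser apply the cast automatically, so
-- INSERT INTO t(my_data_type) VALUES (1) starts working.
CREATE CAST (integer AS myint)
    WITH FUNCTION int4_to_myint(integer)
    AS IMPLICIT;
```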
Re: [GENERAL] Versioned, chunked documents
On 02/04/2012 01:52, Martin Gregorie wrote:
> BTW, why use document_chunks when a text field can hold megabytes,
> especially if they will be concatenated to form a complete document
> which is then edited as a whole item before being split into chunks and
> saved back to the database?

... because none of your assumptions are true :) They will not hold megabytes, they will not often be concatenated, the document is never edited in whole, and I do *not* need version numbers on the whole document :)
[GENERAL] Versioned, chunked documents
Hi, I have documents which are divided into chunks, so that the (ordered) concatenation of chunks make the whole document. Each of the chunks may be edited separately and past versions of the chunks need to be kept. The structure looks fairly simple:

    CREATE TABLE documents (
        id SERIAL PRIMARY KEY,
        title TEXT NOT NULL,
        ...
    );

    CREATE TABLE documents_chunks (
        id SERIAL PRIMARY KEY,
        ctime TIMESTAMP NOT NULL,
        documents_id INTEGER REFERENCES documents(id),
        seq INTEGER NOT NULL, -- sequence within the document
        content TEXT,
        ...
    );

The first goal is to retrieve the latest version of the whole document, made from the latest versions of all chunks, but later the goal will also be to fetch the whole version at some point in time (i.e. with chunks created before a point in time). I did the first goal by creating two helper views:

    CREATE VIEW documents_chunks_last_version_chunk_ids AS
        SELECT documents_id, max(id), seq
        FROM documents_chunks
        GROUP BY documents_id, seq;

    CREATE VIEW documents_chunks_last_version_content AS
        SELECT documents_chunks.documents_id, content
        FROM documents_chunks
        JOIN documents_chunks_last_version_chunk_ids
          ON documents_chunks.id = documents_chunks_last_version_chunk_ids.max
        ORDER BY documents_chunks_last_version_chunk_ids.seq;

There are indexes on the document_chunks fields seq and documents_id. Everything looked fine until I examined the output of EXPLAIN ANALYZE and saw this:

    db= set enable_seqscan to off;
    SET
    db= explain analyze select * from documents_chunks_last_version_content where documents_id=1;

EXPLAIN output given in: http://explain.depesz.com/s/mpY

The query output seems correct on this test case:

    db= select * from documents_chunks_last_version_content where documents_id=1;
     documents_id | content
    --------------+---------
                1 | C1, v2
                1 | C2, v3
                1 | C3, v1
    (3 rows)

This huge cost of 100 which appeared out of nowhere in the EXPLAIN output and the seq scan worry me - where did that come from?
There are absolutely no unindexed fields in the query, and the result set of the aggregate (max) is very small. Of course, I might be doing the whole structure wrong - any ideas?
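One structural alternative worth noting (not from the thread itself, just a common PostgreSQL idiom): DISTINCT ON can pick the newest row per chunk directly, without the GROUP BY view and the self-join, which gives the planner a simpler shape to work with:

```sql
-- Latest version of each chunk of document 1: for every seq value,
-- the row with the highest id (the newest version) wins.
SELECT DISTINCT ON (seq) seq, content
FROM documents_chunks
WHERE documents_id = 1
ORDER BY seq, id DESC;
```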
Re: [GENERAL] Versioned, chunked documents
2012/4/2 Ondrej Ivanič ondrej.iva...@gmail.com:
> Hi,
> On 2 April 2012 08:38, Ivan Voras ivo...@freebsd.org wrote:
>> db= set enable_seqscan to off;
>> [snip]
>> This huge cost of 100 which appeared out of nowhere in the EXPLAIN
>> output and the seq scan worry me - where did that come from?
> It is not possible to disable seq scans completely. Setting
> enable_seqscan to off just sets the cost of this operation extremely
> high (10 000 000 000), thus the planner is forced to look for a better
> plan. In your case the planner wasn't able to find anything better,
> hence cost=109.55..109.56.

Hi, Thanks, I suspected something hard-coded might be adding to the cost, but this leaves the more important issue: why is a seq scan happening at all with an indexed field (the id)? If I read the EXPLAIN output (at http://explain.depesz.com/s/mpY) correctly, this is where it happens:

    Hash Cond: ((max(public.documents_chunks.id)) = public.documents_chunks.id)

The left hand side is a result of the MAX aggregate with a GROUP BY, but the number of records is really small so I think the index should be used on the right hand side of the hash cond.
Re: [GENERAL] what Linux to run
On 28/02/2012 18:17, Rich Shepard wrote:
> On Tue, 28 Feb 2012, mgo...@isstrucksoftware.net wrote:
>> If we move to Linux, what is the preferred Linux for running Postgres
>> on. This machine would be dedicated to the database only.
> Michael, There is no 'preferred' linux distribution; the flame wars on
> this topic died out a decade or so ago. From what you write, I would
> suggest that you look at one of the Ubuntus http://www.ubuntu.org/.
> Either the KDE or Gnome versions will appear Microsoft-like; the Xfce
> version appears more like CDE. Download a bootable .iso (a.k.a. 'live
> disk') and burn it to a cdrom and you can try it without installing it.
> If you do like it, install it from the same disk. The Ubuntus boot
> directly into the GUI and that tends to be more comfortable for newly
> defenestrated users. If you like that, but want the more open and
> readily-available equivalent, install Debian. The ubuntus are
> derivatives of debian.

One interesting thing I've discovered recently is that there is a HUGE difference in performance between CentOS 6.0 and Ubuntu Server 10.04 (LTS) in at least the memory allocator and possibly also the multithreading libraries (in favour of CentOS). PostgreSQL shouldn't be particularly sensitive to either of these, but it makes me wonder what else is suboptimal in Ubuntu.
Re: [GENERAL] what Linux to run
On 28/02/2012 17:57, mgo...@isstrucksoftware.net wrote:
> Our application runs on Windows, however we have been told that we can
> pick any OS to run our server on. I'm thinking Linux because from
> everything I've read, it appears to be better on performance and there
> are other features like tablespaces which we could take advantage of.
> On our hosted solution, the application runs in a Software as a Service
> model and being able to keep each company's tables in their own
> tablespace would be nice. Additionally it appears that there are a lot
> more ways to tune the engine if we need to than under Windows, plus the
> capability to hold more connections. If we move to Linux, what is the
> preferred Linux for running Postgres on? This machine would be
> dedicated to the database only. I'd like a recommendation for both a
> GUI hosted version and a non-GUI version. I haven't used Linux in the
> past but did spend several years in a mixed Unix and IBM mainframe
> environment at the console level.

Hi, PostgreSQL administration would not benefit much from a GUI, as it is basically centered around editing and tuning configuration files (either its or the OS's). For Linux, if you want stability and decent performance, you should probably choose either CentOS, or if you want commercial support, Red Hat Enterprise Linux (which is basically the same thing, only commercial). Personally, I'd recommend FreeBSD (it's not a Linux, it's more Unix-like) but I'm probably biased ;)
[GENERAL] PostgreSQL poster
I was looking for some PostgreSQL promotional material and found this: http://imgur.com/4VUUw I would very much like to get a high-res version of this image - does anyone here have it?
[GENERAL] Session variables and C functions
I'm writing a custom C function and one of the things it needs to do is to be configured from the SQL-land, per user session (different users have different configurations in different sessions). I have found (and have used) the SET SESSION command and the current_setting() function for use with custom_variable_classes configuration in postgresql.conf, but I'm not sure how to achieve the same effect from C. Ideally, the C module would create its own custom variable class, named e.g. module, then define some setting, e.g. module.setting. The users would then execute an SQL command such as SET SESSION module.setting='something', and the module would need to pick this up in the C function. Any pointers to how this would be done?
Re: [GENERAL] Session variables and C functions
On 17 November 2011 19:02, Tom Lane t...@sss.pgh.pa.us wrote:
> Ivan Voras ivo...@freebsd.org writes:
>> Ideally, the C module would create its own custom variable class, named
>> e.g. module, then define some setting, e.g. module.setting. The users
>> would then execute an SQL command such as SET SESSION
>> module.setting='something', and the module would need to pick this up
>> in the C function.
> Plenty of examples of that in contrib/ ...

Ok, I found DefineCustom*() and _PG_init() but there's a small problem: the functions I'm writing are for a user defined type and apparently _PG_init() is not called until some operation with the type is done (e.g. anything with the input/output functions). This means that DefineCustom*() is not called and PG doesn't know about the variable until it's too late - I need the session variable to be settable by the user before input/output - relying on the default value is not enough. Is there any way to make _PG_init() called earlier, e.g. as soon as the session is established or at database connection time, something like that?
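For reference, the DefineCustom*() pattern Tom points at looks roughly like this. This is only a sketch: the module.setting name follows the thread, and the hook arguments shown are the 9.1-era signature (9.0 and earlier lack the check_hook parameter):

```c
#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

/* Backing storage for the custom GUC; PostgreSQL updates it on SET. */
static char *module_setting = NULL;

void _PG_init(void);

void
_PG_init(void)
{
    /*
     * Registers "module.setting" so that
     *   SET SESSION module.setting = '...';
     * becomes valid, and the value is readable from C via module_setting.
     */
    DefineCustomStringVariable("module.setting",
                               "Per-session setting for this module",
                               NULL,             /* long description */
                               &module_setting,
                               "",               /* boot/default value */
                               PGC_USERSET,
                               0,                /* flags */
                               NULL, NULL, NULL  /* check/assign/show hooks */);
}
```

As the follow-up in this thread notes, _PG_init() only runs when the library is loaded, so for type support functions the library must be preloaded (shared_preload_libraries or local_preload_libraries) for the GUC to exist before the type is first used.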
Re: [GENERAL] Session variables and C functions
On 18 November 2011 01:20, Tom Lane t...@sss.pgh.pa.us wrote:
> Ivan Voras ivo...@freebsd.org writes:
>> Is there any way to make _PG_init() called earlier, e.g. as soon as the
>> session is established or at database connection time, something like
>> that?
> Preload the library --- see the shared/local_preload_libraries
> configuration settings.

Ok, thanks!
[GENERAL] Bulk processing deletion
Hello, I have a table with a large number of records (millions), on which the following should be performed:

1. Retrieve a set of records by a SELECT query with a WHERE condition
2. Process these in the application
3. Delete them from the table

Now, in the default read-committed transaction isolation, I can't just use the same WHERE condition with a DELETE in step 3 as it might delete more records than are processed in step 1 (i.e. phantom read). I've thought of several ways around it and would like some feedback on which would be the most efficient:

#1: Create a giant DELETE WHERE ... IN (...) SQL command for step 3 with primary keys of records from step 1 - but will it hit a SQL string length limitation in the database? Is there such a limit (and what is it)?

#2: Same as #1 but with batching the records to e.g. 1000 at a time, all in one transaction

#3: Use a higher isolation level, probably Repeatable Read (PG 9.0) - but then the question is: will this block other clients from inserting new data into the table? Also, is Repeatable Read enough?

Any other ideas?
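Option #2 above can be sketched in application code. The chunking itself is generic Python; `conn` (a DB-API connection), the table name and the column name are hypothetical stand-ins:

```python
def batches(ids, size=1000):
    """Yield successive slices of `ids`, each of at most `size` elements."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def delete_in_batches(conn, ids, size=1000):
    """Delete rows by primary key in batches, all in one transaction.

    Avoids building one giant IN (...) list while still deleting exactly
    the keys that were processed (so no phantom-read problem).
    """
    with conn.cursor() as cur:
        for chunk in batches(ids, size):
            # Parameterized IN-list; placeholders generated per chunk.
            placeholders = ", ".join(["%s"] * len(chunk))
            cur.execute(
                f"DELETE FROM big_table WHERE id IN ({placeholders})",
                chunk,
            )
    conn.commit()
```

The batch size is a tradeoff between round-trips and statement size; 1000, as in the post, is a reasonable starting point.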
Re: [GENERAL] Bulk processing deletion
On 13/10/2011 14:34, Alban Hertroys wrote:
>> Any other ideas?
>
>     CREATE TABLE to_delete (
>         job_created timestamp NOT NULL DEFAULT now(),
>         fk_id int NOT NULL
>     );
>
>     -- Mark for deletion
>     INSERT INTO to_delete (fk_id) SELECT id FROM table WHERE condition = true;
>
>     -- Process in app
>     SELECT table.* FROM table INNER JOIN to_delete ON (table.id = to_delete.fk_id);
>
>     -- Delete them
>     DELETE FROM table WHERE id IN (SELECT fk_id FROM to_delete);

Good point. I can even use a temp table for this and make use of UNLOGGED temp tables when we upgrade to 9.1!
Re: [GENERAL] Bulk processing deletion
On 13 October 2011 20:08, Steve Crawford scrawf...@pinpointresearch.com wrote:
> On 10/13/2011 05:20 AM, Ivan Voras wrote:
>> Hello, I have a table with a large number of records (millions), on
>> which the following should be performed: 1. Retrieve a set of records
>> by a SELECT query with a WHERE condition 2. Process these in the
>> application 3. Delete them from the table
> Without knowing a bit more, it is difficult to say. A couple questions:
> 1. Are there conflicting processes - i.e. could multiple applications
> be in contention to process the same set of records?

No, only one bulk processor.

> 2. Is the processing all or none or could individual records fail? If
> so, how do you deal with reprocessing or returning those to the main
> table.

All or none; the nature of the thing is that there can be no fatal failures.

> Depending on the nature of your app, it might be feasible to reorder
> the actions to move the records to be processed into a processing table
> and delete them from that table as the records are processed by the
> application. You could move the records into the processing table with:
>
>     with foo as (delete from main_table
>                  where your_where_clause
>                  returning a, b, c)
>     insert into processing_table (a, b, c)
>     select a, b, c from foo;
>
> In this case I would not recommend temporary or unlogged tables for the
> processing table as that becomes the only source of the data once
> deleted from the master table.

Ok, thanks (to everyone)!
Re: [GENERAL] PostgreSQL benchmarked on XFS vs ZFS vs btrfs vs ext4
On 14/09/2011 09:30, Toby Corkindale wrote:
> On 14/09/11 12:56, Andy Colson wrote:
>> On 09/13/2011 08:15 PM, Toby Corkindale wrote:
>>> Hi, Some months ago, I ran some (probably naive) benchmarks looking
>>> at how pgbench performed on an identical system with differing
>>> filesystems. (on Linux). [snip]
>> Did you test unplugging the power cable in the middle of a test to see
>> which would come back up?
> Heh, no, but it'd be interesting to see.. I wonder if turning a virtual
> machine off has the same effect?

No, never the exact same effect. There are two things to consider: if/when/how the OS flushes the data to the hardware and if/when/how the hardware flushes the data to physical storage. You can simulate only the failure of the first part with a virtual machine, but not the second (unless you bring down the VM host...).
[GENERAL] Re: Seg Fault in backend after beginning to use xpath (PG 9.0, FreeBSD 8.1)
On 03/05/2011 07:12, alan bryan wrote:
> Our developers started to use some xpath features and upon deployment
> we now have an issue where PostgreSQL is seg faulting periodically. Any
> ideas on what to look at next would be much appreciated.
>
> FreeBSD 8.1
> PostgreSQL 9.0.3 (also tried upgrading to 9.0.4) built from ports
> Libxml2 2.7.6 (also tried upgrading to 2.7.8) built from ports
>
> pgsql logs show:
> May 1 17:51:13 192.168.20.100 postgres[11862]: [94-1] LOG: server process (PID 62112) was terminated by signal 11: Segmentation fault
>
> syslog shows:
> May 2 20:29:16 db3 kernel: pid 49956 (postgres), uid 70: exited on signal 11 (core dumped)
> May 2 21:06:37 db3 kernel: pid 39086 (postgres), uid 70: exited on signal 10 (core dumped)
>
> Checking out postgres.core and we see:
> (gdb) bt
> #0 0x0008f5f19afd in pthread_mutex_lock () from /lib/libthr.so.3
> #1 0x000800d22965 in xmlRMutexLock () from /usr/local/lib/libxml2.so.5

This is unusual. There isn't any need to use pthreads here. As far as I can see, the normal build of libxml2 doesn't import it explicitly:

    ldd /usr/local/lib/libxml2.so
    /usr/local/lib/libxml2.so:
        libz.so.5 => /lib/libz.so.5 (0x800889000)
        libiconv.so.3 => /usr/local/lib/libiconv.so.3 (0x800e5)
        libm.so.5 => /lib/libm.so.5 (0x80104b000)
        libc.so.7 => /lib/libc.so.7 (0x800647000)

Judging by the mix of SIGBUS and SIGSEGV, I'd say it is likely this is causing you problems. To make sure, you may want to rebuild libxml2 with WITHOUT_THREADS defined. You may also need to rebuild postgresql afterwards.
Re: [GENERAL] multi-tenant vs. multi-cluster
On 18/03/2011 19:17, Ben Chobot wrote:
> if we're talking an extra 50MB of memory per cluster, that will start
> to add up.

Consider this: each such cluster will have:

a) its own database files on the drives (WAL, data - increasing IO)
b) its own postgresql processes (many of them) running in memory
c) its own shared_buffers in memory.

It is highly unlikely that you will manage anything decent with this type of configuration with a non-trivial number of clusters.
[GENERAL] Copying databases with extensions - pg_dump question
A fairly frequent operation I do is copying a database between servers, for which I use pg_dump. Since the database contains some extensions - most notably hstore and tsearch2, which need superuser privileges to install, I have a sort of a chicken-and-egg problem: the owner of the database (and all its objects) should be a non-superuser account so I can't simply use the output from pg_dump and expect everything to be correct after restoring it.

So far, I've used this workaround: install all superuser-requiring extensions in template1 on the destination server and then restore from pg_dump, ignoring the occasional duplicate object errors. This would work out of the box but pg_dump's create database commands (outputted with -C) include the WITH TEMPLATE=template0 clause, so I made a small utility which modifies these dumps to change the one byte so the template becomes template1. (-C is useful so I can do "psql template1 pgsql < my_dump.sql" and get it all done.)

Anyway, this works well enough but I wonder now if there is a better solution for this? As a feature request, I'd like a template database argument to use with -C so I don't have to modify the dumps, but there could be a better solution which side-steps this. Is there a canonical way to deal with this problem?
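The "change one byte" step described above can be done with a one-line sed filter over the dump. A sketch; the dump contents below are a stand-in for real pg_dump -C output, and the file names are illustrative:

```shell
# Rewrite the CREATE DATABASE line that pg_dump -C emits so the restore
# clones template1 (with the extensions pre-installed) instead of template0.
echo "CREATE DATABASE mydb WITH TEMPLATE = template0 ENCODING = 'UTF8';" > my_dump.sql

sed 's/TEMPLATE = template0/TEMPLATE = template1/' my_dump.sql > my_dump_t1.sql
cat my_dump_t1.sql
```

Note this assumes the exact spacing pg_dump uses for the clause; a real dump should be inspected first.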
Re: [GENERAL] Copying databases with extensions - pg_dump question
On 21/01/2011 14:39, Bill Moran wrote:
> In response to Ivan Voras ivo...@freebsd.org:
>> A fairly frequent operation I do is copying a database between servers,
>> for which I use pg_dump. Since the database contains some extensions -
>> most notably hstore and tsearch2, which need superuser privileges to
>> install, I have a sort of a chicken-and-egg problem: the owner of the
>> database (and all its objects) should be a non-superuser account so I
>> can't simply use the output from pg_dump and expect everything to be
>> correct after restoring it.
> Why not? If the ownership on the original database is non-superuser,
> then that will be faithfully preserved when the database is restored.
> What are you doing to cause it to behave differently?

I have reviewed my operations and it looks like these are the important differences:

* The database copy might be from a development machine to production so I use pg_dump -O to remove any accidentally entered unwanted user ownership data
* The database restore on the target machine is done as a nonprivileged user (the target owner of the database)

Are there better ways to do this?
Re: [GENERAL] Copying databases with extensions - pg_dump question
On 21/01/2011 15:55, Bill Moran wrote:
> On the no side, doing this kind of thing is always complex. We have a
> slew of other, very specialized scripts that do things like convert a
> production database to a development database by sanitizing sensitive
> data, or automatically deploy new databases or upgrades to either our
> production, development, or lab environments. Build tools like phing,
> make, ant, etc make this kind of thing easier to create, but it's still
> a lot of work because each situation is special.

Yes, I've also made a set of migration and sanitizing scripts so that aspect is covered. I was only wondering if there is some way I'm missing that would solve this one step more elegantly.

> So, my overall answer is that you're probably on the right track, you
> probably don't realize how much work is involved to get this really
> working well, and I would change just a few things about your approach
> based on the information you've provided so far. The requirement to use
> an unprivileged user to restore is going to make the tools you use to
> prepare the input files more important, but that's not insurmountable
> if you keep control over them (i.e. use schema files and a tool for
> normalizing them and do the data dump separately)

Thanks, I'll consider that approach.
Re: [GENERAL] Optimal settings for embedded system running PostgreSQL
On 13/01/2011 17:31, Christian Walter wrote:
> From: pgsql-general-ow...@postgresql.org on behalf of Ivan Voras
>> - Average read = 15,6Mb/s
>> - 4Kbyte reads = 3,5Mb/s
>> - 1Kbyte read = 1Mb/s
>> This is very slow. Have you considered something more light-weight
>> like SQLite?
> This is comparable to a standard USB2.0 flash drive. Reads are faster
> on very large block sizes but I thought postgres by default uses
> 8Kbyte. At 8KByte we have about 5000kbyte/s. For example if you compare
> a standard flash disk, e.g.
> http://www.innodisk.com/production.jsp?flashid=34 you can see that they
> specify 26mb/sec as a max. The same is specified here but I think it is
> only for very large block sizes. Using a different database is not an
> option for us since our software depends on PostgreSQL and on its JDBC
> drivers. We are working with the vendor to find a solution for this but
> of course this is only a long term option.

Of course, it depends on your constraints, but my suggestion was not only related to block IO speed but more to the nature of the application: there is a large difference between the number and the complexity of operations between postgresql and sqlite. SQLite might need much less IO bandwidth independently of block sizes. As others said, switching to a non-Windows OS will probably also increase performance, independently of database type or block sizes. My point is: you may need to scale everything optimally for your application, not only PostgreSQL. Your original post doesn't really mention if your database activity is write-mostly or read-mostly, so you can do additional tuning. For example, if you are write-heavy and can survive the last couple of transactions being lost, you could configure PostgreSQL with synchronous_commit=off without losing complete database integrity on power failure.
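The synchronous_commit suggestion is a one-line postgresql.conf change; a minimal fragment (comments are mine, not from the thread):

```ini
synchronous_commit = off   # commits return before the WAL flush; on power
                           # loss the last few transactions may be lost,
                           # but the database stays consistent
fsync = on                 # keep on: turning this off risks corruption
```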
Re: [GENERAL] Optimal settings for embedded system running PostgreSQL
On 13/01/2011 14:30, Christian Walter wrote:
> Dear Members, We are currently using PostgreSQL 7.3 on an Embedded
> System (Based on
> http://www.msc-ge.com/de/produkte/com/qseven/q7-us15w.html) running
> Windows XP Embedded / SP3. The onboard flash shows the following
> performance figures:
> - Average read = 15,6Mb/s
> - 4Kbyte reads = 3,5Mb/s
> - 1Kbyte read = 1Mb/s

This is very slow. Have you considered something more light-weight like SQLite?

> We are sometimes facing problems that the system seems to hang due to
> database activity and now we are looking to improve the situation. If
> the database application is running and we run the benchmark again,
> figures drop down to zero, so we assume that the database (and
> application) is responsible for the high load. Since the embedded flash
> is quite hard to change we are looking for a solution on improving
> this. For me it seems that blocksize is a big issue and I would like to
> know if a recompile of Postgres with a larger block size would resolve
> this?

You can, see the --with-blocksize and --with-wal-blocksize arguments to ./configure, but flash memory is notorious for having really large block sizes, on the order of 256 KiB - I suspect it would be very inefficient if the database tried to do everything in such large blocks.

> Are there any other options which should be set for flash based disks,
> since latency is lower than on harddisks?

First, you need to confirm or find where the real problem is. Windows is hard for debugging but you may get some information from the perfmon applet.
Re: [GENERAL] Bytea error in PostgreSQL 9.0
On 14/12/2010 14:51, tuanhoanganh wrote:
> Thanks for your help. Is there any .Net or VB tutorial for the new 9.0
> bytea?

You do not need to change your code if you add

    bytea_output = 'escape'   # hex, escape

into postgresql.conf.
Re: [GENERAL] Best practice to get performance
On 11/19/10 15:49, Andy Colson wrote:
> unlogged will only help insert/update performance. Lookup tables sound
> readonly for a majority of time. (I'm assuming lots of reads and every
> once and a while updates). I doubt that unlogged tables would speed up
> lookup tables.

Are FreeBSD's temp tables still storage-backed? (i.e. are there memory-backed temp tables)?
Re: [GENERAL] Best practice to get performance
On 11/19/10 23:14, Andy Colson wrote:
> On 11/19/2010 4:04 PM, Ivan Voras wrote:
>> On 11/19/10 15:49, Andy Colson wrote:
>>> unlogged will only help insert/update performance. Lookup tables
>>> sound readonly for a majority of time. (I'm assuming lots of reads
>>> and every once and a while updates). I doubt that unlogged tables
>>> would speed up lookup tables.
>> Are FreeBSD's temp tables still storage-backed? (i.e. are there
>> memory-backed temp tables)?
> Sorry, I'm not sure what you are asking. Not sure what this has to do
> with FreeBSD, or its temp tables. Unless: s/freebsd/postgres/i

Yes, sorry, switched between mailing lists too fast :)

> Ah, yes, Postgres temp tables still make it to disk eventually. They
> are not WAL logged (but if they get really big they spill to disk).
> There are no memory only tables, no. But PG will cache the heck out of
> all tables, so you get the best of both worlds.

Thanks!
[GENERAL] Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
On 11/17/10 01:23, Scott Ribe wrote:
> On Nov 16, 2010, at 3:46 PM, Josh Berkus wrote:
>> ...and will be truncated (emptied) on database restart.
> I think that's key. Anything that won't survive a database restart, I
> sure don't expect to survive backup restore.

FWIW, I agree with this reasoning. Iff the automatic truncate on clean restart could be turned off, I'd also expect the data to be in the backup - in that case only.
[GENERAL] Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
On 11/17/10 02:55, Josh Berkus wrote:
>> If you do wish to have the data tossed out for no good reason every so
>> often, then there ought to be a separate attribute to control that.
>> I'm really having trouble seeing how such behavior would be desirable
>> enough to ever have the server do it for you, on its terms rather than
>> yours.
> I don't quite follow you. The purpose of unlogged tables is for data
> which is disposable in the event of downtime; the classic example is a
> user_session_status table. In the event of a restart, all user sessions
> are going to be invalid anyway.

Depends on what you mean by session. Typical web application session data, e.g. for PHP applications which are deployed in *huge* numbers, resides directly on file systems and is not guarded by anything (not even fsyncs). On operating system crash (and I do mean when the whole machine and the OS go down), the most that can happen is that some of those session files get garbled or go missing - all the others work perfectly fine when the server is brought back up again and the users can continue to work within their sessions. *That* is useful session behaviour and it is also useful for logs. The definition of unlogged tables which are deliberately being emptied for no good reason does not seem very useful to me. I'd rather support an (optional) mode (if it can be implemented) in which PostgreSQL scans through these unlogged tables on startup and discards any pages whose checksums don't match, but accepts all others as good enough. Even better: maybe not all pages need to be scanned, only the last few, if there is a chance for any kind of mechanism which can act as checkpoints for data validity.
[GENERAL] Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
On 11/17/10 17:43, A.M. wrote:
> On Nov 17, 2010, at 11:32 AM, Ivan Voras wrote:
>> [snip - same text as the previous message]
> This is not really a fair feature comparison. With the file-based
> sessions, the webserver will continue to deal with potentially
> corrupted sessions, which is worse than dealing with no sessions.
I guess it depends on specific use case, but in the common case (i.e. non-mission critical massive deployments) I'd say it's definitely *not* worse than no sessions. Dealing with potential corruption in this case usually means the web application will attempt to deserialize the session data and fail if it's corrupted, leading to a new session being created. I also plan on using this feature for materialized views to replace memcached. Just how large a performance gain is expected from this thing? :) I don't see a mention that fsync will be disabled on unlogged tables (though it makes sense so it probably will be). Having materialized views this way will mean that something - either an application or an external script triggered by database startup - will have to calculate and create this materialized view, which will probably involve massive table scanning all around - I suspect that performance gains from unlogged tables could be hidden by this scanning. Anyway, I'm not arguing against them, I'm arguing for making them more powerful. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
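The recovery behaviour described above (try to deserialize the session, and on corruption fall back to a fresh one) can be sketched as follows; this is a hypothetical Python analogue of PHP's file-based session handling, with made-up paths and field names:

```python
import json
import os
import uuid

SESSION_DIR = "/tmp/sessions"  # hypothetical session file location

def load_session(session_id):
    """Return the stored session dict, or a fresh session if the
    file is missing or corrupted (e.g. after an OS crash)."""
    path = os.path.join(SESSION_DIR, session_id)
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, ValueError):
        # Garbled or missing session file: quietly start a new
        # session instead of failing the request.
        return {"id": str(uuid.uuid4()), "fresh": True}
```

This is exactly the "not worse than no sessions" behaviour argued for: a corrupted file costs one user one login, not an error.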
Re: [GENERAL] postgresql scalability issue
On 11/08/10 16:33, umut orhan wrote: Hi all, I've collected some interesting results during my experiments which I couldn't figure out the reason behind them and need your assistance. I'm running PostgreSQL 9.0 on a quad-core machine having two level on-chip cache hierarchy. Based on this information, you are most likely running on Intel Xeon 5000-5400 series CPU, right? It probably doesn't matter much since you apparently have only a single socket populated but is a bit old architecture known for its FSB bottleneck to the memory. You should write some details about your hardware: at least CPU model/speed and memory speed, and software (which OS? 32-bit or 64-bit?) PostgreSQL has a large and warmed-up buffer cache thus, no disk I/O is observed during experiments (i.e. for each query buffer cache hit rate is 100%). I'm pinning each query/process to an individual core. Queries are simple read-only queries (only selects). Nested loop (without materialize) is used for the join operator. When I pin a single query to an individual core, its execution time is observed as 111 seconds. This result is my base case. Then, I fire two instances of the same query concurrently and pin them to two different cores separately. However, each execution time becomes 132 seconds in this case. In a similar trend, execution times are increasing for three instances (164 seconds) and four instances (201 seconds) cases too. What I was expecting is a linear improvement in throughput (at least). I tried several different queries and got the same trend at each time. Are you measuring wall-clock execution time for queries in parallel? I.e. start measuring when the first query is started (asynchronously?) and stop when the last one is finished? Did you try the same measurement without pinning? I wonder why execution times of individual queries are increasing when I increase the number of their instances. 
Btw, I don't think on-chip cache hit/miss rates make a difference since L2 cache misses are decreasing as expected. I'm not an expert in PostgreSQL internals. Maybe there is a lock-contention (spinlocks?) occurring even if the queries are read-only. Anyways, all ideas are welcome. As others said, memory bandwidth is the most likely suspect here. CPUs are unfortunately so much faster than memory and memory buses that they frequently have to wait. Unless PostgreSQL uses the exclusive lock model instead of shared-exclusive, there shouldn't be much contention for the shared buffers. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
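The measurement question raised above, wall-clock time from when the first query starts to when the last one finishes, can be sketched with a memory-bound stand-in for the query (the workload here is hypothetical, not PostgreSQL itself):

```python
import time
from concurrent.futures import ProcessPoolExecutor

def run_query(_):
    # Stand-in for a read-only, memory-bound query: building and
    # summing a large list stresses memory bandwidth, not disk.
    return sum(list(range(1_000_000)))

def wall_clock(n_parallel):
    """Wall-clock time from starting n_parallel identical 'queries'
    until the last one finishes."""
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=n_parallel) as ex:
        list(ex.map(run_query, range(n_parallel)))
    return time.perf_counter() - start
```

With linear scaling, wall_clock(4) would stay close to wall_clock(1); growth like the poster's 111 s to 201 s points at a shared resource such as the memory bus rather than lack of CPU cores.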
Re: [GENERAL] Table update problem works on MySQL but not Postgres
On 09/01/10 16:13, Igor Neyman wrote: -Original Message- From: Raymond C. Rodgers [mailto:sinful...@gmail.com] Sent: Tuesday, August 31, 2010 7:56 PM To: pgsql-general@postgresql.org Subject: Table update problem works on MySQL but not Postgres update mydemo set cat_order = cat_order + 1 where client_id = 1 and cat_order >= 0 in order to insert categories at the top of the sorted list, for example. As you can probably guess, this query doesn't work very well. On both MySQL and PostgreSQL I get a constraint violation. That makes sense; I screwed up. What you need for your update to work is deferred unique constraints. I think this feature appeared in 9.0. Yes: http://www.postgresql.org/docs/9.0/static/sql-set-constraints.html Currently, only UNIQUE, PRIMARY KEY, REFERENCES (foreign key), and EXCLUDE constraints are affected by this setting. NOT NULL and CHECK constraints are always checked immediately when a row is inserted or modified (not at the end of the statement). Uniqueness and exclusion constraints that have not been declared DEFERRABLE are also checked immediately. In 8.4 it says: Currently, only foreign key constraints are affected by this setting. Check and unique constraints are always effectively not deferrable. Triggers that are declared as constraint triggers are also affected.
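A minimal sketch of the deferred-constraint approach on 9.0, with a hypothetical constraint name for the poster's table:

```sql
-- Declare the unique constraint as deferrable:
ALTER TABLE mydemo
    ADD CONSTRAINT mydemo_order_uniq
    UNIQUE (client_id, cat_order) DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
-- Defer the check until COMMIT, for this transaction only:
SET CONSTRAINTS mydemo_order_uniq DEFERRED;
UPDATE mydemo SET cat_order = cat_order + 1
    WHERE client_id = 1 AND cat_order >= 0;
COMMIT;  -- uniqueness is verified here, after all rows have shifted
```

Without DEFERRABLE, each individual row update is checked immediately and collides with a not-yet-updated neighbour, which is exactly the constraint violation described.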
Re: [GENERAL] Massively Parallel transactioning?
On 08/19/10 20:19, Benjamin Smith wrote: On Wednesday, August 18, 2010 04:58:08 pm Joshua D. Drake wrote: Well if you are just using it for updates to the schema etc... you should only need to launch a single connection to each database to make those changes. And that's exactly the problem. On each server, we have at least dozens of active databases - one for each client. Opening a connection for each database starts to become problematic. Hmm, let's talk about this problem. If I understand you correctly, you have an arrangement where you have one server which will push updates, and other, database servers, with dozens of databases per server? Let's try to convert this to numbers and assume that "dozens of databases per server" means 50 and that you have 50 more servers. This means that the server which pushes the updates needs to connect to 2500 databases. This is way too small a number of connections (sockets) from a single client to create problems on the client side. On the other hand, if you have, say, 50 databases per server, this means that the update client connects 50 times to the same server, which is only a problem if the server has a small total number of connections configured (max_connections) - smaller than 2x the number of databases. Unless you are very careful not to actually exceed this number of connections during normal database use (and depending on what you do this may or may not be possible), it would actually be more beneficial to raise max_connections to a sensible value - e.g. 500 (which would probably need a bump in SEMMNI).
[GENERAL] Re: What to do if space for database decrease and no additional hard drive is possible?
On 26.7.2010 12:43, AlannY wrote: Hi there. I have a huge database with several tables. Some tables have statistics information. And it's very huge. I don't want to lose any of this data. But hard drives on my single server are not eternal. Very soon, there will be no space left. And the most awful thing is that it's a 1U server, and I can't install a new hard drive. What can I do to enlarge space, without losing data and performance? Absolutely nothing quick and easy. In fact, about the only thing you can do which won't cause a (long term) data loss and performance degradation is a full backup, installing bigger drives to replace the old ones, and a full restore. Some other ideas which might help you if you don't want to swap drives, but generally require a lot of work and you *will* lose either data or performance: * use a file system which supports compression (NTFS on Windows, ZFS on FreeBSD or Solaris; I don't know of any on Linux) * move unneeded data out from the database and into a separate, compressed data storage format (e.g. move statistical data into gzipped CSV or text files or something to that effect) * buy external storage (a NAS, or even an external USB drive) and move the database to it * use an external data storage service like Amazon S3 (actually, this is a bad idea since you will need to completely rewrite your database and application) * decide that you really don't need some of the data and just delete it.
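The "move old statistics into compressed text files" option above can be quite simple; a minimal sketch with hypothetical data (in practice you would feed this from `COPY ... TO STDOUT CSV` output rather than Python tuples):

```python
import csv
import gzip

def archive_rows(rows, path):
    """Write an iterable of row tuples to a gzipped CSV file."""
    with gzip.open(path, "wt", newline="") as f:
        writer = csv.writer(f)
        writer.writerows(rows)

def restore_rows(path):
    """Read archived rows back as lists of strings."""
    with gzip.open(path, "rt", newline="") as f:
        return list(csv.reader(f))
```

Statistics tables full of repetitive numeric text typically compress very well this way, at the cost of no longer being queryable in place.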
Re: [GENERAL] Full Text Search dictionary issues
On 07/16/10 02:23, Tom Lane wrote: Howard Rogers h...@diznix.com writes: I have 10 million rows in a table, with full text index created on one of the columns. I submit this query: ims=# select count(*) from search_rm ims-# where to_tsvector('english', textsearch) ims-# @@ to_tsquery('english', 'woman beach ball'); count --- 646 (1 row) Time: 107.570 ms ...and those are excellent times. But if I alter the query to read: ims=# select count(*) from search_rm where to_tsvector('english', textsearch) @@ to_tsquery('english', 'woman beach ftx1'); count --- 38343 (1 row) Time: 640.985 ms ...then, as you see, it slows the query down by a factor of about 6, ... um, but it increased the number of matching rows by a factor of almost 60. I think your complaint of poor scaling is misplaced. This is basically the same question I asked a few days ago and I think the reason for this (mis)expectation of performance comes from expecting tsearch2 to behave like external specialized indexers. In such products, the search result can be returned simply from the index, which can scale fairly well, but PostgreSQL actually has to lookup all the records returned and this is where most time is spent. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Disk performance
On 06/15/10 14:59, Janning wrote: Hi all, as we encountered some limitations of our cheap disk setup, I really would like to see how cheap they are compared to expensive disk setups. We have a 12 GB RAM machine with an Intel i7-975, using 3 Seagate Barracuda 7200.11 disks, ST31500341AS (1.5 TB). One disk for the system and WAL etc. and one SW RAID-0 with two disks for postgresql data. Now I ran a few tests as described in http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm # time sh -c "dd if=/dev/zero of=bigfile bs=8k count=3000000 && sync" 3000000+0 records in 3000000+0 records out 24576000000 bytes (25 GB) copied, 276.03 s, 89.0 MB/s real 4m48.658s user 0m0.580s sys 0m51.579s # time dd if=bigfile of=/dev/null bs=8k 3000000+0 records in 3000000+0 records out 24576000000 bytes (25 GB) copied, 222.841 s, 110 MB/s real 3m42.879s user 0m0.468s sys 0m18.721s The figures are ok if the tests were done on a single drive (i.e. not your RAID-0 array). IMHO it is looking quite fast compared to the values mentioned in the article. What values do you expect with a very expensive setup like many spindles, scsi, raid controller, battery cache etc. How much faster will it be? For a start, you are attempting to use RAID-0 with two disks here. This means you have twice as much risk that a drive failure will cause total data loss. In any kind of serious setup this would be the first thing to replace. Of course, you can't give me exact results, but I would just like to get an idea about how much faster an expensive disk setup could be. Would it be like 10% faster, 100% or 1000% faster? If you can give me any hints, I would greatly appreciate it. There is no magic here - scalability of drives can be approximated linearly: a) faster drives: 15,000 RPM drives will be almost exactly 15000/7200 times faster at random access b) more drives: depending on your RAID schema, each parallel drive or drive combination will grow your speed linearly.
For example, a 3-drive RAID-0 will be 3/2 times faster than a 2-drive RAID-0. Of course, you would not use RAID-0 anywhere serious. But an 8-drive RAID-10 array will be 8/4=2 times faster than a 4-drive RAID-10 array. Finally, it all depends on your expected load vs budget. If you are unsure of what you want and what you need, but don't expect serious write loads, make a 4-drive RAID-10 array of your cheap 7200 RPM drives, invest in more RAM and don't worry about it. Drive controllers are another issue and there is somewhat more magic here. If the above paragraph describes you well, you probably don't need a RAID controller. There are many different kinds of these with extremely different prices, and many different configuration options, so nowadays it isn't practical to think about those until you really need to.
Re: [GENERAL] Disk performance
On 15 June 2010 18:22, Janning m...@planwerk6.de wrote: The figures are ok if the tests were done on a single drive (i.e. not your RAID-0 array). Ahh, I meant raid-1, of course. Sorry for this. I tested my raid 1 too and it looks quite the same. Not much difference. This is expected: a RAID-1 array (mirroring) will have the performance of the slowest drive (or a single drive if they are equal). There is no magic here - scalability of drives can be approximated linearly: a) faster drives: 15,000 RPM drives will be almost exactly 15000/7200 times faster at random access ok. (or if you are looking at raw numbers: a 15,000 RPM drive will sustain 15000/60=250 random IOs per second (IOPS); but now you are entering magic territory - depending on the exact type of your load you can get much better results, but not significantly worse). b) more drives: depending on your RAID schema, each parallel drive or drive combination will grow your speed linearly. For example, a 3-drive RAID-0 will be 3/2 times faster than a 2-drive RAID-0. Of course, you would not use RAID-0 anywhere serious. But an 8-drive RAID-10 array will be 8/4=2 times faster than a 4-drive RAID-10 array. So RAID-10 with 4 disks is 2 times faster than a RAID-1, I got it. So as I need much more power I should look for a RAID-10 with 8 or more 15k RPM disks. Yes, if you expect serious write or random IO load. To illustrate: if you are trying to power a generic web site, for example a blog, you can expect that most of your load will be read-only (mostly pageviews) and except if you plan on having a really large site (many authors for example), that your database will largely fit into RAM, so you don't have to invest in disk drives as it will be served from cache. On the other hand, a financial application will do a lot of transactions and you will almost certainly need good storage infrastructure - this is where the 250 IOPS for a 15000 RPM drive estimates come into play. thanks very much for your help. 
It gave me a good idea of what to do. If you have further recommendations, I would be glad to hear them. I can point you to a dedicated mailing list: pgsql-performance @ postgresql.org for questions about performance such as yours.
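The rules of thumb from this thread condense into a couple of one-liners (same rough estimates as above: one random IO per platter revolution, and linear RAID-10 scaling):

```python
def random_iops(rpm):
    # One random IO per revolution: a 15,000 RPM drive sustains
    # roughly 15000/60 = 250 random IOs per second.
    return rpm / 60

def raid10_relative_speed(n_drives, baseline_drives):
    # Linear scaling: an 8-drive RAID-10 array is ~2x as fast
    # as a 4-drive one.
    return n_drives / baseline_drives
```

These are only first-order approximations; caching, command queueing and the access pattern can move real numbers considerably in either direction.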
[GENERAL] tsearch2 dictionaries - possible problem
hello, I think I have a problem with a tsearch2 configuration I'm trying to use. I have created a text search configuration as:

--
CREATE TEXT SEARCH DICTIONARY hr_ispell (
    TEMPLATE = ispell,
    DictFile = 'hr',
    AffFile = 'hr',
    StopWords = 'hr'
);
CREATE TEXT SEARCH CONFIGURATION public.ts2hr (COPY=pg_catalog.english);
ALTER TEXT SEARCH CONFIGURATION ts2hr
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart, word, hword, hword_part
    WITH hr_ispell;
SET default_text_search_config = 'public.ts2hr';
--

and here are some queries:

--
cms=# select to_tsvector('voras vorasom');
 to_tsvector
-------------

(1 row)

cms=# SET default_text_search_config = 'simple';
SET
cms=# select to_tsvector('voras vorasom');
      to_tsvector
-----------------------
 'voras':1 'vorasom':2
(1 row)

cms=# SET default_text_search_config = 'ts2hr';
SET
cms=# select to_tsvector('voras vorasom');
 to_tsvector
-------------

(1 row)

cms=# select to_tsvector('kiša kiši');
 to_tsvector
-------------
 'kiša':1,2
(1 row)
--

The good news is that the text search configuration is actually used (the 'kiša kiši' example) but apparently, on an uncommon word, to_tsvector() returns nothing (the 'voras vorasom' example). Is there something wrong in the configuration? I would definitely not want unknown words to be ignored.
Re: [GENERAL] Full text search on a complex schema - a classic problem?
On 05/23/10 18:03, Andy Colson wrote: On 05/22/2010 09:40 PM, Ivan Voras wrote: Hello, I have a schema which tracks various pieces of information which would need to be globally searchable. One approach I came up with to make all of the data searchable is to create a view made of UNION ALL queries that would integrate different tables into a common structure which could be uniformly queried by using tsearch2 functions. This would work, up to the point where it would be practically unavoidable (for performance reasons) to create indexes on this view, which cannot be done. I would like to avoid using a hand-made materialized view (via triggers, etc.) because of administrative overhead and because it would duplicate data, of which there is potentially a lot. I think this looks like a fairly common problem with full text searches on a large-ish schemas, so I'm wondering what are the best practices here, specifically with using tsearch2? I have something like this, but with PostGIS layers. When a person clicks I search all the different layers (each a table) for information. I use a stored proc. Each table has its own index so each table is fast. It also lets me abstract out differences between the layers (I can search each a little differently). If each of your tables had its own full text fields and indexes, then write a stored proc to search them all individually, it should be pretty quick. This looks like an interesting solution. And it could be done generically in our case by having a separate table describing which tables need to be searched and by what fields. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
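The per-table stored procedure approach could look roughly like this; the table and column names are hypothetical, sketched for two searchable tables:

```sql
-- One function searching each table with its own index,
-- instead of one big (non-indexable) UNION ALL view:
CREATE FUNCTION global_search(q text)
RETURNS TABLE(source text, id integer, snippet text) AS $$
    SELECT 'articles', a.id, substr(a.body, 1, 80)
      FROM articles a
     WHERE to_tsvector('english', a.body) @@ to_tsquery('english', q)
    UNION ALL
    SELECT 'comments', c.id, substr(c.body, 1, 80)
      FROM comments c
     WHERE to_tsvector('english', c.body) @@ to_tsquery('english', q);
$$ LANGUAGE sql STABLE;

-- Each branch can then use its own expression index, e.g.:
CREATE INDEX articles_fts ON articles
    USING gin(to_tsvector('english', body));
```

Because each SELECT hits its own table directly, the planner can use each table's GIN index, which is exactly what the view-based approach could not offer.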
Re: [GENERAL] Full text search on a complex schema - a classic problem?
On 05/23/10 07:17, Craig Ringer wrote: On 23/05/10 10:40, Ivan Voras wrote: Hello, I have a schema which tracks various pieces of information which would need to be globally searchable. If systems that exist outside the database itself are acceptable, check out Apache Lucene, and tools that use it like Hibernate Search. We are currently using such an external system and while it works fast enough it has two problems: 1) the need to periodically refresh it (via cron) 2) it complicates deployment a bit by adding dependencies so we're moving away from it.
[GENERAL] Full text search on a complex schema - a classic problem?
Hello, I have a schema which tracks various pieces of information which would need to be globally searchable. One approach I came up with to make all of the data searchable is to create a view made of UNION ALL queries that would integrate different tables into a common structure which could be uniformly queried by using tsearch2 functions. This would work, up to the point where it would be practically unavoidable (for performance reasons) to create indexes on this view, which cannot be done. I would like to avoid using a hand-made materialized view (via triggers, etc.) because of administrative overhead and because it would duplicate data, of which there is potentially a lot. I think this looks like a fairly common problem with full text searches on a large-ish schemas, so I'm wondering what are the best practices here, specifically with using tsearch2? -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Authentication method for web app
On 14 May 2010 09:08, Leonardo F m_li...@yahoo.it wrote: Personally I would lean toward making the bulk of security within the application so to simplify everything - the database would do what it does best - store and manipulate data - and the application would be the single point of entry. Protect the servers - keep the applications (like ssh, php, apache, your application) updated and make sure you have good and frequent backups. Thank you for your reply. Anyone else? How do you secure your db used by webapps? Basically what I've said: 1) find all points of entry to the db (i.e. the application) and secure them 2) keep the server itself secure (applications patched, firewall enabled with custom rules to protect the db if necessary, only trusted local users etc.) If the app and the db are on different servers, consider a direct (patch cable) link between them or, if the structure is more complex, a switch; remove them from the Internet, possibly make a DMZ, etc.
Re: [GENERAL] Authentication method for web app
On 05/13/10 09:21, Leonardo F wrote: Hi all, we're going to deploy a web app that manages users/roles for another application. We want the database to be safe from changes made by malicious users. I guess our options are: 1) have the db listen only on local connections; basically when the machine is accessed the db could be compromised. Hardening the server access is the only true security defense we have. 2) Use, as user/password, the same user/password used to enter the web app. Basically there would be a 1 to 1 matching between our app users (and password...) and the db users (with proper permissions...) I'm not a great expert on these things (as you've already guessed...). Can someone help me? As you already figured out, the key is protecting both the application and the db server from intrusions. If anyone gets unauthorized access to either of them, especially if it's root access, any and all security you build on top of that will in all likelihood collapse. Personally I would lean toward making the bulk of security within the application so to simplify everything - the database would do what it does best - store and manipulate data - and the application would be the single point of entry. Protect the servers - keep the applications (like ssh, php, apache, your application) updated and make sure you have good and frequent backups. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] hstore problem with UNION?
I've encountered the following problem: ivoras=# create table htest2(id integer, t hstore); CREATE TABLE ivoras=# create table htest3(id integer, t2 hstore); CREATE TABLE ivoras=# select id, t from htest2 union select id,t2 as t from htest3; ERROR: could not identify an ordering operator for type hstore HINT: Use an explicit ordering operator or modify the query. I think it has something to do with UNION being the type of operation that could, as a variation, include DISTINCT, which would require sorting, but still... UNION by itself doesn't. How to get around this? I really don't care how hstores get sorted and, what's more, would like to avoid sorting them at all as they could get big.
Re: [GENERAL] hstore problem with UNION?
On 05/10/10 14:10, Jayadevan M wrote: When we do a union, the database has to get rid of duplicates and get distinct values. To achieve this, probably it does a sort. Just guesswork You are right, it looks like I have inverted the logic of UNION and UNION ALL - I actually needed UNION ALL here, thanks!
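With the tables from the original message, the working form is simply:

```sql
-- UNION ALL keeps duplicates, so no equality/ordering operator
-- is required for the hstore column:
select id, t from htest2
union all
select id, t2 as t from htest3;
```

Plain UNION implies DISTINCT, which needs to compare (and therefore sort or hash) hstore values; UNION ALL skips that step entirely.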
Re: [GENERAL] File compression in WinXP
On 2.5.2010 16:48, pasman pasmański wrote: Hello. I installed PostgreSQL 8.4.3 on WinXP SP3. Because of a small disk I created a tablespace pointing to a compressed folder and moved some tables to it. Compression is good: 10 GB down to 3-4 GB, speed acceptable (small activity, 10 users). But is this safe? Microsoft guarantees your safety here; there should be no difference in behaviour. Of course, memory and other resource requirements are still dependent on the uncompressed size. Your performance will probably be less than without compression.
[GENERAL] Upcoming hot standby replication question
Hello, I'd like to ask about the asynchronous nature of the upcoming replication implementation in 9.0 - what guarantees does it give with regards to delays and latency? E.g. do COMMITs finish and return to the caller before or after the data is sent to the slave? (being asynchronous, they probably don't wait for the other side's confirmation, right?).
Re: [GENERAL] Upcoming hot standby replication question
On 9 April 2010 18:21, Greg Smith g...@2ndquadrant.com wrote: Ivan Voras wrote: I'd like to ask about the asynchronous nature of upcoming replication implementation in 9.0 - what guarantees does it give with regards to delays and latency? E.g. do COMMITs finish and return to the caller before or after the data is sent to the slave? (being asynchronous, they probably don't wait for the other side's confirmation, right?). Exactly--synchronous replication, the only way to enforce that data is on the slave before completing the COMMIT, was postponed from this release. It should make it into 9.1 as an option, but it will always be expensive to turn on. What is in 9.0 is eventual consistency. If your slave is keeping up with traffic being sent by the master, it should receive each incremental commit shortly after it's made. In practice, slaves should only lag some number of seconds behind the master. But there are zero guarantees that will be the case, or that latency will be bounded at all. Recommended practice is to carefully monitor how much lag there is on the standby and trigger alerts if it exceeds your expectations. Ok, but I imagine there should be a difference between COMMITs returning before or after the actual data is sent over the network (though admittedly socket buffering could make it hard to distinguish). In addition to that, how about a compromise: COMMITs returning when the remote side ACKs acceptance but before it passes data to storage? Just thinking out loud.
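A sketch of the lag monitoring recommended in this thread, using the 9.0 functions (run on the standby):

```sql
-- How far WAL receipt and replay have progressed on the standby;
-- comparing these against pg_current_xlog_location() on the master
-- gives the lag in terms of WAL position.
SELECT pg_last_xlog_receive_location(),
       pg_last_xlog_replay_location();
```

Polling this periodically and alerting when the gap grows beyond a threshold is the "carefully monitor" practice described above.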
[GENERAL] Re: What locking mechanism is used for database backup and restore and Master-Slave Replication?
On 01/21/10 16:09, John Mitchell wrote: So am I to presume that the current stable version of postgres (before 8.5) does require extra locking? There is currently (before 8.5) no official replication mechanism in PostgreSQL. There are some 3rd party implementations, for which information can be gathered here: http://www.postgresql.org/docs/current/interactive/high-availability.html http://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling 2010/1/21 Grzegorz Jaśkiewicz gryz...@gmail.com On Thu, Jan 21, 2010 at 1:12 PM, John Mitchell mitchellj...@gmail.com wrote: Hi, In reading the documentation it states that the SQL dump backup does not block other operations on the database while it is working. yes, pg_dump opens a serializable transaction, thus guaranteeing the data to be an exact snapshot (as opposed to the default isolation level, which is called 'read committed' not without reason). I presume that while a restore is occurring that no reads or updates are allowed against the restored database. nope, what restoring does is just run all the commands in the pg_dump output (whether it is binary or textual). So as soon as the database is created, it is treated just as any connection, thus allowing you to connect and use it. What locking mechanism is used for Master-Slave Replication? master-slave replication, as introduced in what's to be 9.0 (aka 8.5), uses WAL shipping, so it doesn't require any extra locking. -- GJ -- John J. Mitchell
Re: [GENERAL] Backup strategies with significant bytea data
Leigh Dyer wrote: Hi, For years now I've simply backed up my databases by doing a nightly pg_dump, but since we added the ability for users to import binary files in to our application, which are stored in bytea fields, the dump sizes have gone through the roof — even with gzip compression, they're significantly larger than the on-disk size of the database. My guess is that this is due to the way that the binary data from the bytea fields is encoded in the dump file when it's produced. Have you tried another dump format? E.g. the -F c argument to pg_dump?
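For comparison, a sketch of the two routes (the database name is hypothetical; flags as in the 8.x pg_dump manual):

```shell
# Custom format: compresses each table's data individually and
# supports selective restore via pg_restore
pg_dump -F c -f mydb.dump mydb

# The plain-text route described above, for comparison
pg_dump mydb | gzip > mydb.sql.gz
```

The custom format also avoids the text escaping of bytea content in plain dumps, which is a large part of the size blow-up being described.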
Re: [GENERAL] Something like Oracle Forms, but Open Source to use with PostgreSQL?
John R Pierce wrote: Andre Lopes wrote: Hi, I need to know if there is something like Oracle Forms in the Open Source world that works with PostgreSQL. If you know of something, please let me know. perhaps OpenOffice Data could do what you need. I'm not real familiar with Oracle Forms, but I know OOo Data can connect to most any database including postgres, and it can be used to build 'forms' based database applications with queries and reports... it's somewhat akin to Microsoft Access in basic functionality. Only in the sense that stone tablets are akin to e-ink. OOo Base is buggy and unusable.
Re: [GENERAL] SELECTing every Nth record for better performance
A. Kretschmer wrote: In response to Tom : I have a big table that is used for datalogging. I'm designing a graphing interface that will visualise the data. When the user is looking at a small daterange I want the database to be queried for all records, but when the user is 'zoomed out', looking at an overview, I want to run a query that skips every nth record and returns a manageable dataset that still gives a correct overview of the data without slowing the programme down. Is there an easy way to do this that I have overlooked? I looked at: Do you have 8.4? If yes: test=# create table data as select s as s from generate_Series(1,1000) s; SELECT test=*# select s from (select *, row_number() over (order by s) from data) foo where row_number % 3 = 0 limit 10; Won't this still read in the entire table and only then filter the records out?
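Yes, the window function numbers every row before the filter runs. For graphing overviews, one commonly suggested alternative (sketched here on the example table; the bucket arithmetic is hypothetical) is to aggregate fixed-size buckets instead of skipping rows:

```sql
-- Downsample 1000 rows into ~100 points by averaging buckets of 10:
SELECT s / 10 AS bucket, avg(s) AS value
  FROM data
 GROUP BY s / 10
 ORDER BY bucket;
```

This still scans the selected date range, but it avoids the window-function sort and, unlike skipping every nth record, each plotted point summarizes all the data in its bucket.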
[GENERAL] PREPARE query with IN?
Is it possible to prepare a query with the IN clause in a meaningful way? I could probably do it with a hard-coded number of arguments, like

    SELECT x FROM t WHERE y IN ($1, $2, $3)

but that kind of misses the point of using IN for my needs. In any case, it would probably be a good idea to add a sentence about it on http://www.postgresql.org/docs/8.3/interactive/sql-prepare.html since I'm probably not the first one to ask about it :)
Re: [GENERAL] PREPARE query with IN?
Filip Rembiałkowski wrote:
> fi...@filip=# prepare sth(int[]) as select * from ids where id = ANY($1);
> PREPARE
> fi...@filip=# execute sth('{1,2,3}');
>  id |   t
> ----+-------
>   1 | eenie
>   2 | menie
>   3 | moe
> (3 rows)

Thanks!

-- Filip Rembiałkowski http://filip.rembialkowski.net/
Re: [GENERAL] Are there plans to add data compression feature to postgresql?
Peter Eisentraut wrote:
> Craig Ringer wrote:
>> So - it's potentially even worth compressing the wire protocol for use
>> on a 100 megabit LAN if a lightweight scheme like LZO can be used.
> LZO is under the GPL though.

But liblzf is BSD-style. http://www.goof.com/pcg/marc/liblzf.html
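The CPU-versus-bandwidth tradeoff behind the LAN argument is easy to demonstrate. liblzf itself isn't in any standard library, so this sketch uses zlib at its fastest setting as a stand-in for a lightweight scheme, and its heaviest setting for comparison; the payload is an invented stand-in for repetitive wire-protocol traffic:

```python
import time
import zlib

# Invented payload standing in for chatty, repetitive protocol traffic.
payload = b"INSERT INTO t (id, name) VALUES (42, 'example row');\n" * 10000

for level, label in [(1, "light, LZF-like cost"), (9, "heavy")]:
    t0 = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed_ms = (time.perf_counter() - t0) * 1000
    assert zlib.decompress(compressed) == payload  # lossless round-trip
    print(f"level {level} ({label}): {len(payload)} -> {len(compressed)} bytes "
          f"in {elapsed_ms:.2f} ms")
```

Even the fast setting shrinks repetitive traffic dramatically, which is why a lightweight compressor can pay off even on a fast LAN.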
Re: [GENERAL] PostgreSQL vs FreeBSD 7.0 as regular user
Greg Smith wrote:
> On Sat, 26 Jul 2008, Zoltan Boszormenyi wrote:
>> These three settings were also set with sysctl -w ... to take effect
>> immediately. Rebooting FreeBSD solved it. Huh? Is it really like W#@$#?
> Looks like the PostgreSQL documentation here
> (http://www.postgresql.org/docs/current/static/kernel-resources.html)
> is now outdated. From http://www.manpages.info/freebsd/sysctl.8.html:
> "The -w option has been deprecated and is silently ignored."

It's not ignored as in "the whole command will have no effect" -- only the switch itself is ignored. So, "sysctl -w a.b=c" is 100% equivalent to "sysctl a.b=c".

> Looks like the correct thing to do here now is to edit the
> /etc/sysctl.conf file, then issue "/etc/rc.d/sysctl reload" to use those
> values without needing to reboot. (I don't actually have such a machine
> to check for sure.)

Yes, that will work. sysctl.conf will not reload itself :)
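Put together, the reboot-free workflow looks something like this (kern.ipc.shmmax and its value are only an example of a PostgreSQL-relevant setting; use whatever your shared memory configuration requires):

```shell
# Takes effect immediately; -w is accepted but silently ignored:
sysctl kern.ipc.shmmax=1073741824

# Persist the value across reboots:
echo 'kern.ipc.shmmax=1073741824' >> /etc/sysctl.conf

# Re-apply everything in /etc/sysctl.conf without rebooting:
/etc/rc.d/sysctl reload
```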
[GENERAL] Simple row serialization?
Hi, I'd like to implement some simple data logging via triggers on a small number of infrequently updated tables, and I'm wondering if there are some helpful functions, plugins or idioms that would serialize a row (received for example in an AFTER INSERT trigger) into a string that I'd store in the log table. There's a limited number of field types involved: varchars, integers and booleans. I'm not looking for anything fancy; a comma-separated string result would be just fine. Even better, something like a dictionary (field_name:field_value, ...) would be nice. The reason for trying to do it this way is that I want to have a single log table covering many tables (again, they are infrequently updated). I need this for PostgreSQL 8.1. I got suggestions to try composite types but I don't think they could be useful for this. What I need is possibly a generic row type ("any" and "record" generate syntax errors in CREATE TABLE) - any ideas on where to start looking?
Re: [GENERAL] Simple row serialization?
Adam Rich wrote:
>> I'd like to implement some simple data logging via triggers on a small
>> number of infrequently updated tables and I'm wondering if there are
>> some helpful functions, plugins or idioms that would serialize a row
> If you're familiar with perl, you can try PL/Perl.

Thanks, but another solution has been suggested to me, much simpler:

create or replace function data_log() returns trigger as $$
declare
    sdata text;
begin
    sdata := new;
    insert into log(data) values (sdata);
    return NULL;
end;
$$ language plpgsql;

create trigger data_insert after insert on data
    for each row execute procedure data_log();

(from an idea by Tom Lane)
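For completeness, a sketch of the supporting schema the trigger above assumes; everything beyond the "log" table and its "data" column is illustrative, not from the thread:

```sql
-- Assumed log table; the trigger coerces the whole NEW row to text.
CREATE TABLE log (
    logged_at  timestamp NOT NULL DEFAULT now(),
    data       text NOT NULL
);

-- Example monitored table:
CREATE TABLE data (
    id    integer,
    name  varchar,
    flag  boolean
);

-- After: INSERT INTO data VALUES (1, 'hello', true);
-- log.data holds the row's text form, e.g. (1,hello,t)
```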
[GENERAL] Recovering / undoing transactions?
Hi, About a month or so ago I read a blog entry or an article which seems to have described a method, using dirty hackery with pg_resetxlog and possibly other tools, to forcibly roll the database back to a previous state. The problem described was that some employee had executed a DELETE or UPDATE without a WHERE clause (or something like it) in autocommit mode, and the goal was to undo it. I can't find the article now, so can someone describe the technique here or point to the article? (I'm possibly misremembering important details about the article, so the correct answer to my question could be no, it can't be done.)

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [GENERAL] Recovering / undoing transactions?
On 07/11/2007, Tom Lane [EMAIL PROTECTED] wrote:
> It's not really possible to do that. The blogger might've thought he'd
> accomplished something but I seriously doubt that his database was
> consistent afterward. You can go back in time using PITR, if you had the
> foresight and resources to set up continuous archiving, but just
> whacking pg_xlog around is far from sufficient.

Ok, just to verify I'm thinking about it in the right way: in the abstract, with PITR, I would need a known-good starting point (e.g. a full base backup) plus the WAL files archived since that starting point; then I'd restore the base backup and replay the archived WAL up to the point I want?
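For reference, the continuous archiving Tom refers to boils down to a couple of configuration lines; the paths and the target time below are illustrative, and exact option names vary a little between PostgreSQL versions:

```
# postgresql.conf -- ship each completed WAL segment to an archive:
archive_command = 'cp %p /mnt/wal_archive/%f'

# recovery.conf -- placed in the data directory restored from the base backup:
restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2007-11-07 12:00:00'   # stop just before the bad statement
```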