Re: [GENERAL] dblink password required

2014-10-25 Thread Oliver Kohll - Mailing Lists
 
 select * from dblink(‘dbname=database2 username=db_link 
 password=mypassword','select username, email from appuser') as t1(username 
 text, email text);
 
 I think the problem is in the quoted query above:
 
 username=db_link should be user=db_link
 
 The accepted keywords are here:
 http://www.postgresql.org/docs/9.3/static/libpq-connect.html#LIBPQ-CONNSTRING

Yes that’s it, thanks.
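For the archive, the corrected call ends up looking something like this (a
sketch reusing the connection details quoted above; it assumes the dblink
extension is installed in database1):

select * from dblink('dbname=database2 user=db_link password=mypassword',
                     'select username, email from appuser')
       as t1(username text, email text);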



[GENERAL] dblink password required

2014-10-24 Thread Oliver Kohll - Mailing Lists
Hi,

I’m trying to use dblink to connect from database1 to database2 on the same 
machine. The Postgres version is 9.3.5.

Firstly, I’ve created a user to use with the connection, while logged in as the 
postgres superuser to database2:

 create user db_link with password ‘mypassword’;
 grant select on appuser to db_link;

Secondly, the following has been added to pg_hba.conf and a restart done.

# dblink between database1 and database2
local database2   db_link   md5
host database2db_link   127.0.0.1/32md5
host database2db_link   ::1/128 md5

Connecting with PSQL prompts for a password as expected:

 psql -U db_link -d database2
Password for user db_link:
psql (9.3.5)
Type "help" for help.

Then, as a normal user logged into database1, I tried

 select * from dblink(‘dbname=database2 username=db_link 
 password=mypassword','select username, email from appuser') as t1(username 
 text, email text);

However this returns

ERROR:  password is required
DETAIL:  Non-superusers must provide a password in the connection string.

This is my first time trying dblink so there’s probably something simple I’m 
missing, but I thought I had provided a password. Any ideas?

Regards
Oliver Kohll



[GENERAL] Crash in 9.4 Beta when partially collapsing left outer joins

2014-09-08 Thread lists
I think this is the first time I've ever reported a PG crash, which is notable 
since I've been using PG for over 10 years. ;) 

Using the 9.4 Beta RPMs on CentOS 6.X/64, we're experiencing a reproducible 
crash when running a query with a left outer join, partially collapsed. 

TRAP: FailedAssertion("!(!restriction_is_or_clause((RestrictInfo *) orarg))", 
File: "indxpath.c", Line: 1213)
 2014-09-08 14:21:33.179 PDT LOG:  server process (PID 19957) was terminated 
by signal 6: Aborted
 2014-09-08 14:21:33.179 PDT DETAIL:  Failed process was running: SELECT
students.id
FROM students
LEFT OUTER JOIN enrollments ON
(
enrollments.students_id = students.id
)
WHERE
(
students.id = 5008
OR
(
(
students.birthcity = 'Chico'
OR students.birthcity IS NULL
)
AND enrollments.start  20141219
)
);
 2014-09-08 14:21:33.179 PDT LOG:  terminating any other active server 
processes
 2014-09-08 14:21:33.179 PDT WARNING:  terminating connection because of 
crash of another server process
 2014-09-08 14:21:33.179 PDT DETAIL:  The postmaster has commanded this 
server process to roll back the current transaction and exit, because another 
server process exited abnormally and possibly corrupted shared memory.


Here's the smallest query I could find that it would crash on. Run on a blank 
database, the following will reliably crash postgres: 

CREATE TABLE students
(id SERIAL PRIMARY KEY, 
birthcity VARCHAR DEFAULT NULL); 

CREATE TABLE enrollments 
(students_id INTEGER NOT NULL REFERENCES students(id), 
start INTEGER);

SELECT 
students.id 
FROM students 
LEFT OUTER JOIN enrollments ON 
(
enrollments.students_id = students.id
) 
WHERE 
(
students.id = 5008 
OR 
(
(
students.birthcity = 'Chico' 
OR students.birthcity IS NULL
) 
AND enrollments.start  20141219
)
);

---
Other environment stuff: 

[root@db1 pgsql]# rpm -qa | grep postg 
postgresql94-libs-9.4beta2-1PGDG.rhel6.x86_64
postgresql94-server-9.4beta2-1PGDG.rhel6.x86_64
postgresql94-devel-9.4beta2-1PGDG.rhel6.x86_64
postgresql92-libs-9.2.9-1PGDG.rhel6.x86_64
postgresql94-9.4beta2-1PGDG.rhel6.x86_64
postgresql94-contrib-9.4beta2-1PGDG.rhel6.x86_64

[root@db1 pgsql]# uname -a 
Linux db1.schoolpathways.com 2.6.32-431.23.3.el6.x86_64 #1 SMP Thu Jul 31 
17:20:51 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux


What other information should I provide? We have the machine available if 
necessary. 

Benjamin Smith 




[GENERAL] Detect streaming replication failure

2014-07-17 Thread Lists

For reference:
https://wiki.postgresql.org/wiki/Streaming_Replication

Assume a master -> slave streaming replication configuration, PostgreSQL 9.2.
Assume that the master has been chugging away, but the slave PG service has
been offline for a while and the WAL archive has advanced far enough that the
slave can no longer catch up.


When I start the slave PG instance, PG launches and runs but doesn't update.
It also doesn't seem to throw any errors. The only outward sign I can see that
anything is wrong is that pg_last_xlog_replay_location() doesn't update. I can
look in /var/lib/pgsql/9.2/data/pg_log/postgresql-Thu.csv and see errors
there, e.g.:


2014-07-17 22:38:23.851 UTC,,,21310,,53c8505f.533e,2,,2014-07-17 
22:38:23 UTC,,0,FATAL,XX000,could not receive data from WAL stream: 
FATAL:  requested WAL segment 000700050071 has already been 
removed


Is that the only way to detect this condition? I guess I'm looking for 
something like


select * from pg_is_replicating_ok();
1

on the slave. At the moment, it appears that I can either parse the log
file, or check whether pg_last_xact_replay_timestamp() is more than some
acceptable threshold of minutes in the past.
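The closest I can get with the built-in functions is something along these
lines, run on the slave (a sketch; the five-minute threshold is an arbitrary
choice, and the result is NULL if nothing has been replayed since startup):

select now() - pg_last_xact_replay_timestamp() > interval '5 minutes'
       as replay_stalled;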


http://www.postgresql.org/docs/9.2/static/functions-admin.html

Thanks,

Ben




[GENERAL] Natural key woe

2014-05-13 Thread Oliver Kohll - Mailing Lists
I'm sure no one else on this list has done anything like this, but here's a 
cautionary tale.

I wanted to synchronise data in two tables (issue lists) - i.e. whenever a 
record is added into one, add a similar record into the other. The two tables 
are similar in format but not exactly the same so only a subset of fields are 
copied. Both tables have synthetic primary keys; these can't be used to match
data as they are populated from independent auto-incrementing sequences whose
values don't correspond across the two tables. What I could perhaps have done
is get both tables to use the same sequence, but what I actually did is:

* join both tables based on a natural key
* use that to copy any missing items from table1 to table2
* truncate table1 and copy all of table2's rows to table1
* run this routine once an hour

The natural key was based on the creation timestamp (stored on insert) and the 
one of the text fields, called 'subject'.

The problem came when someone entered a record with no subject, but left it 
null. When this was copied over and present in both tables, the *next* time the 
join was done, a duplicate was created because the join didn't see them as 
matching (null != null).

So after one hour there were two records; after two hours, four; after three,
eight, and so on.

When I logged in after 25 hrs and noticed table access was a little slow, there 
were 2^25 = 33 million records.

That's a learning experience for me at least. It's lucky I did check it at the 
end of that day rather than leaving it overnight, otherwise I think our server 
would have ground to a halt.
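In hindsight, a NULL-safe join would have avoided the duplication. A sketch,
using hypothetical column names (created for the insert timestamp, subject
for the text field):

-- IS NOT DISTINCT FROM treats two NULLs as equal, so a row with a NULL
-- subject no longer looks like a missing row on every pass.
select t1.*
from table1 t1
left join table2 t2
    on  t2.created = t1.created
    and t2.subject is not distinct from t1.subject
where t2.created is null;  -- rows in table1 not yet copied to table2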

One other wrinkle to note. After clearing out these rows, running 'VACUUM 
table2', 'ANALYZE table2' and 'REINDEX table table2', some queries with simple 
sequence scans were taking a few seconds to run even though there are only a 
thousand rows in the table. I finally found that running CLUSTER on the table 
sorted that out, even though we're on an SSD so I would have thought seeking 
all over the place for a seq. scan wouldn't have made that much difference. It 
obviously does still make some.

Oliver Kohll
www.agilebase.co.uk








Re: [GENERAL] SSD Drives

2014-04-04 Thread Lists

On 04/02/2014 02:55 PM, Bret Stern wrote:

Care to share the SSD hardware you're using?

I've used none to date, and have some critical data I would like
to put on a development server to test with.

Regards,

Bret Stern


SSDs are ridiculously cheap when you consider the performance 
difference. We saw at *least* a 10x improvement in performance going 
with SATA SSDs vs. 10k SAS drives in a messy, read/write environment. 
(most of our tests were 20x or more) It's a no-brainer for us.


It might be tempting to use a consumer-grade SSD due to the significant 
cost savings, but the money saved is vapor. They may be OK for a dev 
environment, but you *will* pay in downtime in a production environment. 
Unlike regular hard drives where the difference between consumer and 
enterprise drives is performance and a few features, SSDs are different 
animals.


SSDs wear something like a salt-shaker. There's a fairly definite number 
of writes that they are good for, and when they are gone, the drive will 
fail. Like a salt shaker, when the salt is gone, you won't get salt any 
more no matter how you shake it.


So, spend the money and get the enterprise class SSDs. They have come 
down considerably in price over the last year or so. Although on paper 
the Intel Enterprise SSDs tend to trail the performance numbers of the 
leading consumer drives, they have wear characteristics that mean you 
can trust them as much as you can any other drive for years, and they 
still leave spinning rust far, far behind.


Our production servers are 4x 1U rackmounts with 32 cores, 128 GB of ECC 
RAM, and SW RAID1 400 GB SSDs in each. We back up all our databases 
hourly, with peak volume around 200-300 QPS/server, a write ratio of 
perhaps 40%, and iostat disk utilization of about 10-20% over 5 second 
intervals.


-Ben




[GENERAL] vacuum vs pg_repack for clearing bloat?

2014-01-15 Thread Lists
Our app makes extensive use of temp tables, and this causes a 
significant amount of bloat that can often only be cleared with a manual 
vacuum process. We're looking for a better way that doesn't involve 
locking; we found pg_repack and pg_reorg and were wondering if anybody 
here could weigh in on using one of these instead of vacuum?


pg_repack:
https://github.com/reorg/pg_repack

pg_reorg
http://reorg.projects.pgfoundry.org/

Thanks in advance,

Ben




Re: [GENERAL] vacuum vs pg_repack for clearing bloat?

2014-01-15 Thread Lists

On 01/15/2014 04:24 PM, Tom Lane wrote:

Lists li...@benjamindsmith.com writes:

Our app makes extensive use of temp tables, and this causes a
significant amount of bloat that can often only be cleared with a manual
vacuum process. We're looking for a better way that doesn't involve
locking, we found pg_repack and pg_reorg  and were wondering if anybody
here could weigh in on using this instead of using vacuum?

A temp table is only accessible to the owning process, so if you're hoping
for vacuuming of it to happen silently in background, you'll be sadly
disappointed.  The speed advantage of a temp table comes exactly from not
having to worry about concurrent access, so this isn't a tradeoff that can
easily be adjusted.

regards, tom lane


Tom,

The process(es) creating the temp tables are not persistent, so the 
issue isn't trying to clean up bloat from a long running process, it's 
clearing out the cruft that results from creating temp tables, loading a 
bunch of data, then dropping the table, either explicitly or when the 
connection is terminated. This causes PG disk usage to climb without 
causing any change in pg_dump output.
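A rough way to quantify that gap (a sketch; it compares the overall database
size with the space attributable to ordinary user tables, the remainder being
mostly system catalog bloat):

select pg_size_pretty(pg_database_size(current_database())) as database_size,
       pg_size_pretty(sum(pg_total_relation_size(c.oid))::bigint) as user_tables
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
where c.relkind = 'r'
  and n.nspname not in ('pg_catalog', 'information_schema');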


I was wondering if anybody else had used either of these projects 
(pg_repack or pg_reorg, though reorg seems to be unsupported) and if so, 
how successful they had been.


-Ben




Re: [GENERAL] Making substrings uppercase

2013-09-10 Thread Oliver Kohll - Mailing Lists
On 9 Sep 2013, at 21:03, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 select string_agg(case when words like '*%*' then upper(btrim(words, '*')) 
 else words end, ' ')
 from regexp_split_to_table('The *quick* *brown* fox jumped over the *lazy* 
 dog', ' ') as words;
 
  string_agg  
 --
 The QUICK BROWN fox jumped over the LAZY dog

That's quite elegant. In the end I exported and used Perl, as some of my 
'words' had spaces (they were ingredients like monosodium glutamate), but you 
could probably do a more complex regex in regexp_split_to_table to cope with 
that, or use pl/perl as previously suggested.
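For the record, a sketch of that second approach (untested on the real data;
it assumes the rows from regexp_matches come back in string order, which holds
for a simple scan like this, and it copes with starred phrases that contain
spaces):

select string_agg(m[1] || upper(coalesce(m[2], '')), '')
from regexp_matches(
         'The *quick* *brown* fox jumped over the *lazy* dog',
         '([^*]*)(?:\*([^*]*)\*)?', 'g') as m;
-- the coalesce matters: without it, any trailing unstarred text after the
-- last starred word would be dropped from the aggregate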

Thanks
Oliver
www.agilebase.co.uk

[GENERAL] Making substrings uppercase

2013-09-09 Thread Oliver Kohll - Mailing Lists
Hello,

Given a string with certain words surrounded by stars, e.g.

The *quick* *brown* fox jumped over the *lazy* dog

can you transform the words surrounded by stars with uppercase versions, i.e.

The QUICK BROWN fox jumped over the LAZY dog

Given text in a column sentence in table sentences, I can mark/extract the 
words as follows:

SELECT regexp_replace(sentence,'\*(.*?)\*','STARTUPPER\1ENDUPPER','g') FROM 
sentences;

but my first attempt at uppercase transforms doesn't work:

select regexp_replace(sentence,'\*(.*?)\*','' || upper('\1'),'g') from 
sentences;

I thought of using substring() to split the parts up after replacing the stars 
with start and end markers, but that would fail if there was more than one word 
starred.

Any other ideas?

Oliver



Re: [GENERAL] Making substrings uppercase

2013-09-09 Thread Oliver Kohll - Mailing Lists

On 9 Sep 2013, at 14:41, David Johnston pol...@yahoo.com wrote:

 Oliver Kohll - Mailing Lists wrote
 select regexp_replace(sentence,'\*(.*?)\*','' || upper('\1'),'g') from
 sentences;
 
 Yeah, you cannot embed a function-call result in the replace with section;
 it has to be a literal (with the group insertion meta-sequences allowed of
 course).
 
 I see two possible approaches.
 
 1) Use pl/perl (or some variant thereof) which has facilities to do just
 this.
 2) Use regexp_matches(,,'g') to explode the input string into its components
 parts.  You can explode it so every character of the original string is in
 the output with the different columns containing the raw and to modify
 parts of each match.  This would be done in a sub-query and then in the
 parent query you would string_agg(...) the matches back together while
 manipulating the columns needed i.e., string_agg(c1 || upper(c3))
 
 HTH
 
 David J.
 

I see, I'm going with Perl, thanks.

Oliver



Re: [GENERAL] replication stops working

2013-07-08 Thread Daniel Serodio (lists)

John DeSoi wrote:

I have a 9.2 hot standby setup with replication via rsync. For the second time, 
it has stopped working with no apparent error on the primary or standby. Last 
time this happened I fixed it by restarting the primary. Yesterday I started a 
new base backup around noon and it replicated without any problems for about 12 
hours. Then it just stopped and I don't see any errors in the Postgres log 
(primary or standby). I looked at other system logs and still don't see any 
problems.

I'm running Postgres 9.2.4 on CentOS 6.4. Thanks for any ideas or debug 
suggestions.

John DeSoi, Ph.D.

=

wal_level = hot_standby
wal_keep_segments = 48
max_wal_senders = 2

archive_mode = on
archive_command = 'rsync --whole-file --ignore-existing --delete-after -a %p 
bak-postgres:/pgbackup/%f'
archive_timeout = 300

If there are no errors in the log, how did you conclude that replication 
has stopped working? Since you're using a hot standby, you've also setup 
streaming replication in addition to the WAL archiving, correct?


Regards,
Daniel Serodio





[GENERAL] AWS, cascading replication and WAL archiving

2013-07-03 Thread Daniel Serodio (lists)
I currently have a working 9.2 master + hot standby setup, using 
asynchronous replication and WAL archiving (via a shared NFS mount), 
running in our colocated datacenter.


I need to migrate this to AWS, with as little downtime as possible. My 
plan is to create two cascading standbys, daisy-chained like this: 
master (M) -> primary standby (S1) -> secondary standby (S2) -> tertiary 
standby (S3), and at migration time, promote S2 to master and then drop 
both M and S1 (hope this explanation makes sense).


WAL-E[1] seems like a perfect solution for WAL archiving on AWS, so I've 
set the master's archive_command to archive both on NFS (so I don't 
break the current setup) and on Amazon S3 (using WAL-E) so S2 and S3 can 
restore from it.


Q1) Is this a good migration strategy?

Q2) Should I promote S2 before killing M, or should I kill M before 
promoting S2?


Q3) Should S2 and S3 read from the same WAL archive, which is initially 
written to by M and then by S2 when it gets promoted to master, or should I 
have two separate WAL archives to avoid conflicts?


Q4) How should I set S2 and S3's recovery.conf so S3 automatically 
follows S2 when it is promoted to master? recovery_target_timeline = 'latest'?



Thanks in advance,
Daniel Serodio



[1] https://github.com/wal-e/wal-e




Re: [GENERAL] Cannot connect to remote postgres database

2013-07-03 Thread Daniel Serodio (lists)

Stephen Carville wrote:

I have been asked to evaluate Oracle, mysql and postgresql as a possible
replacement for our existing Oracle and MsSQL databases. Oracle and
mysql I pretty much have covered.  Postgresql, OTOH, is somewhat less
cooperative.

I have the software (v 8.4.13) installed on 64 bit Centos 6. It is
listening on all available interfaces and netstat confirms this.  I
created an additional user for the postgres db:
If you want to evaluate PostgreSQL, you should evaluate v9.2. 8.4 is 
pretty ancient and lacks lots of cool features so your comparison won't 
be fair to PostgreSQL.

postgres=# \du
               List of roles
 Role name |  Attributes   | Member of
-----------+---------------+-----------
 postgres  | Superuser     | {}
           : Create role
           : Create DB
 stephen   | Superuser     | {}
           : Create role
           : Create DB

I assigned passwords using alter role etc..
Which exact ALTER ROLE did you use? Feel free to redact the actual 
password, of course.
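For comparison, the usual form is something like this (the password here is
just a placeholder):

ALTER ROLE stephen WITH PASSWORD 'redacted';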

The problem is that no authentication method except trust seems to work.

in pg_hba.conf:

local   all all   trust
hostall all 198.204.114.0/24  md5

I've tried both of the above users and get the same error each time:

psql: FATAL:  password authentication failed for user "username"

I tried changing md5 to password and pam without success. Only
trust works. As near as I can tell from reading the documentation, it is
set up correctly, but I have obviously done something wrong.
md5 is the standard. password is plain text (which you don't want) 
and pam will try to authenticate against OS users, which is probably 
not what you want.

Any hints on where to start looking?
Is there any NAT happening between the client and the server? Check the 
server's log for a LOG:  connection received: host=x.x.x.x message so 
you can check which IP is reaching the server.


Regards,
Daniel Serodio




Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-12 Thread Lists

On 11/10/2012 02:21 PM, Jeff Janes wrote:

On Fri, Nov 9, 2012 at 4:28 PM, Lists li...@benjamindsmith.com wrote:


...


3) For each of the tables from #2, run the commands
REINDEX TABLE $table;
VACUUM FULL ANALYZE $table;

The end result is a squeaky-clean database server with expected disk usage.

NOTES:

...


2) It was sheer chance that I discovered the need to reindex prior to vacuum
in order to get the disk space back.

As of 9.0, a vacuum full inherently does a reindex, so doing an
explicit one is neither necessary nor beneficial.

I don't know if your discovery is based on a non-full vacuum, or on an
older server.


I can only state that merely doing a "vacuum full", or a "vacuum full 
$table" for each table sequentially, did not free the space, whereas the 
sequential "reindex $table", each followed immediately by a "vacuum full 
$table", did. 
If you'd like I can easily recreate the scenario by simply not cleaning 
up one of the DB servers until it bloats up and make available (limit 
distribution) a binary copy of the database (EG: rsync the 
/var/lib/pgsql/ filesystem late at night) in order to help identify why 
it didn't work as expected.





5) I don't yet know if the full option for the vacuum is necessary to free
up all space. I will experiment with this and post results if useful.

The answer to this is mostly non-deterministic.  non-full vacuum can
only free space from the end of the table.

If all of your long-lived objects were created before pg_attribute got
bloated and so the bloat was due only to short-lived objects, then
non-full vacuum (if run often enough) should eventually be able to
return that space as the short-lived objects near the end start to go
away.  However, if even a single long-live object finds itself at the
end of the table, then only a vacuum full will ever be able to reclaim
that space.



Since the time period involved (weeks/months) would have included both a 
large number of created/destroyed temp tables and occasionally altered 
persistent objects, it would appear that the full option is a very good 
idea, at least periodically.


-Ben




[GENERAL] Enabling Autovacuum Postgres 9.1 (was Unexpectedly high disk space usage)

2012-11-12 Thread Lists
The good news is that we have now resolved our critical problem (disk 
space overuse) with a somewhat hackish, slow answer that is nonetheless 
good enough for now.


Now I'd like to work out how to get autovacuum to work smoothly within 
our cluster. I'm happy to try to clarify my notes and post them either 
here or on the PG wiki as I get them resolved.


There's a wealth of how-to-tune-PG instruction that's old and (based on 
this thread alone) often stale enough to be classified as 
disinformative. For example, as near as I can tell, the entirety of this 
page is just wrong and/or irrelevant for 9.x and up: 
http://wiki.postgresql.org/wiki/VACUUM_FULL. In contrast to the advice 
to avoid VACUUM FULL 
(http://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT) 
comes the insight that vacuum full is necessary to clean up stale data 
that is not at the end of the table. (See Jeff Janes' 11/10/2012 email.)



non-full vacuum can
only free space from the end of the table.
This would imply that a full analyze is a good idea, at least 
periodically (weekly/monthly/quarterly) in a database that combines the 
use of temp tables and periodic changes to persistent objects. Does 
autovacuum ever do a full analyze? What about autovacuum and the 
reindex question at the end of this email?


On 11/10/2012 02:49 PM, Adrian Klaver wrote:


Seems to have changed in 8.3:

http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html

Beginning in PostgreSQL 8.3, autovacuum has a multiprocess 
architecture: There is a daemon process, called the autovacuum 
launcher, which is in charge of starting autovacuum worker processes 
for all databases. The launcher will distribute the work across time, 
but attempt to start one worker on each database every 
autovacuum_naptime seconds. One worker will be launched for each 
database, with a maximum of autovacuum_max_workers processes running 
at the same time...




Sadly, this change means that I can no longer be certain of the utility of 
the otherwise excellent-sounding advice originally offered by Scott, quoted 
below. It appears that naptime is (as of 9.x) almost irrelevant since 
it's defined per database, and dropping this from 1 minute to 5 seconds 
would have very little overall impact.



These two can be killers.  Long running transactions can cause
autovacuum processes to stall out or be autocancelled.
Long running transactions - how long is long? In our system it's rare to 
have a transaction (even a prepared transaction) last much longer than a 
few minutes. Is that enough time to cause problems with autovacuum?



As well, since the default nap time is 1 minute, it will take at least
50 minutes to vacuum each db as nap time is how long autovac waits
between databases.

Reducing autovacuum nap time to 5 or 10 seconds would be a good move
here, also possibly making it more aggressive by increasing max worker
threads, decreasing cost delay (possibly to zero or close to it) and /
or increasing cost limit.  After making such a change then watching
iostat when vacuum is running to see how hard its hitting your IO
subsystem.  I'm guessing that with SSDs it isn't gonna be a big
problem.

As Greg Smith has pointed out in the past, usually the answer to an
autovacuum problem is making it more, not less aggressive.  Unless
you're flooding your IO this is almost always the right answer.  Keep
in mind that autovacuum by default is setup to be VERY unaggressive
because it may be running on a netbook for all it knows.

To tune autovacuum with 50 databases, start by dropping nap time to
something much lower, like 10s.  Then if you need to, drop cost delay
until you get to 0.  If you get to 0 and it's still not hitting your
IO too hard, but not keeping up, then increase cost limit.  If you get
to something in the 5000 to 1 range, and its still not keeping up
then start bumping the thread count


Should I increase the max_workers field from the default of 3 to 
(perhaps) 10? Noting that my solution to the disk space problem is 
effectively a max_worker of 1 since it's all done sequentially, I wonder 
if reducing max_workers would actually be better?


Also, what's the "thread count"? Is that max_workers?

Why would I want to reduce the cost delay to 0, and how does this relate 
to cost_limit? Careful reading of the docs: 
http://www.postgresql.org/docs/9.1/static/runtime-config-resource.html 
makes me believe that, given my substantial I/O subsystem, I'd want to 
drop cost_delay to near zero and set the cost_limit really high, which 
is a rough restatement of the last quoted paragraph above. (I think)
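For anyone following along, the settings under discussion and their current 
values can be listed with something like (a sketch):

select name, setting, unit
from pg_settings
where name in ('autovacuum',
               'autovacuum_naptime',
               'autovacuum_max_workers',
               'autovacuum_vacuum_cost_delay',
               'autovacuum_vacuum_cost_limit');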


Assuming that I make these suggestions and notice a subsequent system 
load problem, what information should I be gathering in order to provide 
better post-incident forensics? We have statistics turned on, and aren't 
using replication. (yet)
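One check that seems worth capturing whenever load spikes (a sketch using the 
9.1 column names; the five-minute cutoff is arbitrary) is whether autovacuum 
workers are running alongside long transactions:

select procpid, waiting, now() - xact_start as xact_age, current_query
from pg_stat_activity
where current_query like 'autovacuum:%'
   or now() - xact_start > interval '5 minutes'
order by xact_start;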


Lastly, there's the question of reindexing before full vacuum. I've 
observed that not doing a manual reindex prior to 

Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-12 Thread Lists

On 11/12/2012 01:31 PM, Jeff Janes wrote:

On Mon, Nov 12, 2012 at 10:38 AM, Lists li...@benjamindsmith.com wrote:

On 11/10/2012 02:21 PM, Jeff Janes wrote:

On Fri, Nov 9, 2012 at 4:28 PM, Lists li...@benjamindsmith.com wrote:


2) It was sheer chance that I discovered the need to reindex prior to
vacuum
in order to get the disk space back.

As of 9.0, a vacuum full inherently does a reindex, so doing an
explicit one is neither necessary nor beneficial.

I don't know if your discovery is based on a non-full vacuum, or on an
older server.


I can only state that merely doing a "vacuum full", or a "vacuum full $table"
for each table sequentially, did not free the space, whereas the sequential
"reindex $table", each followed immediately by a "vacuum full $table", did.

With what version?

[root@alpha ~]# rpm -qi postgresql91-server
Name: postgresql91-server  Relocations: (not relocatable)
Version : 9.1.5 Vendor: (none)
Release : 3PGDG.rhel6   Build Date: Sun 02 Sep 2012 
12:13:18 PM UTC
Install Date: Wed 12 Sep 2012 03:04:41 AM UTC  Build Host: 
koji-sl6-x86-64-pg91
Group   : Applications/DatabasesSource RPM: 
postgresql91-9.1.5-3PGDG.rhel6.src.rpm

Size: 15191132 License: PostgreSQL
Signature   : DSA/SHA1, Sun 02 Sep 2012 12:13:24 PM UTC, Key ID 
1f16d2e1442df0f8

URL : http://www.postgresql.org/





If you'd like I can
easily recreate the scenario by simply not cleaning up one of the DB
servers until it bloats up and make available (limit distribution) a binary
copy of the database (EG: rsync the /var/lib/pgsql/ filesystem late at
night) in order to help identify why it didn't work as expected.

Do you think can make an easily script-able way to re-create the
resistant bloat?  That would be better than trying to disseminate
binary files, I think.
It would only be better if it actually created the situation that caused 
the space to not be freed. But, until you know the actual cause of a 
problem, I've found that it's often not productive to create simulations 
that may or may not be actually related to the problem.




What I did was just create and drop temp tables in a tight loop, with
autovacuum off, and then once pg_attribute got good and bloated, did a
vacuum full as the database owner or superuser.
Based on my understanding, if your loop included an intermittent schema 
change from within a transaction it might better approximate my actual 
scenario. Merely creating temp tables and then dropping them would 
create lots of activity at the end of the table which would free 
correctly. This still does not explain why reindex $table works when 
reindex is supposedly implicit in the vacuum.
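Something along these lines might be a closer reproduction (only a sketch; 
the table name and iteration count are made up):

do $$
begin
    for i in 1 .. 10000 loop
        execute 'create temp table bloat_tmp (id int, payload text)';
        -- an occasional schema change inside the same transaction,
        -- as suggested above
        if i % 100 = 0 then
            execute 'alter table bloat_tmp add column extra int';
        end if;
        execute 'drop table bloat_tmp';
    end loop;
end;
$$;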



If all of your long-lived objects were created before pg_attribute got
bloated and so the bloat was due only to short-lived objects, then
non-full vacuum (if run often enough) should eventually be able to
return that space as the short-lived objects near the end start to go
away.  However, if even a single long-live object finds itself at the
end of the table, then only a vacuum full will ever be able to reclaim
that space.


Since the time period involved (weeks/months) would have included both a
large number of created/destroyed temp tables and occasionally altered
persistent objects, it would appear that the full option is a very good idea,
at least periodically.

If you can prevent the extreme bloat from occurring in the first
place, then the end of the table would not be so far away from its
desired size that it needs to get reset by a vacuum full.

If you find your self in need of a vacuum full, then you should do
one.  But you should ask yourself what went wrong that you got into
that situation in the first place.

I agree; this is why my questions on enabling autovacuum in a related 
thread.





Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-11 Thread Lists

On 11/09/2012 05:26 PM, Steve Crawford wrote:
Bloat in pg_attribute would correlate with A) (or any constant 
creation/destruction of tables). You can vacuum and/or reindex the 
system tables if you are connected as the superuser but you are better 
off preventing bloat by appropriate adjustment of your configuration 
settings. However note that if you do frequent bulk 
creation/destruction of tables you could end up bloating the attribute 
table between vacuum runs and may need to periodically manually shrink 
it.





Steve,

Our system divides customers into distinct databases; however, customers 
are often clustered (think: different locations of a 7-11), and so we have 
to aggregate data from different databases. We do this with dblink to 
get the data and temp tables to collate it, which appears to be a cause 
of the bloat we're seeing.


-Ben




Re: [GENERAL] Unexpectedly high disk space usage RESOLVED (Manual reindex/vacuum)

2012-11-09 Thread Lists
As I've spent a considerable amount of time trying to sort this out, I'm 
posting it for the benefit other users.


I've experienced persistent, ongoing issues with autovacuum in a mixed 
read/write environment with midrange hardware (16 core Xeon, 128 GB RAM, 
200 GB SATA3 6 Gb SSDs for disk I/O on RHEL6) using the defaults in the 
9.1 RPMs provided by Postgres. (yum.postgresql.org) The cause of this is 
not yet determined. It may be related to the any or all of the 
combination of:


A) extensive use of temp tables;
B) extensive use of multiple dblink() calls in a single query;
C) use of transactions, especially prepared transactions and 
multiple savepoints;

D) concurrent use of pg_dump;
E) use of numerous databases on a single server, average about 50;

To offset this, we turned off autovacuum, and used an old script to 
vacuum the tables in the middle of the night when nobody was looking. 
Unfortunately, the vacuum script only vacuumed the userland tables and 
tremendous amounts of disk space were being wasted, particularly in the 
pg_attribute tables.


However, use of any of the statements "vacuum analyze", "vacuum full 
analyze", or "vacuum full verbose analyze" without mentioning specific 
tables did not resolve the extra disk space issue; disk usage still 
remained at least 5x the expected amount in all cases. (In one case, use 
of all of these open-ended vacuum queries did almost nothing.)


Nor did running any variation of "vacuum analyze $table" in a loop through 
all tables (including the pg_* tables) completely resolve the issue, 
either.


In order to completely clean things up, we ended up writing a script to do 
the following:


1) Determine the databases using excessive disk space, in descending 
order of use with this query:


SELECT
d.datname as Name,
d.datistemplate::int AS datistemplate,
pg_catalog.pg_get_userbyid(d.datdba) as Owner,
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_database_size(d.datname)
ELSE -1
END as Size
FROM pg_catalog.pg_database d
order by
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_database_size(d.datname)
ELSE NULL
END desc nulls first;

2) For each database from #1, get a list of tables to be cleaned up with 
this query:


SELECT
nspname || '.' || relname AS table,
pg_total_relation_size(C.oid) AS size
FROM
pg_class C
LEFT JOIN pg_namespace N ON
(N.oid = C.relnamespace) WHERE (true or (nspname NOT IN 
('pg_catalog', 'information_schema')))

AND C.relkind = 'r'
AND nspname !~ '^pg_toast'
ORDER BY
pg_total_relation_size(C.oid) DESC;

3) For each of the tables from #2, run the commands
REINDEX TABLE $table;
VACUUM FULL ANALYZE $table;

The end result is a squeaky-clean database server with expected disk usage.

NOTES:

1) The above queries are derived from queries found to determine how 
much disk space was used, even though the additional information 
provided isn't actually used by the script.


2) It was sheer chance that I discovered the need to reindex prior to 
vacuum in order to get the disk space back.


3) I'd like to get autovacuum to work. I've read suggestions to tweak 
cost_delay and/or cost_limit. I haven't yet determined if the problem is 
I/O based or lock/deadlock based. I'm guessing the problem is the 
latter, though it's hard to tell because queries stack up quickly and 
load average is sky high when autovacuum fails for us.


4) The aforementioned process is S-L-O-W. Think at least hours and 
probably days depending on your databases, your server(s), and the load.


5) I don't yet know if the full option for the vacuum is necessary to 
free up all space. I will experiment with this and post results if useful.
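Incidentally, a quick way to watch whether pg_attribute is bloating again, 
run per database (a one-line sketch):

select pg_size_pretty(pg_total_relation_size('pg_catalog.pg_attribute'));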





Re: [GENERAL] Unexpectedly high disk space usage

2012-11-08 Thread Lists

On 11/07/2012 12:42 PM, Tom Lane wrote:

... because it
occasionally causes transactions and queries to hang when an update
causes a vacuum mid-day, effectively taking us offline randomly.

I suspect this claim is based on ancient and no longer very relevant
experience.
Even so, if I felt the need to keep autovacuum off, what would I need to 
run regularly in order to keep things neat and tidy under the hood? 
Would a simple vacuum within each database suffice? Should I be logged 
in as the database owner or as an administrative user?
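In other words, would a nightly run of something like this in each database, 
connected as a superuser so the system catalogs are covered, be enough? 
(a sketch)

vacuum analyze;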


Thanks,

Ben




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-07 Thread Lists

On 11/07/2012 09:01 AM, Jeff Janes wrote:

Ben, did you ever figure out where the space was going?


I think we've found where the space is going, but I don't yet know 
how to resolve it. I modified your query as follows to get a total 
of the space used, and got an answer that matches closely:


with stuff as (SELECT d.datname as Name, 
pg_catalog.pg_get_userbyid(d.datdba) as Owner,

CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_database_size(d.datname)
ELSE -1
END as Size
FROM pg_catalog.pg_database d
order by
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_database_size(d.datname)
ELSE NULL
END desc nulls first) SELECT sum(size) AS overall from stuff;
Result: 171,276,369,124

# du -sbc /var/lib/pgsql/9.1/data/*
Result: 172,087,129,512

Now, the question is: I see several databases whose disk usage is 
dramatically different from what I get after a dump/restore to 
another machine:


Production:
 santarosa444| postgres | 44 GB

Dump/Restore:
 santarosa444| postgres | 685 MB

Now, here's where it gets weird. From the disk space usage wiki page 
(http://wiki.postgresql.org/wiki/Disk_Usage) you'll recognize the query I 
used to get a total disk space used result:


with mytable AS (
SELECT
nspname || '.' || relname AS relation,
pg_total_relation_size(C.oid) AS size
FROM
pg_class C
LEFT JOIN pg_namespace N ON
(N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY
pg_total_relation_size(C.oid) DESC
)
SELECT sum(size) AS size FROM mytable

... but the total result is 747,569,152 which is close to the 
dump/restore value, not the production server value, even though I'm 
running this query on the production server. So there's *something* that 
the latter query isn't identifying that the former is.


On a hunch, ran this query:

with mytable AS (
SELECT
nspname || '.' || relname AS relation,
pg_total_relation_size(C.oid) AS size
FROM
pg_class C
LEFT JOIN pg_namespace N ON
(N.oid = C.relnamespace) WHERE (true or 
(nspname NOT IN ('pg_catalog', 'information_schema')))

AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY
pg_total_relation_size(C.oid) DESC
)
SELECT sum(size) AS size FROM mytable;

And the result is 46,771,216,384! Removing the mytable wrapper stuff, 
here are the top results:


 pg_catalog.pg_attribute  | 36727480320
 pg_catalog.pg_attrdef| 3800072192
 pg_catalog.pg_depend | 2665930752
 pg_catalog.pg_class  | 1508925440
 pg_catalog.pg_type   | 1113038848
 public.att_claims| 451698688
 public.stgrades  | 127639552
 pg_catalog.pg_index  | 107806720


Google returns this page: 
http://www.postgresql.org/docs/9.1/static/catalog-pg-attribute.html 
which doesn't help me much. So, am I doing something wrong with admin? 
Our current process is that every night in the middle of the night, a 
script connects to each database on each server and runs a query to get 
all tables in each database and, for each, run


VACUUM ANALYZE $table

for each table in the database.

And then once a week:
psql -U postgres -c "\l" | grep -Po "(\w+444)" | xargs -t -i psql -U 
postgres {} -c "REINDEX DATABASE {};"


(note: there is a database for the postgres user on each DB server) 
The script is a remnant from PG 8.x days, so am I missing something 
fundamental about 9.x? I will note that autovacuum is off because it 
occasionally causes transactions and queries to hang when an update 
causes a vacuum mid-day, effectively taking us offline randomly. Our 
scenario is pretty much a worst-possible case of transactions, prepared 
transactions, temp tables, and concurrent read/write queries.





Re: [GENERAL] Unexpectedly high disk space usage

2012-11-07 Thread Lists

On 11/07/2012 12:42 PM, Tom Lane wrote:

So you've turned off autovacuum, and are carefully not vacuuming the
system catalogs.  That's your problem all right.  Is there a
particularly good reason why this script isn't a one-liner VACUUM?


Back in the 8.x days, we experienced vacuum full analyze occasionally 
causing other processes to hang/timeout. In an attempt to minimize the 
impact of the locking, we updated the script to vacuum one table at a 
time, which seemed to work well throughout the 8.x series. I'd happily 
accept that this conclusion may have simply have been wrong, but it 
worked well enough that nobody complained and life was good. After 
switching to 9.x, we read that the full vacuum was less useful and so 
the script was changed to vacuum analyze $table rather than vacuum 
full analyze $table.



Are you sure that once-a-day vacuuming is sufficient, even if it was
covering the system catalogs?  If you've managed to bloat pg_attribute
to 36GB, I suspect you've got enough churn (perhaps from temp tables)
that you really need the catalogs vacuumed more often.
The only thing that I could find in the docs even mentioning the idea of 
vacuuming catalogs is this sentence:


(A manual VACUUM should fix the problem, as suggested by the hint; but 
note that the VACUUM must be performed by a superuser, else it will fail 
to process system catalogs and thus not be able to advance the 
database's datfrozenxid.)

http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html

This does NOT clearly say that the end user could vacuum catalogs, let 
alone that it's necessary or even a good idea. Otherwise, the only 
mention is of tables, and there's no mention of the idea that tables are 
anything but user space.



My advice is dump, reload, and *don't* turn off autovacuum.


... because it
occasionally causes transactions and queries to hang when an update
causes a vacuum mid-day, effectively taking us offline randomly.

I suspect this claim is based on ancient and no longer very relevant
experience.

We tried several times to turn on autovacuum with 9.1 and had problems 
every time. If our use case is particularly special, I'd love to work 
with you to get autovacuum to work in our situation too as it would make 
life easier for us! But for the past few months, every time we've turned 
it on, we've had our phones swamped with customers who are unable to use 
our system while our application monitors scream bloody murder, at least 
weekly.


From what we could tell (under extreme pressure to get it all working 
again ASAP, mind you) it seemed that when doing a large update from 
within a transaction, autovacuum would get triggered before the 
transaction completed, causing the transaction to hang or at least slow 
way down, causing timeouts to occur with load balancers, so customers 
would then try again, compounding the ongoing problem. Pretty soon you 
have not only I/O issues, but also locking issues and upset customers. 
This issue may be compounded because we make fairly extensive use of 
dblink and temp tables to aggregate data for our customers who have 
multiple sites.


-Ben




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-07 Thread Lists

On 11/07/2012 12:58 PM, Scott Marlowe wrote:
My experience is that if autovac is causing problems with stalled 
queries etc you're either A: running ancient pg versions (pre 8.3), B: 
Running WAY too aggressive settings in autovac (100 threads, no nap 
time, cost limit of 10 etc.) or C: Your IO subsystem is absolute 
crap. On any modern server, default autovac settings from 8.3 and on 
should only have the possible problem of not being tuned aggressively 
enough. 


A) We are running PG 9.1.

B) We used the default settings in the RPMs provided by 
yum.postgresql.org. At the bottom of this message is information about 
the RPMs we currently are using.


C) I/O subsystem for PG is twin SATA III SSDs in a RAID 1 configuration, 
capable of tens of thousands of IO operations per second. Servers are 
recent, SATA III, 16-core Xeons with 128 GB ECC RAM in 1U rackmount cases.


As stated previously, we make extensive use of temp tables, 
transactions, and dblink, but had no trouble with catalog table bloat in 
8.x; this is a new phenomenon for us.


# rpm -qi postgresql91-9.1.5-3PGDG.rhel6.x86_64
Name: postgresql91 Relocations: (not relocatable)
Version : 9.1.5 Vendor: (none)
Release : 3PGDG.rhel6   Build Date: Sun 02 Sep 2012 
12:13:18 PM UTC
Install Date: Wed 12 Sep 2012 03:04:24 AM UTC  Build Host: 
koji-sl6-x86-64-pg91
Group   : Applications/DatabasesSource RPM: 
postgresql91-9.1.5-3PGDG.rhel6.src.rpm

Size: 5193673  License: PostgreSQL
Signature   : DSA/SHA1, Sun 02 Sep 2012 12:13:20 PM UTC, Key ID 
1f16d2e1442df0f8

URL : http://www.postgresql.org/
Summary : PostgreSQL client programs and libraries
Description :
PostgreSQL is an advanced Object-Relational database management system
(DBMS) that supports almost all SQL constructs (including
transactions, subselects and user-defined types and functions). The
postgresql package includes the client programs and libraries that
you'll need to access a PostgreSQL DBMS server.  These PostgreSQL
client programs are programs that directly manipulate the internal
structure of PostgreSQL databases on a PostgreSQL server. These client
programs can be located on the same machine with the PostgreSQL
server, or may be on a remote machine which accesses a PostgreSQL
server over a network connection. This package contains the command-line
utilities for managing PostgreSQL databases on a PostgreSQL server.

If you want to manipulate a PostgreSQL database on a local or remote 
PostgreSQL

server, you need this package. You also need to install this package
if you're installing the postgresql91-server package.




Re: [GENERAL] How to verify pg_dump files

2012-11-07 Thread Lists

On 11/07/2012 11:56 AM, Igor Neyman wrote:

The only 100% fool-proof test would be to restore from your backup files.

Regards,
Igor Neyman
Our internal process is to back up production databases regularly, and 
then use the backups offsite to populate  copies of databases for 
developer use. This allows us to test with real data, identifying real 
world bugs that would not appear with often-limited, manually created, 
sample data, as well as verify our backups on a regular, daily basis.


I'd strongly recommend something similar if it works for you.

-Ben




Re: [GENERAL] Unexpectedly high disk space usage

2012-11-06 Thread Lists

Jeff, thanks for the feedback!

On 11/05/2012 08:51 PM, Jeff Janes wrote:

My first guesses would be things that are not directly under the
databases control, such as:

1) your server logs are accumulating and you aren't doing anything about them


I'm guessing that this is not the case:

[root@delta data]# du -shc *  | grep -i log
47M pg_clog
15M pg_log
641Mpg_xlog


2) you are taking backup snapshots to somewhere in that directory and
not cleaning them up


Our backup snapshots (taken with pg_dump) are taken on a different 
server over the network. Dumps are made several times during each day. 
Could this be part of the problem if (somehow) they didn't complete? And 
if so, would there be some cleanup I'd have to do other than restarting PG?



3) your archive_command is failing (which you should see reports of in
the server logs) and so you are accumulating xlog files.


As I understand things, the result above under 1) demonstrates that 
this, also, is not the cause.






Re: [GENERAL] Unexpectedly high disk space usage

2012-11-06 Thread Lists
I followed your example; the result is at the bottom. Based on this, it 
would seem that 3-4 databases are the culprit. How could I get more 
depth/detail on what specifically the problem is?


-Ben

On 11/05/2012 07:10 PM, Scott Marlowe wrote:
What does du -sh have to say about it? Use unix tools to examine your
file system and see where the usage is going. For instance, I can do this:

cd /var/lib/postgresql/8.4/main/
du -s *|sort -n
0       server.crt
0       server.key
4       pg_tblspc
4       pg_twophase
4       PG_VERSION
4       postmaster.opts
4       postmaster.pid
12      pg_clog
12      pg_stat_tmp
12      pg_subtrans
28      pg_multixact
460     global
16392   pg_xlog
16396   base

which tells me that I'm using about 16MB for each pg_xlog and base. I can
then do cd into base and look around:

cd base
du -s *|sort -n
5416    1
5416    11563
5560    11564

Which shows me using about 5MB each for three different dbs. And so on.
On an off guess, did you go from a SQL_ASCII encoding to UTF8? That might
increase disk space usage a bit.


[root@delta ~]# cd /var/lib/pgsql/9.1/data/
[root@delta data]# du -s * | sort -n
4   pg_ident.conf
4   pg_serial
4   pg_tblspc
4   PG_VERSION
4   postmaster.opts
4   postmaster.pid
8   pg_hba.conf
12  pg_notify
12  pg_twophase
20  postgresql.300
20  postgresql.conf
20  postgresql.conf.20120903
20  postgresql.conf.300
76  pg_subtrans
104 pg_multixact
15044   pg_log
18184   global
25216   pg_stat_tmp
47916   pg_clog
671916 pg_xlog
164753204   base

[root@delta data]# cd base
[root@delta base]# du -s * | sort -n
4   pgsql_tmp
6124    12772
6388    12780
6424    1
72424   331506
72700   160676
72896   391655
73200   52389
73216   523672
74104   619675
74956   295646
76768   307580
77896   547597
80824   571547
87368   475799
90940   631604
113876  124651
123548  148525
130096  367533
149792  439726
173648  355578
175404  679545
190732  559580
225780  511706
326468  667547
352736  655477
398736  535644
469408  136582
483716  499753
513124  270926
575612  715601
590408  487780
04  463779
713208  643540
714896  583515
803216  343438
806952  427663
855156  739506
872200  197221
975692  64371
987692  775594
1005268 595488
1024812 691482
1042212 727552
1047464 379566
1260044 76601
1276756 16384
1345072 403667
1474468 209158
1477808 172604
1536168 221124
1637652 258798
1811504 88598
1963740 245588
2076748 703467
2193536 415671
2430908 801322
2552640 319552
2785212 28315
3454880 112612
3755548 451666
3929420 100666
4651876 40451
5714940 751514
6257740 233293
7313900 184735
9334796 763606
10940780    283609
20837264    788338
45285640    607471




[GENERAL] Error registering at postgresql.org

2012-11-05 Thread Daniel Serodio (lists)
I'm trying to register at postgresql.org so I can edit the wiki to fix a 
broken link. I received a link by e-mail so I could set my password, 
but when I submit the "Change password" form I get an error:


Forbidden (403)
CSRF verification failed. Request aborted.
More information is available with DEBUG=True.

I've tried appending ?DEBUG=True to the URL but got no further 
information.


Can someone help?

Thanks,
Daniel Serodio




[GENERAL] Unexpectedly high disk space usage

2012-11-05 Thread Lists
We upgraded to 9.1 from 8.4 over the summer a few months ago, to new DB 
servers with more disk space and memory. Unexpectedly, the DB servers 
have steadily increased their disk space usage since. Reported system 
load doesn't seem to be affected. It's happening to all our DB servers 
running 9.1.


When we reload all pg_dumps from our worst-affected server into an 
offline server, the disk space usage is about 26 GB, but the production 
database is using 166 GB. (# df /var/lib/pgsql;)


To resolve this, we've tried:

1) reindexed everything (cut about 10% of disk usage temporarily)

2) tried vacuum full, and vacuum analyze on all databases. (to minimal 
effect)


3) Restarting PG (no discernable effect) including a full stop/start.

4) We've looked for stale prepared transactions (none found)

5) instructions from the wiki to try to determine what the cause of all 
the disk usage is: http://wiki.postgresql.org/wiki/Disk_Usage but when 
we add up all the results for all the different databases, tables, 
indexes, etc. in a script, we get a number very close to the usage of 
the freshly loaded server. (24 GB)
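For reference, items 4 and 5 boil down to checks along these lines (sketches):

-- item 4: any stale prepared transactions still pinning old rows?
select gid, prepared, owner, database from pg_prepared_xacts;

-- item 5 (roughly): per-database footprint, largest first, before
-- drilling down into individual tables
select datname, pg_size_pretty(pg_database_size(datname)) as size
from pg_database
order by pg_database_size(datname) desc;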


What is Postgres doing with ~80% of its disk space? This is not 
normal, is it? I would hate to have to take the servers offline just to 
dump/restore in order to bring disk usage back to normal...



SYSTEM SPECS:
I've attached the postgresql.conf from a RHEL6/64 Linux server with 128 GB 
of RAM and 16 real CPU cores. (HT turned on, so 32 CPUs according to the OS.)


#FROM: sysctl.conf:
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 136365211648
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296


-Ben
# -
# PostgreSQL configuration file
# -
#
# This file consists of lines of the form:
#
#   name = value
#
# (The = is optional.)  Whitespace may be used.  Comments are introduced with
# # anywhere on a line.  The complete list of parameter names and allowed
# values can be found in the PostgreSQL documentation.
#
# The commented-out settings shown in this file represent the default values.
# Re-commenting a setting is NOT sufficient to revert it to the default value;
# you need to reload the server.
#
# This file is read on server startup and when the server receives a SIGHUP
# signal.  If you edit the file on a running system, you have to SIGHUP the
# server for the changes to take effect, or use pg_ctl reload.  Some
# parameters, which are marked below, require a server shutdown and restart to
# take effect.
#
# Any parameter can also be given as a command-line option to the server, e.g.,
# postgres -c log_connections=on.  Some parameters can be changed at run time
# with the SET SQL command.
#
# Memory units:  kB = kilobytesTime units:  ms  = milliseconds
#MB = megabytes s   = seconds
#GB = gigabytes min = minutes
#   h   = hours
#   d   = days


#--
# FILE LOCATIONS
#--

# The default values of these variables are driven from the -D command-line
# option or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'   # use data in another directory
# (change requires restart)
#hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file
# (change requires restart)
#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file
# (change requires restart)

# If external_pid_file is not explicitly set, no extra PID file is written.
#external_pid_file = '(none)'   # write an extra PID file
# (change requires restart)


#--
# CONNECTIONS AND AUTHENTICATION
#--

# - Connection Settings -

listen_addresses = '*'  # what IP address(es) to listen on;
# comma-separated list of addresses;
# defaults to 'localhost', '*' = all
# (change requires restart)
port = 5432 # (change requires restart)
# max_connections = 150 # (change requires restart)
# Note:  Increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).
#superuser_reserved_connections = 3 # (change requires restart)
#unix_socket_directory = '' # (change requires restart)

[GENERAL] Index creation problem

2012-10-19 Thread Oliver Kohll - Mailing Lists
Hi,

I'm getting a problem where my application hangs in the process of adding a 
field to a table. The app adds the field, then creates an index on the field. 
It hangs for ages (minutes) until I cancel the query. My investigation so far 
has been

Look at current queries:

agilebasedata=# SELECT datname,procpid,current_query FROM pg_stat_activity;
     datname     | procpid |                        current_query
-----------------+---------+---------------------------------------------------------
 agilebaseschema |    5799 | <IDLE> in transaction
 agilebasedata   |   18126 | SELECT datname,procpid,current_query FROM pg_stat_activity;
 agilebasedata   |    5844 | <IDLE>
 agilebasedata   |    5108 | CREATE INDEX l_ntvs1fk9de719830100m5aoi8suwo ON ntvs1fk9desoci59z(lower(m5aoi8suwo4jocu76) varchar_pattern_ops)
 agilebasedata   |    5109 | <IDLE> in transaction
 agilebaseschema |   25200 | <IDLE>
 agilebasedata   |   29257 | <IDLE>
 agilebasedata   |   31574 | <IDLE>
(8 rows)

As you can see, the only user query running is the CREATE INDEX. Cancelling 
this with

select pg_cancel_backend(5108);

gets the app back on its feet.

I thought there may be some sort of deadlock, but there is only one long 
running query. By the way, the indexing query should return quickly, as there 
are 0 rows in the table. It does in fact return immediately when you run it 
manually:

agilebasedata=# CREATE INDEX l_nx4uaurg3r1981190097whsqcun3e9 ON 
nx4uaurg3r94vwqhj(lower(whsqcun3e9lfzlwlv) varchar_pattern_ops);
ERROR:  relation l_nx4uaurg3r1981190097whsqcun3e9 already exists
agilebasedata=# drop index l_nx4uaurg3r1981190097whsqcun3e9;
DROP INDEX
agilebasedata=# CREATE INDEX l_nx4uaurg3r1981190097whsqcun3e9 ON 
nx4uaurg3r94vwqhj(lower(whsqcun3e9lfzlwlv) varchar_pattern_ops);
CREATE INDEX


One thing to mention is that there are a lot of indexes and relations:

agilebasedata=# select count(*) from pg_index;
 count 
---
  2587
(1 row)

agilebasedata=# select count(*) from pg_class;
 count 
---
  5361
(1 row)

I wonder if I'm running up against some sort of limit. I am going to change the 
code so it doesn't add an index (it's not always necessary) but would like to 
get to the bottom of things first.

Regards
Oliver Kohll


www.gtwm.co.uk - company / www.agilebase.co.uk - product



Re: [GENERAL] Index creation problem

2012-10-19 Thread Oliver Kohll - Mailing Lists
On 19 Oct 2012, at 13:28, Frank Lanitz fr...@frank.uvena.de wrote:

 Just an idea without bigger investigation: Whare are the idle in
 transactions are doing? Maybe they are blocking the create index.
 
 Cheers,
 Frank

Good question, I don't know. The app runs on Java / Apache Tomcat, which 
maintains a connection pool, so I assumed the IDLEs were just opened 
connections but I didn't notice two were idle in transaction. I haven't dealt 
with this before but this looks like a good explanation:

http://www.depesz.com/2008/08/28/hunting-idle-in-transactions/

I will up logging and try to replicate. Next time I will look at pg_locks too.

Also, the PG version is 9.1.6.
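
For next time, a sketch of a lock query (using the 9.1 column names procpid/current_query) that lists everything holding or waiting for locks on the same relations as any waiting backend:

SELECT l.pid, l.relation::regclass AS relation, l.mode, l.granted, a.current_query
FROM pg_locks l
JOIN pg_stat_activity a ON a.procpid = l.pid
WHERE l.relation IN (SELECT relation FROM pg_locks WHERE NOT granted)
ORDER BY l.relation, l.granted DESC;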

Oliver 



Re: [GENERAL] Index creation problem

2012-10-19 Thread Oliver Kohll - Mailing Lists

On 19 Oct 2012, at 15:09, Albe Laurenz laurenz.a...@wien.gv.at wrote:

 You should always include the list in your replies -
 other people might be interested in the solution.

Oops, thought I had.

 
 There must be at least two transactions involved
 to create a locking problem like you describe.
 But since CREATE INDEX takes strong locks, it can
 easily get locked by other harmless things.

OK, I've reduced the calls to CREATE INDEX, waiting to test that. If there's 
still an issue I will use CONCURRENTLY as you suggested.
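
For the record, CREATE INDEX CONCURRENTLY takes the same form as the original statement and avoids the strong lock, at the cost of not being allowed inside a transaction block - a sketch reusing the identifiers from the earlier example:

CREATE INDEX CONCURRENTLY l_ntvs1fk9de719830100m5aoi8suwo
  ON ntvs1fk9desoci59z (lower(m5aoi8suwo4jocu76) varchar_pattern_ops);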

Many thanks
Oliver

Re: [GENERAL] database corruption questions

2012-10-17 Thread Daniel Serodio (lists)

Craig Ringer wrote:

On 10/14/2012 05:53 AM, Heine Ferreira wrote:

Hi

Are there any best practices for avoiding database
corruption?


* Maintain rolling backups with proper ageing. For example, keep one a 
day for the last 7 days, then one a week for the last 4 weeks, then 
one a month for the rest of the year, then one a year.

What kind of rolling backups? From pg_basebackup?


* Use warm standby with log shipping and/or replication to maintain a 
live copy of the DB.


* If you want point-in-time recovery, keep a few days or weeks worth 
of WAL archives and a basebackup around. That'll help you recover from 
those oops I meant DROP TABLE unimportant; not DROP TABLE 
vital_financial_records; issues.


* Keep up to date with the latest PostgreSQL patch releases. Don't be 
one of those people still running 9.0.0 when 9.0.10 is out.
The problem is that updating the database usually results in downtime. 
Or can the downtime be avoided in a replication scenario?


* Plug-pull test your system when you're testing it before going live. 
Put it under load with something like pgbench, then literally pull the 
plug out. If your database doesn't come back up fine you have 
hardware, OS or configuration problems.


* Don't `kill -9` the postmaster. It should be fine, but it's still 
not smart.


* ABSOLUTELY NEVER DELETE postmaster.pid

* Use good quality hardware with proper cooling and a good quality 
power supply. If possible, ECC RAM is a nice extra.


* Never, ever, ever use cheap SSDs. Use good quality hard drives or 
(after proper testing) high end SSDs. Read the SSD reviews 
periodically posted on this mailing list if considering using SSDs. 
Make sure the SSD has a supercapacitor or other reliable option for 
flushing its write cache on power loss. Always do repeated plug-pull 
testing when using SSDs.


* Use a solid, reliable file system. zfs-on-linux, btrfs, etc are not 
the right choices for a database you care about. Never, ever, ever use 
FAT32.


* If on Windows, do not run an anti-virus program on your
database server. Nobody should be using it for other things or running 
programs on it anyway.


* Avoid RAID 5, mostly because the performance is terrible, but also 
because I've seen corruption issues with rebuilds from parity on 
failing disks.


* Use a good quality hardware RAID controller with a battery backup 
cache unit if you're using spinning disks in RAID. This is as much for 
performance as reliability; a BBU will make an immense difference to 
database performance.


* If you're going to have a UPS (you shouldn't need one as your system 
should be crash-safe), don't waste your money on a cheap one. Get a 
good online double-conversion unit that does proper power filtering. 
Cheap UPSs are just a battery with a fast switch, they provide no 
power filtering and what little surge protection they offer is done 
with a component that wears out after absorbing a few surges, becoming 
totally ineffective. Since your system should be crash-safe a cheap 
UPS will do nothing for corruption protection, it'll only help with 
uptime.


--
Craig Ringer


Thanks,
Daniel Serodio


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] allow servers to access to the same data

2012-10-17 Thread Daniel Serodio (lists)

Tulio wrote:
You can use streaming replication in hot standby (native) to have the 
same data and access on both (but no updates or inserts on the slave, 
just selects)
and create a virtual IP using Heartbeat: configure the master to use 
some (virtual) IP, and when it loses that IP, the second server (slave) 
will take over the IP..

and will then allow updates and inserts...

Is that what you want?
I've come across a few mentions of Heartbeat being used for PostgreSQL 
failover, do have any links to more information about this?



On 17/10/2012 10:20, GMAIL wrote:
Is it possible to access the same data from two different servers? The 
two servers have the same IP and do not run simultaneously.


Thanks in advance,
Daniel Serodio


Re: [GENERAL] allow servers to access to the same data

2012-10-17 Thread Daniel Serodio (lists)

Shaun Thomas wrote:

On 10/17/2012 12:53 PM, Daniel Serodio (lists) wrote:


I've come across a few mentions of Heartbeat being used for PostgreSQL
failover, do have any links to more information about this?


This was the subject of my talk at PG Open this year. I've got the 
entire PDF of slides, liner notes, and instructions on the Postgres Wiki:


http://wiki.postgresql.org/wiki/Postgres_Open_2012

Full link to PDF:

http://wiki.postgresql.org/images/0/07/Ha_postgres.pdf

It's a very cut-down version of the approach we've used successfully 
for a while.



That's great, thanks for the links.

Regards,
Daniel Serodio


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Is pg_basebackup also for regular backups?

2012-10-15 Thread Daniel Serodio (lists)
I was reading the documentation for pg_basebackup and it states that 
resulting backups can be used both for point-in-time recovery and as 
the starting point for a log shipping or streaming replication standby 
servers.


Should pg_basebackup also be used for regular backups, or only for 
PITR/streaming replication?


Regards,
Daniel Serodio



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help estimating database and WAL size

2012-10-15 Thread Daniel Serodio (lists)

Jasen Betts wrote:

On 2012-10-08, Daniel Serodio (lists) daniel.li...@mandic.com.br wrote:

We are preparing a PostgreSQL database for production usage and we need
to estimate the storage size for this database. We're a team of
developers with low expertise on database administration, so we are
doing research, reading manuals and using our general IT knowledge to
achieve this.

We have actual data to migrate to this database and some rough
estimations of growth. For the sake of the example, let's say we have an
estimation of growth of 50% per year.

The point is: what's the general proper technique for doing a good size
estimation?

We are estimating the storage usage by the following rules. Topics where
we need advice are marked with ** asterisks **. Feedback on the whole
process is more than welcome.

1) Estimate the size of each table
  1.1) Discover the actual size of each row.
  - For fields with a fixed size (like bigint, char, etc) we used
the sizes described in the documentation
  - For fields with a dynamic size (like text) we estimated the
string length and used the function select pg_column_size('expected text
here'::text)


long text is subject to compression, pg_column_size doesn't seem to
test compression, compression is some sort of LZ..
Interesting, I didn't know about automatic compression. I've just read 
the section on TOAST and haven't been able to answer this either: Is 
there any way to check for the compressed size?
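
One way to check: pg_column_size() on a stored column reports the on-disk (possibly compressed or TOASTed) size, while octet_length() reports the raw length, so comparing the two shows the effect of compression. A sketch, assuming a hypothetical some_table.long_text_col:

SELECT octet_length(long_text_col)   AS raw_bytes,
       pg_column_size(long_text_col) AS stored_bytes
FROM some_table
ORDER BY octet_length(long_text_col) DESC
LIMIT 10;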

  - We added 4 more bytes for the OID that PostgreSQL uses internally


OID is optional, IIRC PGXID is not
I hadn't heard of PGXID, I've just searched Google but found no 
reference to this term except for this e-mail thread and some source 
code. What is PGXID? Where can I learn more about it?

  1.2) Multiply the size of each row by the number of estimated rows
  ** Do I need to consider any overhead here, like row or table
metadata? **


page size 8K
column overhead 1 byte per not-NULL column, NULLs are free,


2) Estimate the size of each table index
  ** Don't know how to estimate this, need advice here **


IIRC
( data being indexed + 8 bytes ) / fill factor


3) Estimate the size of the transaction log
  ** We've got no idea how to estimate this, need advice **


how big are your transactions?

Very short, a couple of statements each.

4) Estimate the size of the backups (full and incremental)
  ** Don't know how to estimate this, need advice here **


depends on the format you use, backups tend to compress well.


5) Sum all the estimates for the actual minimum size


no, you get estimated size.

Thanks a lot for the response.

Regards,
Daniel Serodio

6) Apply a factor of 1.5x (the 50% growth) to the sum of the estimates
1, 2 and 4 for the minimum size after 1 year

7) Apply an overall factor of 1.2 ~ 1.4 (20% to 40% more) to estimates 5
and 6 for a good safety margin

I know the rules got pretty extensive, please let me know if you need
more data or examples for a better understanding.

We've also posted this question to
http://dba.stackexchange.com/q/25617/10166


Re: [GENERAL] Help estimating database and WAL size

2012-10-15 Thread Daniel Serodio (lists)

John R Pierce wrote:

On 10/08/12 1:39 PM, Daniel Serodio (lists) wrote:

3) Estimate the size of the transaction log
** We've got no idea how to estimate this, need advice ** 


postgres doesn't have a 'transaction log', it has the WAL (Write-Ahead 
Logs).  These are typically 16MB each.  on databases with a really 
heavy write load, I might bump the checkpoint_segments as high as 60, 
which seems to result in about 120 of them being created, 2GB total.  
these files get reused, unless you are archiving them to implement a 
continuous realtime backup system (which enables PITR, Point in Time 
Recovery)
Thanks, I was using the term transaction log as a synonym for WAL. 
We're planning on enabling PITR; how can we calculate the WAL size and 
the WAL archive size in this case?
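
For pg_xlog itself, the documentation's rule of thumb for the 8.3-9.x era is roughly (2 + checkpoint_completion_target) * checkpoint_segments + 1 segments of 16 MB each; the WAL archive then grows by 16 MB per segment archived between base backups. A back-of-envelope sketch, assuming checkpoint_segments = 60 and checkpoint_completion_target = 0.5:

SELECT 16 * ((2 + 0.5) * 60 + 1) AS approx_max_pg_xlog_mb;  -- about 2.4 GB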


Regards,
Daniel Serodio


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Is pg_basebackup also for regular backups?

2012-10-15 Thread Daniel Serodio (lists)

Magnus Hagander wrote:

On Mon, Oct 15, 2012 at 10:15 PM, Daniel Serodio (lists)
daniel.li...@mandic.com.br  wrote:

I was reading the documentation for pg_basebackup and it states that
resulting backups can be used both for point-in-time recovery and as the
starting point for a log shipping or streaming replication standby servers.

Should pg_basebackup also be used for regular backups, or only for
PITR/streaming replication?


It's for regular backups, based off PITR.

I'm sorry, I don't understand what you mean by based off PITR. Do I 
need PITR enabled in order to use pg_basebackup, or do you mean that I 
won't lose the ability to use PITR after restoring from a backup created 
with pg_basebackup, or do you mean something else?


Regards,
Daniel Serodio


[GENERAL] Help estimating database and WAL size

2012-10-08 Thread Daniel Serodio (lists)
We are preparing a PostgreSQL database for production usage and we need 
to estimate the storage size for this database. We're a team of 
developers with low expertise on database administration, so we are 
doing research, reading manuals and using our general IT knowledge to 
achieve this.


We have actual data to migrate to this database and some rough 
estimations of growth. For the sake of the example, let's say we have an 
estimation of growth of 50% per year.


The point is: what's the general proper technique for doing a good size 
estimation?


We are estimating the storage usage by the following rules. Topics where 
we need advice are marked with ** asterisks **. Feedback on the whole 
process is more than welcome.


1) Estimate the size of each table
1.1) Discover the actual size of each row.
- For fields with a fixed size (like bigint, char, etc) we used 
the sizes described in the documentation
- For fields with a dynamic size (like text) we estimated the 
string length and used the function select pg_column_size('expected text 
here'::text)

- We added 4 more bytes for the OID that PostgreSQL uses internally
1.2) Multiply the size of each row by the number of estimated rows
** Do I need to consider any overhead here, like row or table 
metadata? **


2) Estimate the size of each table index
** Don't know how to estimate this, need advice here **

3) Estimate the size of the transaction log
** We've got no idea how to estimate this, need advice **

4) Estimate the size of the backups (full and incremental)
** Don't know how to estimate this, need advice here **

5) Sum all the estimates for the actual minimum size

6) Apply a factor of 1.5x (the 50% growth) to the sum of the estimates 
1, 2 and 4 for the minimum size after 1 year


7) Apply an overall factor of 1.2 ~ 1.4 (20% to 40% more) to estimates 5 
and 6 for a good safety margin


I know the rules got pretty extensive, please let me know if you need 
more data or examples for a better understanding.


We've also posted this question to 
http://dba.stackexchange.com/q/25617/10166
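
One empirical cross-check on steps 1 and 2 is to load a representative sample and measure it; a sketch with a hypothetical table shape:

CREATE TABLE size_sample AS
  SELECT g AS id, now() AS created, repeat('x', 200) AS payload
  FROM generate_series(1, 100000) AS g;
CREATE INDEX size_sample_created_idx ON size_sample (created);

SELECT pg_size_pretty(pg_relation_size('size_sample'))             AS heap,
       pg_size_pretty(pg_relation_size('size_sample_created_idx')) AS index,
       pg_relation_size('size_sample') / 100000                    AS heap_bytes_per_row;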


Thanks in advance,
Daniel Serodio


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] RFE: Column aliases in WHERE clauses

2012-09-17 Thread Daniel Serodio (lists)
It would be nice if PostgreSQL supported column aliases in WHERE 
clauses, eg:


SELECT left(value, 1) AS first_letter
FROM some_table
WHERE first_letter < 'a';

Is this the proper mailing list for such feature requests?

Thanks in advance,
Daniel Serodio


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] RFE: Column aliases in WHERE clauses

2012-09-17 Thread Daniel Serodio (lists)

Ryan Kelly wrote:

On Mon, Sep 17, 2012 at 06:18:53PM -0300, Daniel Serodio (lists) wrote:

It would be nice if PostgreSQL supported column aliases in WHERE
clauses, eg:

SELECT left(value, 1) AS first_letter
FROM some_table
WHERE first_letter < 'a';

Is this the proper mailing list for such feature requests?

I think this is explicitly disallowed by the spec.

And by Tom:
http://archives.postgresql.org/pgsql-general/2002-11/msg01411.php

Tom's explanation makes perfect sense, thanks for the pointer.
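
For reference, the usual workaround is to push the expression into a subquery (or simply repeat the expression in the WHERE clause):

SELECT first_letter
FROM (SELECT left(value, 1) AS first_letter FROM some_table) AS s
WHERE first_letter < 'a';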

Regards,
Daniel Serodio



Re: [GENERAL] Too far out of the mainstream

2012-09-05 Thread Oliver Kohll - Mailing Lists
Here's a bit of positive news spin - in a backhanded way perhaps, but still a 
compliment:

http://www.theregister.co.uk/2012/08/31/postgresql_too_cool_for_school/

Oliver
www.agilebase.co.uk



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Amazon High I/O instances

2012-08-21 Thread Oliver Kohll - Mailing Lists
On 21 Aug 2012, at 13:32, Vincent Veyron vv.li...@wanadoo.fr wrote:

 
 Since Amazon has added new high I/O instance types and EBS volumes,
 anyone has done some benchmark of PostgreSQL on them ?
 
 
 I wonder : is there a reason why you have to go through the complexity
 of such a setup, rather than simply use bare metal and get good
 performance with simplicity?
 
 For instance, the dedibox I use for my app (visible in sig) costs 14,00
 euros/month, and sits at .03% load average with 5 active users; you can
 admin it like a home pc.

This is a general 'cloud or dedicated' question, I won't go into it but I 
believe cloud proponents cite management ease, scalability etc. I'm sure 
there's a place for every type of hosting. However I would be interested in 
hearing some experiences of PostgreSQL on an Amazon high I/O instance, given a 
client has just proposed running on one. If there are none forthcoming in the 
short term I may be in a position to provide some results myself in a month or 
two.

Oliver Kohll
www.agilebase.co.uk



Re: [GENERAL] Slow information_schema.views

2012-03-22 Thread Oliver Kohll - Mailing Lists

On 22 Mar 2012, at 10:17, Albe Laurenz wrote:

 Or is there a better way of finding view dependencies? I see there's a
 pg_catalog entry for tables
 that a view depends on but that's not what I'm after.
 
 You can use pg_depend and pg_rewrite as follows:
 
 SELECT DISTINCT r.ev_class::regclass
 FROM pg_depend d JOIN
 pg_rewrite r ON (d.objid = r.oid)
 WHERE d.classid = 'pg_rewrite'::regclass
  AND d.refclassid = 'pg_class'::regclass
  AND r.ev_class <> d.refobjid
  AND d.refobjid::regclass::text LIKE '%myviewname%';
 
 I didn't test it very much, so play around with it a little before
 you trust it.
 
 I don't know if it will perform better in your case, but it should
 return more appropriate results
 (you don't want to find VIEW dummy AS SELECT * FROM t WHERE a =
 'myviewname').
 
 Yours,
 Laurenz Albe

Thank you - I did come to a similar method yesterday following some pointers 
from previous messages but I'm glad to have some confirmation it's the right 
direction. It does perform an order of magnitude faster for me, from 500ms down 
to under 20ms. My exact query is

SELECT distinct dependent.relname
FROM pg_depend
JOIN pg_rewrite ON pg_depend.objid = pg_rewrite.oid
JOIN pg_class as dependent ON pg_rewrite.ev_class = dependent.oid
JOIN pg_class as dependee ON pg_depend.refobjid = dependee.oid
WHERE dependee.relname = 'myviewname'
AND dependent.relname != 'myviewname'

Haven't tested this much yet either. I'll compare yours to mine and check the 
differences.

Regards
Oliver
www.agilebase.co.uk



[GENERAL] Slow information_schema.views

2012-03-19 Thread Oliver Kohll - Mailing Lists
Hello,

I'm doing some SELECTs from information_schema.views to find views with 
dependencies on other views, i.e.

SELECT table_name FROM information_schema.views WHERE view_definition ILIKE 
'%myviewname%';

and each is taking about 1/2 a second, which is getting a bit slow for my use. 
There are 1213 views listed in information_schema.views

Doing an explain analyze, it looks like the issue is likely to be the 
pg_get_viewdef function or one of the privilege check functions. I'm not 
worried about privilege checks and I don't need a nicely formatted definition. 
Is there a way of finding out how pg_get_viewdef works so I can perhaps do a 
lower level query?

I've previously used pg_catalog.pg_views which performs similarly.

Or is there a better way of finding view dependencies? I see there's a 
pg_catalog entry for tables that a view depends on but that's not what I'm 
after.

Regards
Oliver Kohll
www.agilebase.co.uk
-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Multi master use case?

2012-01-28 Thread Oliver Kohll - Mailing Lists
On 28 Jan 2012, at 15:27, Greg Sabino Mullane g...@turnstep.com wrote:

 Is this a case for multi master do you think? 
 I.e. running one on the internet, one locally.
 
 Yes, could be.
 
 b) changing schemas (new tables, fields, views etc.) as well as data
 
 That's a tall order; I don't think anything will do that automatically, 
 although rubyrep claims to at least pick up new tables.

OK, I guess I could treat one as 'schema master' and pg_dump schema + data 
across to the other once a night, once all activity has stopped and standard 
replication completed.

 
 Any experiences/thoughts?
 
 My experience is with Bucardo, which should do the job admirably 
 (but with the data only). My advice would be to just set up a test 
 system and try rubyrep and Bucardo out. For the latter, use the 
 latest Bucardo5 beta, as Bucardo4 will be deprecated soon:
 
 http://bucardo.org/downloads/Bucardo-4.99.3.tar.gz

Thanks, I'll do that.

Oliver
www.agilebase.co.uk

Re: [GENERAL] explain analyse and nested loop joins

2011-11-07 Thread Oliver Kohll - Mailing Lists
Thanks,

It does look like an incorrect prediction. Looking again, I think it's the row 
estimate for the join that's out - the planner estimates one row returned, in 
which case a nested join would probably make sense, whereas in fact there are 
23.

However it's a generated (user created) query, so I think what I might do is 
get the application to detect this case from the query plan where there is a 
slow query and automatically test turning off nested joins. I'll just have to 
keep an eye on it to see if it becomes unnecessary in future PG versions.
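
A sketch of that per-query switch, using SET LOCAL so the setting only lasts for the transaction:

BEGIN;
SET LOCAL enable_nestloop = off;
-- run the generated query here
COMMIT;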

Regards
Oliver
www.agilebase.co.uk

On 6 Nov 2011, at 04:17, Pavel Stehule wrote:

 Hello
 
 Propably there are a dependency between following columns - and then a
 prediction is not correct.
 
 Try to move one less selective to OUTER SELECT
 
 SELECT * FROM (SELECT your query OFFSET 0) x WHERE x.invoiced = false
 
 Regards
 
 Pavel Stehule
 
 2011/11/5 Oliver Kohll - Mailing Lists oliver.li...@gtwm.co.uk:
 b2deliveryorders.complete = false AND b2deliveryorders.invoiced = false


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] explain analyse and nested loop joins

2011-11-05 Thread Oliver Kohll - Mailing Lists
Hi,

I have a query I'm trying to optimise. It takes just under a second to run, not 
too bad for my users but I'm worried that as the size of the data increases, it 
will get worse. Of course the plan may change when that happens but I'd also 
like to learn a bit more about optimisation anyway.

The explain is here:

http://explain.depesz.com/s/Ost

- that one took 690ms. Seeing it had a couple of nested joins at the top, I 
'set enable_nestloop = false;', resulting in an improvement of about 20x:

http://explain.depesz.com/s/BRi

The query is below. It joins to a second report dbvcalc_delivery_charges which 
I can also send if necessary. I've only guesses as to the reasons the default 
plan is slow or how to affect it, can someone enlighten me?

Regards
Oliver Kohll
www.agilebase.co.uk


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Fwd: explain analyse and nested loop joins

2011-11-05 Thread Oliver Kohll - Mailing Lists
Oops, forgot to include the query, it's

SELECT b2deliveryorders.idb2deliveryorders, 
a2clientpremises.ida2clientpremises, a2clientpremises.premisesname, 
a2clientpremises.town, b2deliveryorders.expectedby, 
b2deliveryorders.dateordered, b2deliveryorders.invoicenumber, 
b2deliveryorders.deliverymethod, b2deliveryorders.driver, 
dbvcalc_delivery_charges.total, dbvcalc_delivery_charges.boxes, 
b2deliveryorders.createdbyauto
   FROM b2deliveryorders
   LEFT JOIN a2clientpremises ON b2deliveryorders.a2clientpremisespremisesname 
= a2clientpremises.ida2clientpremises
   LEFT JOIN dbvcalc_delivery_charges ON b2deliveryorders.idb2deliveryorders = 
dbvcalc_delivery_charges.idb2deliveryorders
  WHERE b2deliveryorders.complete = false AND b2deliveryorders.invoiced = false
  ORDER BY b2deliveryorders.expectedby NULLS FIRST;

Oliver

Begin forwarded message:

 From: Oliver Kohll - Mailing Lists oliver.li...@gtwm.co.uk
 Subject: explain analyse and nested loop joins
 Date: 5 November 2011 19:21:23 GMT
 To: pgsql-general pgsql-general@postgresql.org
 
 Hi,
 
 I have a query I'm trying to optimise. 


Re: [GENERAL] Random multiple times

2011-09-22 Thread Oliver Kohll - Mailing Lists
Many thanks both, those solutions are great and have gone in my wiki for future 
ref.
Regards
Oliver

On 21 Sep 2011, at 21:56, Szymon Guz wrote:

 
 
  Short answer is: yes. More information you can find
  here 
  http://simononsoftware.com/problem-with-random-in-postgresql-subselect/
  regards
  Szymon
 
 


[GENERAL] Random multiple times

2011-09-21 Thread Oliver Kohll - Mailing Lists
Hi,

I understand random() is a volatile function and runs multiple times for 
multiple rows returned by a SELECT, however is there a way of getting it to run 
multiple times *within* another function call and in the same row. i.e. 
something like

select regexp_replace('+1 555 555 555', E'\\d', trunc(random() * 9 + 
1)::text,'g');
 regexp_replace 
----------------
 +1 111 111 111
(1 row)

As you can see, it returns the same digit each time. I've tried wrapping a 
select around the trunc too.
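
One way around it is to evaluate random() once per character rather than once per regexp_replace() call - a sketch (9.0+ for string_agg):

SELECT string_agg(CASE WHEN c ~ '[0-9]'
                       THEN trunc(random() * 9 + 1)::text
                       ELSE c END, '' ORDER BY i) AS randomised
FROM (SELECT i, substr('+1 555 555 555', i, 1) AS c
      FROM generate_series(1, length('+1 555 555 555')) AS i) AS chars;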

Regards
Oliver Kohll
www.gtwm.co.uk / www.agilebase.co.uk




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Realtime Query Dashboard Results

2011-01-07 Thread Oliver Kohll - Mailing Lists
On 7 Jan 2011, at 22:02, THOMPSON, JARED (ATTBAPCO) jt0...@att.com wrote:

 I assume when people use dashboards they are not being queried every
 second for updating but maybe every minute?
 
 Are there any tools that work good on top of postgres?
 
 (I see in the stock market (though I am looking at/for production data)
 they seem to use tools that frequently update their dashboards.
 
 What is a realistic timeframe to expect query updates for a dashboard?

Having written a dashboard on top of PostgreSQL (screenshot at 
http://blog.agilebase.co.uk/2010/03/31/dashboard-preview/ ), I can at least 
state my decisions:
Charts are updated once a day overnight, or cached whenever someone looks at 
them in the system underlying the dashboard, so they are at most one day old. A 
chart is also updated when a user clicks on it to drill down to the data.

Of course what you decide depends on what the business use case is and what 
demands there are on the system. In my cases so far the slowest charts take 1 
or 2 seconds to generate by SQL so if necessary, each could be loaded in in 
real time over AJAX, though that hasn't been needed yet.

Regards
Oliver Kohll

Re: [GENERAL] locating cities within a radius of another

2010-07-22 Thread Oliver Kohll - Mailing Lists

On 21 Jul 2010, at 23:14, Joe Conway m...@joeconway.com wrote:

 If you want something simple, and not requiring PostGIS, but plpgsql
 instead, see:
 
 http://archives.postgresql.org/pgsql-sql/2003-12/msg00193.php

For completeness, the earthdistance module also provides the distance between 
two lat/longs, the point<@>point syntax is simple to use:
http://www.postgresql.org/docs/8.3/static/earthdistance.html

Regards
Oliver Kohll

oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll
www.agilebase.co.uk - software



Re: [GENERAL] locating cities within a radius of another

2010-07-22 Thread Oliver Kohll - Mailing Lists

On 22 Jul 2010, at 12:57, Geoffrey wrote:

 For completeness, the earthdistance module also provides the distance 
 between two lat/longs, the point<@>point syntax is simple to use:
 http://www.postgresql.org/docs/8.3/static/earthdistance.html
 
 Disregard my last post. Sure enough, as soon as I hit send, the light went on...  
 I'm looking at deriving my points for point <@> point from ll_to_earth().


I constructed mine using point(longitude, latitude), where long and lat are 
double precision, which returns a datatype of type point. ll_to_earth() looks 
like it returns a datatype of type earth, so not sure if it will work. Maybe 
things have changed in a recent release, please let me know if so.

So an example would be
select point(-2.2171,56.8952)<@>point(-1.2833,51.6667) as miles; 
      miles       
------------------
 363.202864676916
(1 row)
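
For comparison, a sketch using ll_to_earth()/earth_distance() (requires the cube and earthdistance modules; note ll_to_earth takes latitude first and earth_distance returns metres):

SELECT earth_distance(ll_to_earth(56.8952, -2.2171),
                      ll_to_earth(51.6667, -1.2833)) / 1609.344 AS miles;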

Regards
Oliver Kohll

oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll
www.agilebase.co.uk - software





[GENERAL] Backups / replication

2010-06-15 Thread Oliver Kohll - Mailing Lists
Hello,

I'm interested in using WAL shipping / replication for backup purposes but have 
no interest in failover. Currently my situation is:

I have two servers, live and backup, which are in different cities. The backup 
server is also a test/development machine.

Backups of my most important database are made hourly with pg_dump, excluding 
some larger tables with non-critical logging data. Even so, as the database 
grows, backups are taking longer and it looks as though they may start to 
impact performance. A full backup is made nightly and transferred to the backup 
machine, along with all of the day's hourly backups.

I'm looking into using replication by WAL shipping - after all, there's no use 
to backing up data which hasn't changed since last time - only a small 
percentage of records are created/updated. However, I need

a) to be able to restore to a point in time easily, which I can do to within an 
hour at the moment by restoring the correct dump. Sometimes users ask for a 
restore having accidentally updated/deleted records.
b) to carry on running a test server database, that means one that's read and 
writeable.

I obviously can't use a replication slave as a read/write test server at the 
same time. At the moment I've thought of a couple of options, I don't know if 
either are possible - I have a bit of a hazy idea of WAL replication.

1) Continuously ship the WAL records to somewhere on the test server unknown to 
Postgres but run the test machine as a normal database completely separately. 
If a backup is needed, delete the test database, restore to the last full 
backup (a filesystem backup?) and copy all WAL records into Postgres' directory 
so it can see them. Start it up configured to replay them, up to a certain time.

2) Run two instances of Postgres on the test/backup server on different ports, 
one configured as a replication slave, one normal. I'm not sure if this is 
possible with the RPM builds I'm using.

Are either of those two likely? Any other suggestions? Another question is will 
the replication coming in v9.0 change things and would it be worth holding off 
until then? In particular Command Prompt's PITR tools look useful for restoring 
to a particular point in time, will these still work or will there be 
equivalents?
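
For reference, a filesystem-level base backup in that era is bracketed by the exclusive backup functions - a sketch:

SELECT pg_start_backup('nightly_base');
-- take the filesystem copy of the data directory here
SELECT pg_stop_backup();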

Regards
Oliver Kohll

oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Cumulative count (running total) window fn

2010-04-29 Thread Oliver Kohll - Mailing Lists
Hello,

Many thanks to andreas.kretschmer for this helpful reply about how to set up a 
window function to perform a running total:
http://archives.postgresql.org/pgsql-general/2010-03/msg01122.php

It works perfectly with the simple test data but I've just got back to work, 
tried implementing it on my live data and the results are slightly different. 
My query is almost exactly the same - I've simplified by grouping by year only 
rather than year and month:

select extract(year from signup_date),
  count(email_address),
  sum(count(email_address)) over (rows unbounded preceding)
from email_list group by 1 order by 1;

 date_part | count | sum  
-----------+-------+------
      2007 |   501 | 1374
      2008 |   491 |  491
      2009 |   382 |  873
      2010 |    66 | 1440
(4 rows)

What I'm looking for is 
 date_part | count | sum  
-----------+-------+------
      2007 |   501 |  501
      2008 |   491 |  992
      2009 |   382 | 1374
      2010 |    66 | 1440

It seems to be adding up the counts but not in the right order. 

I've also tried an explicit ORDER BY inside the partition with no difference:

select extract(year from signup_date),
  count(email_address),
  sum(count(email_address)) over (partition by 1 order by 1 asc rows unbounded 
preceding)
from email_list group by 1 order by 1;

Does anyone have any other ideas?

Regards
Oliver Kohll

oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll
www.agilebase.co.uk - software
www.gtwm.co.uk - company




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Cumulative count (running total) window fn

2010-04-29 Thread Oliver Kohll - Mailing Lists

On 29 Apr 2010, at 10:01, Magnus Hagander wrote:
 
 select extract(year from signup_date),
  count(email_address),
  sum(count(email_address)) over (partition by 1 order by 1 asc rows 
 unbounded preceding)
 from email_list group by 1 order by 1;
 
 Does anyone have any other ideas?
 
 Aren't you looking for something along the line of:
 
 SELECT year, sum(c) over (order by year)
 FROM (
  SELECT extract(year from signup_date) AS year, count(email_address) AS c
  FROM email_list
  GROUP BY extract(year from signup_date)
 )
 
 (adjust for typos, I didn't test it)

Yes that does work thanks, if you give the subquery a name. I'd still like to 
know if it's possible to do with a window function rather than a subquery.

Oliver Kohll

Re: [GENERAL] Cumulative count (running total) window fn

2010-04-29 Thread Oliver Kohll - Mailing Lists
 
 Aren't you looking for something along the line of:
 
 SELECT year, sum(c) over (order by year)
 FROM (
  SELECT extract(year from signup_date) AS year, count(email_address) AS c
  FROM email_list
  GROUP BY extract(year from signup_date)
 )
 
 (adjust for typos, I didn't test it)
 
 Yes that does work thanks, if you give the subquery a name. I'd still like to 
 know if it's possible to do with a window function rather than a subquery.
 
 Oliver Kohll
 
 
 Like this?:
 
 SELECT extract(year from signup_date), count(email_address), 
 sum(count(email_address)) OVER (ORDER BY count(email_address)) FROM 
 email_list GROUP BY 1 ORDER BY 1;
 
 Thom

Almost, but put me on the right track! This one is exactly what I'm looking for:

SELECT extract(year from signup_date), count(email_address), 
sum(count(email_address)) OVER (ORDER BY extract(year from signup_date)) FROM 
email_list GROUP BY 1 ORDER BY 1;

The ORDER BY count(email_address) did give the same results for my data but 
only because the count values just happen to give the same ordering as the 
years - I tested by changing some dates.

Many thanks all.
Oliver

Re: [GENERAL] Cumulative count (running total) window fn

2010-04-29 Thread Oliver Kohll - Mailing Lists
 
 Curious note - how does the non-subselect version and the subselect
 version compare performance-wise?

Magnus,

On a test table with 12,000 rows there's not much in it, the subselect has a 
simpler plan but they both take practically the same time.

The two plans (note I've been rewriting the field names for readability until 
now but haven't here):

explain analyze SELECT year, sum(c) over (order by year)
FROM (  
  SELECT extract(year from a56b7a8d6de03f67b) AS year, count(a10e4ab8863c199f1) 
AS c
  FROM a2e9a7e9e257153de
  GROUP BY extract(year from a56b7a8d6de03f67b)
) as subq;
   QUERY PLAN   
 
-
 WindowAgg  (cost=851.49..874.06 rows=1290 width=16) (actual 
time=43.369..43.394 rows=5 loops=1)
   -  Sort  (cost=851.49..854.71 rows=1290 width=16) (actual 
time=43.340..43.342 rows=5 loops=1)
 Sort Key: (date_part('year'::text, 
a2e9a7e9e257153de.a56b7a8d6de03f67b))
 Sort Method:  quicksort  Memory: 25kB
 -  HashAggregate  (cost=752.59..771.94 rows=1290 width=26) (actual 
time=43.300..43.317 rows=5 loops=1)
   -  Seq Scan on a2e9a7e9e257153de  (cost=0.00..689.56 rows=12605 
width=26) (actual time=0.031..26.723 rows=12605 loops=1)
 Total runtime: 43.549 ms

explain analyze SELECT extract(year from a56b7a8d6de03f67b), 
count(a10e4ab8863c199f1), sum(count(a10e4ab8863c199f1)) OVER (ORDER BY 
count(a10e4ab8863c199f1)) FROM a2e9a7e9e257153de GROUP BY 1 ORDER BY 1;
  QUERY PLAN
   
---
 Sort  (cost=1382.39..1388.52 rows=2451 width=32) (actual time=44.229..44.230 
rows=5 loops=1)
   Sort Key: (date_part('year'::text, a56b7a8d6de03f67b))
   Sort Method:  quicksort  Memory: 25kB
   -  WindowAgg  (cost=1195.39..1244.41 rows=2451 width=32) (actual 
time=44.171..44.208 rows=5 loops=1)
 -  Sort  (cost=1195.39..1201.52 rows=2451 width=32) (actual 
time=44.125..44.127 rows=5 loops=1)
   Sort Key: (count(a10e4ab8863c199f1))
   Sort Method:  quicksort  Memory: 25kB
   -  HashAggregate  (cost=1014.52..1057.41 rows=2451 width=32) 
(actual time=44.071..44.099 rows=5 loops=1)
 -  Seq Scan on a2e9a7e9e257153de  (cost=0.00..833.58 
rows=24126 width=32) (actual time=0.032..26.683 rows=12605 loops=1)
 Total runtime: 44.396 ms

Regards
Oliver Kohll

oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll
www.agilebase.co.uk - software
www.gtwm.co.uk - company





Re: [GENERAL] Invalid objects

2010-04-25 Thread Oliver Kohll - Mailing Lists
On 25 Apr 2010, at 07:34, Scott Bailey wrote:

 
 I can point you to the relevant code in GitHub if you're interested (it's 
 Java).
 
 Absolutely. Thanks
 
 Scott Bailey

Line 813 of
http://github.com/okohll/agileBase/blob/master/gtpb_server/src/com/gtwm/pb/model/manageSchema/DatabaseDefn.java
- private void updateViewDbAction

is the top level function.

Regards
Oliver Kohll




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Invalid objects

2010-04-24 Thread Oliver Kohll - Mailing Lists
Scott,

I implemented a system exactly like this for the app in my signature below. 
Basically the system stores metadata containing each view definition and the 
joins between them. When someone needs to alter a view, say to remove a column, 
a DROP CASCADE is performed then each view is recreated in turn *in the correct 
order*. Everything's in a transaction, so a failure of recreation will roll 
back to the original state. This can of course happen if for example a column 
is deleted which other views use. This method is only used if the original 
attempt fails due to dependency errors - some view updates can work just fine 
anyway.

I can point you to the relevant code in GitHub if you're interested (it's Java).

Regards
Oliver Kohll

oli...@agilebase.co.uk / +44(0)7814 828608 / skype:okohll
www.agilebase.co.uk - software
www.gtwm.co.uk - company


On 24 Apr 2010, at 13:01, Scott Bailey arta...@comcast.net wrote:

 Using views in Postgres can be a painful process. Changing a column in a base 
 table will require you to drop all views that depend on it, and all views 
 that depend on those views and so on.
 
 My coworker was complaining this morning that he now has a bunch of queries 
 where a view is joined back on the original table to add a column that was 
 missing from the view. It was easier to do this than to drop the view and all 
 of its dependencies and then find all the source code and rebuild all of the 
 views in the correct order.
 
 So my thought was to create an invalid objects table to store the source and 
 dependencies (and possibly permissions) when a DDL change invalidates a view 
 or a function. And later you can call a procedure that (tries to) rebuild 
 those invalid objects.
 
 My initial plan of attack is to just create a function that stores the 
 information required to rebuild the dependencies before dropping them. 
 Something like:
  store_and_drop('my_view_name')
 
 I'm thinking that ultimately it would be nice if postgres could do this 
 automatically. Maybe:
  DROP my_view_name CASCADE WITH RESTORE
 
 So before I begin, has anyone already done this? And does anyone have any 
 advice as to how it may best be done?
 
 Thanks
 
 Scott Bailey




[GENERAL] Running/cumulative count using windows

2010-03-30 Thread Oliver Kohll - Mailing Lists
Hello,

I'm still reasonably new to windowing functions, having used a few since 8.4 
came out. I wonder if anyone can help with this one.

I've got a table of email addresses in a CRM system similar to the following:

CREATE TABLE test(
  signup_date timestamp,
  email_address varchar(1000)
);
INSERT INTO test(signup_date, email_address) VALUES(now(), 't...@test.com');
INSERT INTO test(signup_date, email_address) VALUES(now(), 't...@test1.com');
INSERT INTO test(signup_date, email_address) VALUES(now() - '1 
month'::interval, 't...@test2.com');

I'd like a running count, or cumulative count of the number of signups per 
month. I'm pretty sure a window function would do it but I can't work it out.

So a plain count by month would be

SELECT date_part('year',signup_date) as year, date_part('month',signup_date) as 
month, count(*)
FROM test
GROUP BY year, month
ORDER BY year, month;

giving

 year | month | count 
------+-------+-------
 2010 |     2 |     1
 2010 |     3 |     2

How would you make the count a cumulative one? The output should then be

 year | month | count 
------+-------+-------
 2010 |     2 |     1
 2010 |     3 |     3
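
A sketch of the window-function answer reached in the follow-up thread ("Cumulative count (running total) window fn", above), extended to year and month:

SELECT date_part('year', signup_date)  AS year,
       date_part('month', signup_date) AS month,
       count(*),
       sum(count(*)) OVER (ORDER BY date_part('year', signup_date),
                                    date_part('month', signup_date)) AS running_count
FROM test
GROUP BY 1, 2
ORDER BY 1, 2;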


Regards
Oliver Kohll


oli...@agilebase.co.uk / +44(0)845 456 1810 / skype:okohll
www.agilebase.co.uk - software
www.gtwm.co.uk - company



Re: [GENERAL] [pgsql-general] looking for a powerful frontend/teport generator

2010-03-29 Thread Oliver Kohll - Mailing Lists
On 29 Mar 2010, at 14:33, Clemens Eisserer linuxhi...@gmail.com wrote:

 Hi,
 
 Sorry for beeing a bit off-topic.
 Recently I've done some database-fontends, which I used
 java+swingset+netbeans-gui-builder for.
 Compared to plain java development is quite fast - however I wonder
 wether you could recommend db-fontend generators like the infamous
 access.
 
 What I've found so far was either:
 - extremly expensive
 - not compatible with free DBs (like postgres)
 - not powerful
 - not cross-platform
 
 Does anybody know tools which don't have properties like listed above?
 
 Thank you in advance, Cleens

Hello, there are one or two (including one I developed) under 'GUI builders' at 
pgFoundry:
http://pgfoundry.org/softwaremap/trove_list.php?form_cat=323

Regards
Oliver Kohll


oli...@agilebase.co.uk / +44(0)845 456 1810
www.agilebase.co.uk - software
www.gtwm.co.uk - company




[GENERAL] pg_dump new version

2010-02-23 Thread Oliver Kohll - Mailing Lists
Hello,

May I take a lull between PG releases to ask an upgrade Q? Documentation and 
people often recommend that when doing a major version upgrade, to dump data 
from the old server with the new version of pg_dump, since the new pg_dump may 
have bugfixes not available to the old.

I've never done this simply because it's not clear from the docs how to - 
perhaps it's platform dependant. The upgrade doc
http://www.postgresql.org/docs/8.4/interactive/install-upgrading.html
is a very clear step by step procedure to follow with the exception of

'To make the backup, you can use the pg_dumpall command from the version you 
are currently running. For best results, however, try to use the pg_dumpall 
command from PostgreSQL 8.4.2, since this version contains bug fixes and 
improvements over older versions. While this advice might seem idiosyncratic 
since you haven't installed the new version yet, it is advisable to follow it 
if you plan to install the new version in parallel with the old version'.

So for someone using RPM packages to install Postgres, what's the recommended 
sequence to do this?

Regards
Oliver Kohll
www.agilebase.co.uk



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] pg_dump: SQL command failed

2010-02-12 Thread Oliver Kohll - Mailing Lists
Hello,

I've just come across this in an output from a cron backup script:

/etc/cron.hourly/gtwm_backup_databases.sh:

pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  could not open relation with OID 
572838
pg_dump: The command was: SELECT 
pg_catalog.pg_get_viewdef('572838'::pg_catalog.oid) AS viewdef

The complete script is

#!/bin/bash
nice /usr/bin/pg_dump -U postgres -f 
/var/local/backup/agilebaseschema_`/bin/date +%H`.sql agilebaseschema
nice /usr/bin/pg_dump -U postgres --exclude-table=dbint_log_* -f 
/var/local/backup/agilebasedata_`/bin/date +%H`.sql agilebasedata

Running the script again manually, it works with no errors and I haven't 
noticed the error before. Unfortunately I recently deleted my archive of cron 
results so I can't do an exhaustive check and I can't see anything in the 
postgres log either but it's the first time I've noticed it since I can 
remember.

A web search shows other people have come across this before now and again 
though I can't see any resolutions. I'm running postgres v8.4.1.

Running that SELECT statement manually on both databases returns

agilebasedata=# SELECT pg_catalog.pg_get_viewdef('572838'::pg_catalog.oid) AS 
viewdef;
  viewdef   

 Not a view
(1 row)

Views are regularly altered, dropped and created in the agilebasedata database, 
15 times today to date, which have all succeeded without error.

Any help on what could cause that error?
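
One quick check the next time it happens is whether the OID still refers to anything - a view dropped between pg_dump's catalog read and the pg_get_viewdef() call is a likely explanation:

SELECT oid, relname, relkind FROM pg_class WHERE oid = 572838;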

Regards
Oliver Kohll
oli...@agilebase.co.uk / 0845 456 1810 / 07814 828608
www.agilebase.co.uk - software
www.gtwm.co.uk - company


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pgAdmin in 8.4 installation uses tools from 8.3 installation

2009-12-09 Thread Andy Shellam (Mailing Lists)

 We just upgraded a customer's PostgreSQL installation from 8.3 to 8.4.1.  I 
 wanted to make a small change to the database.  I wanted to take a backup 
 before I did it, just in case.  When I run pgAdmin from the 
 PostgreSQL/8.4/bin folder and try to take a backup, the backup fails.  The 
 output shows me that it is running PostgreSQL/8.3/bin/pg_dump.exe instead of 
 PostgreSQL/8.4/bin/pg_dump.exe.  I found that the psql window that can be 
 invoked from the latest pgAdmin is also running a tool from the 8.3 
 installation.
 
 How do I tell pgAdmin to use 8.4 tools instead of 8.3 tools?


Preferences?  I'm running on a Mac which means the preferences screen is 
accessible using PgAdmin3 > Preferences - on Windows it may be something like 
File/Tools > Options/Preferences.

Then set the PG bin path as appropriate.

It may also be that the 8.3 bin folder is set in your PATH variable.

Regards,
Andy

Re: [GENERAL] FATAL: no pg_hba.conf entry for host “::1”

2009-12-05 Thread Andy Shellam (Mailing Lists)
Hi,

 
 So the issue seems to be with the last line, on IPv6. To be honest I don't
 even know what IPv6 is; could disabling IPv6 do the trick, or do I need to
 add something?

Your machine is connecting to PostgreSQL using IPv6's localhost address.  
IPv6 is a newer format of IP addressing that allows for more IP addresses - ::1 
= 127.0.0.1, basically.

Newer TCP/IP stacks attempt to connect to localhost using the IPv6 address 
first, instead of IPv4 (the current format.)

When you connect, try passing -h 127.0.0.1 to psql or whatever app/library 
you're using.  That should force it to connect over IPv4.  Or just uncomment 
the IPv6 line in your pg_hba.conf and restart PostgreSQL.

 
 The funny thing is, with the exact same details a different application
 connects to PostgreSQL fine. Any ideas? And bear in mind I'm a newbie when
 explaining.

Is the different application on a different server?  If so, it could be that 
the other app's OS is using IPv4 by default instead of IPv6.  Or the app is 
connecting over an IPv4 address (e.g. 127.0.0.1) instead of localhost.

Regards,
Andy

Re: [GENERAL] generic modelling of data models; enforcing constraints dynamically...

2009-09-27 Thread Oliver Kohll - Mailing Lists


On 27 Sep 2009, at 21:10, InterRob rob.mar...@gmail.com wrote:

Peter, may I invite you to privately share some more details on the  
system you are using and the design of it? Did you implement it  
using PostgreSQL? Looking forward to your reply.
(And with respect to your previous message: whom are you actually  
referring to by the acronym OPs?)




Or publicly? I for one would be interested hearing more. From  
situations I've come across, EAV seems to be proposed when either

1) attributes are very numerous and values very sparse
2) people want to be able to quickly add (and remove?) attributes
My feeling is it's probably valid for 1, at least I haven't come  
across anything better, but not for 2.


Regards
Oliver

www.gtwm.co.uk - company
www.gtportalbase.com - product



Re: [GENERAL] generic modelling of data models; enforcing constraints dynamically...

2009-09-25 Thread Oliver Kohll - Mailing Lists

On 25 Sep 2009, at 07:22, InterRob rob.mar...@gmail.com wrote:


I guess it IS quite overengineered indeed...

What I'm trying to do is to facilitate different fieldwork  
methodologies for archaeological research (on project basis); there  
is no final agreement on data structure and semantics; however, on a  
meta-level all choices are rational and can be modelled... Infact,  
all models can be related to each other: that's where the hybrid  
part comes in: I wish to implement the common denominator (90%) and  
then further extend this, enabing specific data model  
implementations -- including checks for data integrity.



Hi Rob,

Just wondering if you've considered rapid prototyping of the core of  
it to try and gain consensus by giving people something they can see  
and talk about, as an alternative to doing a lot of design work up  
front?


Regards
Oliver Kohll
www.gtwm.co.uk - company
www.gtportalbase.com - product



Re: [GENERAL] Viable alternatives to SQL?

2009-08-28 Thread Oliver Kohll - Mailing Lists

On 27 Aug 2009, at 17:11, pgsql-general-ow...@postgresql.org wrote:


From: Kelly Jones kelly.terry.jo...@gmail.com
Date: 27 August 2009 14:43:51 BST
To: pgsql-general@postgresql.org
Subject: Viable alternatives to SQL?


Many sites let you search databases of information, but the search
queries are very limited.

I'm creating a site that'll allow arbitrary SQL queries to my data (I
realize I'll need to handle injection attacks).

Are there other viable ways to query data? I read a little on
Business System 12 (BS12), Tutorial D, and even something called
T-SQL (I think), but they all seem theoretical and not fully
implemented.

I want a query language that non-techies can use easily, but also
supports arbitrarily complex queries. Does such a language exist?


Originally I was using the webexone.com database to allow non-techies  
to create tables and views, however found it too limiting - the  
functional ceiling too low.


So our org. built a web-app for this based on postgres - a GUI.  
However it's not aimed at letting non-techies do everything, which is  
rather ambitious. People familiar with SQL and relational db concepts  
can do practically everything but non-techs can do simpler tasks. For  
example I recently used some of the new windowing functions (yay!) in  
a view but users typically add/remove fields, filters and aggregate  
calculations.


Info at www.gtportalbase.com, it's also just gone on github as open  
source.


Oliver Kohll





[GENERAL] Division by zero

2009-07-31 Thread Oliver Kohll - Mailing Lists

Hello,

Divide by zero errors have come up a couple of times on this list  
(once raised by me). I wonder if I could propose a feature for  
discussion. Could the result of a division by zero be treated as  
infinity or null, rather than raising an error? Floating point types  
already have the concept of infinity.


I'd have thought that there's no reason why a /0 in one row  
necessarily has to be fatal for the whole view. In many cases, you can  
imagine that returning infinity makes more sense. Strictly, I suppose,  
1/0 should return infinity, 0/0 null and -1/0 negative infinity.  
Alternatively, all could return NaN. At least there could be a  
configuration option to turn on this behaviour.


The concern stems from the fact that when a divide by zero occurs in a  
view, no rows at all are returned, just the error message. This makes  
it very difficult to work out where the problem value is, compared to  
other tools like spreadsheets, which return a cell error. A view can  
be very fragile. Further, the Postgres error doesn't give any details  
of the field and of course can't point to the row, it just says

ERROR:  division by zero

There may well be good reasons for not treating this. I've come across  
comments such as 'I think everybody would agree that this would be a  
bad thing to do!' but remain to be convinced.


I know you can use CASE and NULLIF but if you have complex  
calculations, that makes them a lot less readable.


Regards
Oliver Kohll

oli...@gtwm.co.uk / 0845 456 1810 / 07814 828608
www.gtwm.co.uk - company
www.gtportalbase.com - product



Re: [GENERAL] [pgsql-general] Daily digest v1.9081 (14 messages)

2009-06-08 Thread Oliver Kohll - Mailing Lists

On 8 Jun 2009, at 17:23, Merlin Moncure mmonc...@gmail.com wrote:

Is there a way when creating a table to limit it to one row?  That  
is,

without using a stored procedure?

I searched the documentation, but didn't find anything.




CREATE TABLE x (...);

CREATE UNIQUE INDEX x_only_one_row ON ((1));


very clever :D

merlin


Too clever for me; I don't understand what's going on and can't  
replicate it in my 8.3. Any expansion?
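
For reference, the trick presumably relies on a unique index over a constant expression, so a second row can never be inserted without violating it; a sketch, assuming the index was meant to be created on table x:

CREATE TABLE x (val text);
CREATE UNIQUE INDEX x_only_one_row ON x ((1));

INSERT INTO x VALUES ('first');   -- ok
INSERT INTO x VALUES ('second');  -- fails with a unique constraint violation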


Rgs
Oliver

oli...@gtwm.co.uk / 0845 456 1810 / 07814 828608
www.gtwm.co.uk - company
www.gtportalbase.com - product



Re: [GENERAL] Sum of multiplied deltas

2009-06-08 Thread Oliver Kohll - Mailing Lists

On 8 Jun 2009, at 19:01, David Fetter wrote:


Hello!

I've the following data:
datetime | val1 | val2
time1|4 | 40%
time2|7 | 30%
time3|   12 | 20%
...

I'd like to sum up the following:

(7-4)*30% + (12-7)*20% + ...


This is best done in 8.4 using Windowing.  Sadly, it's an 8.4-only
feature, and dodgy hacks are the rule until you can use them.

Cheers,
David.




I do this type of thing now and again using a self join with an offset.

select test_a.val1 - test_b.val1
from test test_a inner join test test_b
on test_a.pkey = test_b.pkey - 1;

Thought I was quite clever the first time, didn't know it was a dodgy  
hack! I'm trying to learn more about windowing before 8.4, how would  
this example be done with that?
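
For the record, a hedged sketch of how the 8.4 windowing version of the deltas example might look (column names taken from the sample data above, untested):

SELECT sum(delta * val2) AS total
FROM (
    SELECT val1 - lag(val1) OVER (ORDER BY datetime) AS delta,
           val2
    FROM test
) AS deltas;
-- the first row has a NULL delta, which sum() simply ignores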


Regards
Oliver

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Division by zero

2009-06-04 Thread Oliver Kohll - Mailing Lists

On 4 Jun 2009, at 13:11, Sam Mason s...@samason.me.uk wrote:



You need to take care of only one case here: denominator == 0; rest  
of the

cases will be handled sanely by the database.

CASE WHEN packet_size = 0 THEN null ELSE cost/packet_size END as  
unit_cost


Yes; or even shorter:

 cost/nullif(packet_size,0) AS unit_cost


Thanks Sam and others. nullif is a good one to remember. However my  
problem is I want to be able to deal with an arbitrary calculation a  
user may create. In the light of a new day I realise this is obviously  
not trivial and would entail reasonably complex parsing. You'd have to  
find elements that could cause an error (division, some aggregates)  
and insert in the correct place nullif or CASE, taking care of bracket  
matching for starters - a messy workaround to the problem.


I might look into functions if that's the only reasonable way of  
catching exceptions.


Oliver



[GENERAL] Division by zero

2009-06-03 Thread Oliver Kohll - Mailing Lists

Hello,

We have a system that allows users to create views containing  
calculations but divisions by zero are commonly a problem.


An simple example calculation in SQL would be

SELECT cost / pack_size AS unit_cost from products;

Either variable could be null or zero.

I don't think there's a way of returning null or infinity for  
divisions by zero, rather than causing an error but I'd just like to  
check - and put in a vote for that functionality!


If not, I will have to get the system to check for any numeric fields  
in user input calculations and rewrite them similar to


CASE WHEN cost IS NULL THEN null
WHEN pack_size IS NULL THEN null
WHEN cost = 0 THEN null
WHEN pack_size = 0 THEN null
ELSE cost / pack_size
END AS unit_cost

I don't want to write new functions, I'd rather keep it in plain SQL.

Best regards

Oliver Kohll


oli...@gtwm.co.uk / 0845 456 1810 / 07814 828608
www.gtwm.co.uk - company
www.gtportalbase.com - product

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Foreign key verification trigger conditions

2009-06-01 Thread j-lists
I have an update statement that affects every row in a given table.
For that table it changes the value in a single column, which itself
has a foreign key constraint. The table has an additional 9 foreign
keys, some of which reference large tables.
My expectation would be that only the changed column would be checked
against the foreign key of interest, instead I find that all the
foreign keys are checked when this statement is executed.
I decided to create a simple test case to demonstrate this behaviour,
but what I found was strange. The first time I created the test cases
the behaviour matches my experience but the second time I created it
the behaviour was then as I would have expected. This is the result I
am experiencing with the unnecessary foreign key verification:
testdb=# explain analyze update t1 set B = 1;
   QUERY PLAN
-
 Seq Scan on t1  (cost=0.00..25.10 rows=1510 width=22) (actual
time=0.026..0.029 rows=1 loops=1)
 Trigger for constraint fk1: time=0.111 calls=1
 Trigger for constraint fk2: time=0.014 calls=1
 Total runtime: 0.259 ms
(4 rows)

Only fk1's column is being updated, not fk2's.

Below is both sessions. Any feedback on this and how to avoid it is
appreciated as well as whether the developers would consider this a
bug, I am inclined to believe so as it hurts performance.

Thanks,
-J


*


testdb=# create table t1 (A BIGINT, B BIGINT);
CREATE TABLE
testdb=# create table t2 (B BIGINT PRIMARY KEY);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
t2_pkey for table t2
CREATE TABLE
testdb=# alter table t1 add constraint fk1 foreign key  (B) references t2 (B);
ALTER TABLE
testdb=# explain analyze insert into t2 values (1);
 QUERY PLAN

 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.008..0.010
rows=1 loops=1)
 Total runtime: 45.508 ms
(2 rows)

testdb=# explain analyze insert into t1 values (1, 1);
 QUERY PLAN

 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.008..0.010
rows=1 loops=1)
 Trigger for constraint fk1: time=0.256 calls=1
 Total runtime: 0.345 ms
(3 rows)

testdb=# explain analyze update t1 set A = 2;
   QUERY PLAN
-
 Seq Scan on t1  (cost=0.00..27.70 rows=1770 width=14) (actual
time=0.019..0.022 rows=1 loops=1)
 Total runtime: 0.125 ms
(2 rows)

testdb=# explain analyze update t1 set B = 1;
   QUERY PLAN
-
 Seq Scan on t1  (cost=0.00..27.70 rows=1770 width=14) (actual
time=0.021..0.024 rows=1 loops=1)
 Total runtime: 0.115 ms
(2 rows)

testdb=# explain analyze update t1 set B = 1;
   QUERY PLAN
-
 Seq Scan on t1  (cost=0.00..27.70 rows=1770 width=14) (actual
time=0.021..0.024 rows=1 loops=1)
 Total runtime: 0.113 ms
(2 rows)

testdb=# explain analyze insert into t2 values (2);
 QUERY PLAN

 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.008..0.010
rows=1 loops=1)
 Total runtime: 0.120 ms
(2 rows)

testdb=# explain analyze update t1 set B = 2;
   QUERY PLAN
-
 Seq Scan on t1  (cost=0.00..27.70 rows=1770 width=14) (actual
time=0.020..0.024 rows=1 loops=1)
 Trigger for constraint fk1: time=0.112 calls=1
 Total runtime: 0.233 ms
(3 rows)

testdb=# explain analyze update t1 set A = 99;
   QUERY PLAN
-
 Seq Scan on t1  (cost=0.00..27.70 rows=1770 width=14) (actual
time=0.021..0.025 rows=1 loops=1)
 Total runtime: 0.117 ms
(2 rows)

testdb=# alter table t1 add b2 bigint;
ALTER TABLE
testdb=# alter table t1 add constraint fk2 foreign key  (B2) references t2 (B);
ALTER TABLE
testdb=# explain analyze update t1 set B = 1;
   QUERY PLAN
-
 Seq Scan on t1  (cost=0.00..25.10 rows=1510 width=22) (actual
time=0.026..0.029 rows=1 loops=1)
 Trigger for constraint fk1: time=0.111 calls=1
 Trigger for constraint fk2: time=0.014 

Re: [GENERAL] Foreign key verification trigger conditions

2009-06-01 Thread j-lists
Hi Tom,
Thank you for pointing out the condition under which this occurs, I
had not made the connection that the check was only occurring when the
value in the other columns with foreign keys are null. I agree 100%
that a strict key equality check that is in general use in the
database should not return true for null = null. But I believe we can
always come to the conclusion that a foreign key constraint is
satisfied if all of the key values are null since that effectively
means that the relationship is not present. Searching for ri_KeysEqual
leads me to this discussion of the same topic:
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00803.php
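
To make the point concrete, a tiny example (table names invented) of the behaviour under discussion: with the default MATCH SIMPLE semantics a NULL referencing value satisfies the constraint, which is why re-checking an unchanged NULL key looks like wasted work.

CREATE TABLE parent (b bigint PRIMARY KEY);
CREATE TABLE child  (a bigint, b bigint REFERENCES parent (b));

-- allowed even though parent is empty: a NULL key means no relationship
INSERT INTO child VALUES (1, NULL);

-- b stays NULL, so the constraint is still trivially satisfied
UPDATE child SET a = 2;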

Would there be any interest in implementing this change? Should I be
reporting a bug to get it into the development queue?
(My apologies that I have neither the skills nor the resources to work
on it myself.)

Thanks,
-J

On Mon, Jun 1, 2009 at 9:20 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 j-lists jamisonli...@gmail.com writes:
 I have an update statement that affects every row in a given table.
 For that table it changes the value in a single column, which itself
 has a foreign key constraint. The table has an additional 9 foreign
 keys, some of which reference large tables.
 My expectation would be that only the changed column would be checked
 against the foreign key of interest, instead I find that all the
 foreign keys are checked when this statement is executed.

 What your test case actually seems to show is that the skip-the-trigger
 optimization doesn't fire when the column value is NULL.  Which is
 because ri_KeysEqual() doesn't consider two nulls to be equal.  It's
 possible we could change that but I'd be worried about breaking other
 cases that are actually semantically critical...

                        regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] New 8.4 features

2009-04-20 Thread Oliver Kohll - Mailing Lists

Hello,

Some of the new language features in 8.4 seem like pretty major  
additions. I know that the window functions will be very useful. There  
have been many times in the past when I've wanted to aggregate in this  
way:


http://elegantcode.com/2009/01/04/sql-window-clause/

If this is possible now it'll help massively in rapid prototyping - in  
the past, reports involving aggregates have needed the most time to  
develop.
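
As a flavour of the new syntax (table and column names here are invented, not from any of the posts), each row keeps its detail while also carrying a group-level aggregate:

SELECT department,
       employee,
       salary,
       avg(salary) OVER (PARTITION BY department) AS dept_avg,
       rank()      OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM staff;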


Another interesting feature is recursive SQL.
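
And a minimal recursive example (again with invented names), walking a reporting hierarchy from a single starting row:

WITH RECURSIVE reports AS (
    SELECT id, manager_id FROM employees WHERE id = 1
    UNION ALL
    SELECT e.id, e.manager_id
    FROM employees e
    JOIN reports r ON e.manager_id = r.id
)
SELECT * FROM reports;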

I know 8.4 is only just in beta but it would be good to learn what is  
(and isn't) possible. These seem like the sorts of things that would  
get good writeups at varlena.com but I see there haven't been any new  
posts there in a couple of years. My question is, is anyone planning  
to blog / write focussing on these features?


Oliver Kohll
www.gtwm.co.uk - company
www.gtportalbase.com - product


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Open sourcing

2009-02-19 Thread Oliver Kohll - Mailing Lists

Hi,

Wondering who has any practical experience of open sourcing a software  
product? I made a small internal database tool open source a while ago  
and that worked quite well - www.gtportalbase.com/opensource. People  
got involved in adding to the code and some people even paid a bit for  
our help. Now considering more. Any larger scale experiences?


Regards
Oliver Kohll

oli...@gtwm.co.uk / 0845 456 1810 / 07814 828608
www.gtwm.co.uk - company
www.gtportalbase.com - product



[GENERAL] What query on system tables retrieves table or queries definitoin

2006-03-04 Thread lists


What is the query for retrieving a table's and a view's definition from the system
tables (in 7.4)?

I am trying to find my way in the system tables and I am finding it very confusing,
especially the way the tables are linked.
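
For what it's worth, two starting points that should work on 7.4 (object names are placeholders): pg_get_viewdef() returns a view's SELECT text, and column definitions can be read from pg_class/pg_attribute. Running psql with -E also prints the catalog queries behind each \d command, which is a good way to see how the tables link together.

-- definition of a view (placeholder name 'myview')
SELECT pg_get_viewdef('myview'::regclass);

-- columns and types of a table (placeholder name 'mytable')
SELECT a.attname, format_type(a.atttypid, a.atttypmod) AS data_type
FROM pg_catalog.pg_attribute a
JOIN pg_catalog.pg_class c ON c.oid = a.attrelid
WHERE c.relname = 'mytable'
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY a.attnum;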



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[GENERAL] What is the syntax for setting a default date in PostgreSQL

2006-02-20 Thread lists


This issue always stumps me. I need to set the default date for a column in
postgres. A value for today, today + x days, now (timestamp), now + x amount of
days.

I never seem to be able to locate it in the documentation or online. Can anyone
post it here once and for all for posterity?

I will be much obliged :)
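
For reference, the usual spellings are column defaults built from CURRENT_DATE or now() plus an offset; a sketch (table and column names are just placeholders):

CREATE TABLE example (
    d_today      date      DEFAULT CURRENT_DATE,
    d_next_week  date      DEFAULT CURRENT_DATE + 7,
    ts_now       timestamp DEFAULT now(),
    ts_next_week timestamp DEFAULT now() + interval '7 days'
);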

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[GENERAL] Syntax Error Inserting From STDIN?

2005-12-20 Thread Hunter's Lists
I am trying to run the following:

COPY departments (id, issue_id, title, description, feature_type) FROM
stdin;
23  4   Local BuzzThings to do, people to see, places to go.  aspen

I get back:

COPY departments (id, issue_id, title, description, feature_type) FROM
stdin;

23  4   Local BuzzThings to do, people to see, places to go.  aspen

ERROR:  syntax error at or near 23 at character 80

What am I doing wrong? I am on 8.1.1...

The table is defined as:

--------------+------------------------+----------------
 id   | integer| not null default
 issue_id | integer|
 title| character varying(255) |
 description  | character varying(255) |
 feature_type | character varying(255) |

Thx.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Syntax Error Inserting From STDIN?

2005-12-20 Thread Hunter's Lists
Interesting.

How would I go about solving that?

I inserted an extra line between the two, no dice.

 From: Scott Marlowe [EMAIL PROTECTED]
 Date: Tue, 20 Dec 2005 13:53:37 -0600
 To: Hunter's Lists [EMAIL PROTECTED]
 Cc: PostgreSQL pgsql-general@postgresql.org
 Subject: Re: [GENERAL] Syntax Error Inserting From STDIN?
 
 On Tue, 2005-12-20 at 13:30, Hunter's Lists wrote:
 I am trying to run the following:
 
 COPY departments (id, issue_id, title, description, feature_type) FROM
 stdin;
 23  4   Local BuzzThings to do, people to see, places to go.  aspen
 
 I get back:
 
 COPY departments (id, issue_id, title, description, feature_type) FROM
 stdin;
 
 23  4   Local BuzzThings to do, people to see, places to go.  aspen
 
 ERROR:  syntax error at or near 23 at character 80
 
 Seeing that character 80 I'm gonna guess this is a CR/LF issue.  I.e.
 pgsql on your machine is seeing the 23 as being on the same line as the
 copy departments statement.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [GENERAL] Syntax Error Inserting From STDIN?

2005-12-20 Thread Hunter's Lists
Everything was on OS X.

Looks like it was a problem with spaces vs. tabs. Anyway, I went through and
fixed all the lines and everything went in.

We had a strange problem restoring a 8.0.4 dump to a 8.1.1 server and this
was the last of the data that had to be re-imported.
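
For anyone hitting the same thing later: in text format COPY expects the columns to be separated by literal tab characters (or whatever DELIMITER specifies) and the data to end with a backslash-period line, roughly like this (the gaps below stand for single tabs):

COPY departments (id, issue_id, title, description, feature_type) FROM stdin;
23	4	Local Buzz	Things to do, people to see, places to go.	aspen
\.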

 From: Scott Marlowe [EMAIL PROTECTED]
 Date: Tue, 20 Dec 2005 16:41:32 -0600
 To: Hunter's Lists [EMAIL PROTECTED]
 Cc: PostgreSQL pgsql-general@postgresql.org
 Subject: Re: [GENERAL] Syntax Error Inserting From STDIN?
 
 quick answer, try a different editor.
 
 Are you editing in the same environment as the database is in, or are
 you editing on windows and feeding the copy data in on another platform?
 
 On Tue, 2005-12-20 at 14:40, Hunter's Lists wrote:
 Interesting.
 
 How would I go about solving that?
 
 I inserted an extra line between the two, no dice.
 
 From: Scott Marlowe [EMAIL PROTECTED]
 Date: Tue, 20 Dec 2005 13:53:37 -0600
 To: Hunter's Lists [EMAIL PROTECTED]
 Cc: PostgreSQL pgsql-general@postgresql.org
 Subject: Re: [GENERAL] Syntax Error Inserting From STDIN?
 
 On Tue, 2005-12-20 at 13:30, Hunter's Lists wrote:
 I am trying to run the following:
 
 COPY departments (id, issue_id, title, description, feature_type) FROM
 stdin;
 23  4   Local BuzzThings to do, people to see, places to go.  aspen
 
 I get back:
 
 COPY departments (id, issue_id, title, description, feature_type) FROM
 stdin;
 
 23  4   Local BuzzThings to do, people to see, places to go.  aspen
 
 ERROR:  syntax error at or near 23 at character 80
 
 Seeing that character 80 I'm gonna guess this is a CR/LF issue.  I.e.
 pgsql on your machine is seeing the 23 as being on the same line as the
 copy departments statement.
 
 
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [GENERAL] Beyond the 1600 columns limit on windows

2005-11-09 Thread Evandro's mailing lists (Please, don't send personal messages to this address)
Yes, it is exactly that. I will follow your advice and create an abstraction layer for the data access that will return the sparse dataset using the standard dataset as input.

There is just one thing I disagree with: you said that the performance is not good, right? However, it is practical! Nothing is easier and more practical than keeping the sparse representation inside the database for my application.

On 11/8/05, John D. Burger [EMAIL PROTECTED] wrote:
Evandro's mailing lists (Please, don't send personal messages to this address) wrote:

 It has nothing to do with normalisation. It is a program for
 scientific applications. Data values are broken into columns to allow
 multiple linear regression and multivariate regression tree computations.

Having done similar things in the past, I wonder if your current DB
design includes a column for every feature-value combination:

instanceID  color=red  color=blue  color=yellow  ...  height=71  height=72  ...
42          True       False       False
43          False      True        False
44          False      False       True
...

This is likely to be extremely sparse, and you might use a sparse
representation accordingly.  As several folks have suggested, the
representation in the database needn't be the same as in your code.

 Even SPSS, the most well-known statistics software, uses the same
 approach and data structure that my software uses. Probably I should
 use another data structure but it would not be as efficient and
 practical as the one I use now.

The point is that, if you want to use Postgres, this is not in fact
efficient and practical.  In fact, it might be the case that mapping
from a sparse DB representation to your internal data structures is
=more= efficient than naively using the same representation in both
places.

- John D. Burger
  MITRE

-- 
Evandro M Leite Jr
PhD Student  Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages
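
For what it's worth, the sparse layout hinted at above is usually modelled as one row per (instance, feature) pair rather than one column per feature value; a hypothetical sketch (all names invented):

CREATE TABLE observation (
    instance_id integer NOT NULL,
    feature     text    NOT NULL,   -- e.g. 'color' or 'height'
    value       text    NOT NULL,   -- e.g. 'red' or '71'
    PRIMARY KEY (instance_id, feature)
);

-- only the attributes an instance actually has get stored
INSERT INTO observation VALUES (42, 'color', 'red');
INSERT INTO observation VALUES (42, 'height', '71');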


[GENERAL] Beyond the 1600 columns limit on windows

2005-11-08 Thread Evandro's mailing lists (Please, don't send personal messages to this address)
Hi guys,

I would like to know if it is possible to have more than 1600 columns on windows without recompiling postgres.

Regards
-Evandro

-- 
Evandro M Leite Jr
PhD Student  Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages


Re: [GENERAL] Beyond the 1600 columns limit on windows

2005-11-08 Thread Evandro's mailing lists (Please, don't send personal messages to this address)
I'm doing a PhD in data mining and I need more than 1600 columns. I got
an error message saying that I cannot use more than 1600 columns.

It is happening because I have to change categorical values to binary,
creating new columns. Do you know if Oracle can handle it?


-- 
Evandro M Leite Jr
PhD Student  Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Mobile 079 068 70740  Office 023 8055 3644  Home 023 8055 9160
On 11/8/05, Tino Wildenhain [EMAIL PROTECTED] wrote:
Evandro's mailing lists (Please, don't send personal messages to this address) schrieb:

 Hi guys,
 I would like to know if it is possible to have more than 1600 columns
 on windows without recompiling postgres.

I would like to know who on earth needs 1600 columns and even beyond?
Hint: you can have practically unlimited rows in your n:m table :-)

-- 
Evandro M Leite Jr
PhD Student  Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages


Re: [GENERAL] Beyond the 1600 columns limit on windows

2005-11-08 Thread Evandro's mailing lists (Please, don't send personal messages to this address)
Sorry,

It has nothing to do with normalisation. It is a program for scientific applications.
Data values are broken into columns to allow multiple linear regression and multivariate regression tree computations.

Even SPSS, the most well-known statistics software, uses the same approach and data structure that my software uses.
Probably I should use another data structure, but it would not be as efficient and practical as the one I use now.

Many thanks
-Evandro

On 08 Nov 2005 05:30:07 -0800, Randal L. Schwartz merlyn@stonehenge.com wrote:
 Evandro's == Evandro's mailing lists (Please, don't send personal
 messages to this address) [EMAIL PROTECTED] writes:

 [I would have replied to your personal address, but I'm not about
 to copy it from a footer.]

 Evandro's I'm doing a PhD in data mining and I need more than 1600
 Evandro's columns. I got an error message saying that I can not use
 Evandro's more than 1600 columns.
 Evandro's It is happening because I have to change categorical values
 Evandro's to binary creating new columns. Do you know if oracle can
 Evandro's handle it?

 /me boggles

 You are doing a PhD in data mining, and you have a table that needs
 more than 1600 columns?

 /me gasps

 What are they *teaching* these days?

 If you have a design that has more than 20 or so columns, you're
 probably already not normalizing properly.  There just aren't *that*
 many attributes of a object before you should start factoring parts of
 it out, even if it means creating some 1-1 tables.

 In programming, if I ever see someone name a sequence of variables,
 like thing1 and thing2, I know there's going to be trouble ahead,
 because that should have been a different data structure.  Similarly,
 I bet some of your columns are foo1 and foo2.  Signs of brokenness
 in the design.

 Or do you really have 1600 *different* attributes, none of which have
 a number in their name?  That requires a serious amount of
 creativity. :)

 --
 Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
 merlyn@stonehenge.com URL:http://www.stonehenge.com/merlyn/
 Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
 See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

-- 
Evandro M Leite Jr
PhD Student  Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages


[GENERAL] Seq Scan but I think it should be Index Scan

2005-10-26 Thread Edoceo Lists

So the details of it:
I'm using PostgreSQL 8.0.3 on a Gentoo kernel 2.6.12 on a P4/2.8G+HT proc, with kernel RAID0 on some SATA drives and 1G 
RAM.  Don't know the bus speed.  I'm thinking that my queries are not using indexs correctly and therefore taking longer 
to complete than they should.  I've put the details below, but changed some names.  If anyone could shed some light?


pg_config  --configure
'--prefix=/usr' '--mandir=/usr/share/man' '--host=i686-pc-linux-gnu' '--with-docdir=/usr/share/doc/postgresql-8.0.3' 
'--libdir=/usr/lib' '--enable-depend' '--with-gnu-ld' '--with-perl' '--with-openssl' '--enable-nls' '--with-pam' 
'CFLAGS=-O2 -march=pentium4' 'host_alias=i686-pc-linux-gnu'



data=# explain analyze select count(id) from x_base where x_type  100 and 
x_date='2005-10-26' and x_time'06:00:00';
  QUERY PLAN
---
 Aggregate  (cost=539453.36..539453.36 rows=1 width=4) (actual 
time=66200.763..66200.764 rows=1 loops=1)
   -  Seq Scan on x_base  (cost=0.00..539136.18 rows=126871 width=4) (actual 
time=31618.624..66174.710 rows=37451 loops=1)
 Filter: ((x_type  100) AND (x_date = '2005-10-26'::date) AND (x_time 
 '06:00:00'::time without time zone))
 Total runtime: 66200.811 ms

data=# \d x_base
 Table public.x_base
   Column|  Type  |Modifiers
-++--
 id  | integer| not null default 
nextval('public.x_base_id_seq'::text)
 x_code  | character(8)   |
 x_date  | date   | not null
 x_time  | time without time zone | not null
 a   | character(1)   |
 b   | integer|
 c   | character(5)   |
 d   | character(16)  |
 e   | character(1)   |
 f   | character(1)   |
 g   | character(10)  |
 h   | character(1)   |
 i   | character(1)   |
 j   | character varying(32)  |
 k   | integer|
 l   | integer|
 m   | integer|
 n   | character varying(32)  |
 o   | integer|
 p   | character varying(14)  |
 q   | integer|
Indexes:
x_base_pkey PRIMARY KEY, btree (id)
ix_d_cd btree (x_date)
ix_t_cb btree (x_type)
Foreign-key constraints:
fk_k_id FOREIGN KEY (k) REFERENCES x_file(id)

Now, see that x_type index?  Why didn't this thing Index Scan ix_t_cb on that column?  Me thinks if it had my query 
would be much faster.  Or perhaps if I only where x_type?  I tried that but it still took a minute.  I took out count() 
and it still took a minute.  Always using Seq Scan, am I doing something dumb here?  There are more than six million 
records in that table, maybe thats just how long it takes?  Perhaps I should change architecture or schema to improve 
performance?  Tweak the log? Thanks.


/djb


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[GENERAL] Large Table Performance

2005-10-21 Thread Edoceo Lists

List,
  I've got a problem where I need to make a table that is going to grow by an average of 230,000 records per day. There 
are only 20 columns in the table, mostly char and integer.  It's FK'd in two places to another table for import/export 
transaction id's and I have a serial primary key and an index on a date column for when I need to search (every search 
is done inside a date range).  I thought it would be OK but after a few weeks of operation I have more than five million 
records in there.  Some queries take more than five minutes to complete and I'm sad about that.  How can I make this 
faster?  I could munge dates into integers if they're faster, I'm OK with that.  What can I tweak in the configuration 
file to speed things up?  What about some drastic schema change that more experience would have shown me?  I cannot show 
the full schema but it's like this:


-- My import/export data information table
ie_data (id serial primary key, date date, [12 other columns here])

big_transaction_table(id serial primary key, import_id int w/FK, export_id int w/FK, date date, [20 other necessary 
transaction detail columns])


So when I say
select x,y,z from big_transaction_table where date >= '10/2/2005' and date <= '10/4/2005' and transaction_status in (1,2,3) 
order by date;

it takes five+ minutes.

TIA for any suggestions.

/djb

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[GENERAL] NTFS partition autodetection during instalation

2005-07-13 Thread Evandro's mailing lists (Please, don't send personal messages to this address)
Dear hackers

I want to embed postgres into a software installation. I will use silent install, how do I detect which partition is NTFS so I can tell the installer to use that partition?


Many thanks
-Evandro

-- 
Evandro M Leite Jr
PhD Student  Software developer
University of Southampton, UK
Personal website: http://evandro.org
Academic website: http://www.soton.ac.uk/~evandro
Please, use Jr(at)evandro.org for personal messages


[GENERAL] How can this be optimized, if possible?

2005-07-04 Thread Net Virtual Mailing Lists
Hello,


My database has grown far faster than expected and a query which used to
run acceptably now does not.   I'm trying to figure out a way to make
this operate faster and scale better.   I'm very open to the idea that
this does not need to be done using a SQL query at all - right now I'm
really just in need of some conceptual/architectural help on this one.


So I have two tables:

 Table category
Column|   Type| Modifiers 
--+---+---
 head_title   | character varying | 
 cat_title| character varying | 
 subcat_title | character varying | 
 category | ltree | 



 Table test
   Column   |   Type   |
   Modifiers
+--
+-
 id | integer  | not null default
nextval('master.test_id_seq'::text)
 category   | ltree[]  | 


... there are other fields in the test table, but these are really the
only two relevant to this.


The query I want to run against these two tables is something like this:

SELECT
 count(*) as count,
 category.category,
 nlevel(category.category) AS level,
 subpath(category.category,0,nlevel(category.category)-1) as parent,
 category.head_title,
 category.cat_title,
 category.subcat_title
FROM
 test,
 category
WHERE
 test.category @ category.category 
GROUP BY
  category.category, category.head_title, category.cat_title,
category.subcat_title


Many times the WHERE clause will contain additional search criteria on
the 'test' table.  What I am trying to get is a count of how many rows
from the test table fall into each category, being limited by the search
criteria.

This query is starting to take an enormous amount of time (30+ seconds)
and I really need the results of this in a couple seconds tops.  I can do
a select category from test and it completes in about .5 seconds.  The
category table currently only has 225 rows, the test table having
approximately 30,000.  SELECT count(category,category FROM test GROUP BY
category is quite slow and I thought of making a materialized view of
this, but then of course I don't see any way to make that result limited
by my search criteria.

I am completely open to re-architecting this entirely, performance of
this query is critical to my application - I really just am not sure
where to start.  It seems like everything I do is worse then what I
started with.

... It *almost* seems as if I need to build some sort of search engine
like tool which performs all the queries against the database, has tons
of memory, and cache the category attributes for each record in memory. 
This sure seems like a lot of work though - I sincerely hope there is an
easier way.

Thanks for your help, as always!

- Greg


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [GENERAL] Peculiar performance observation....

2005-03-15 Thread Net Virtual Mailing Lists
On Mon, 2005-03-14 at 21:14, Net Virtual Mailing Lists wrote:
 On Mon, 2005-03-14 at 06:11, Net Virtual Mailing Lists wrote:
  Hello,
  
  
  I am sorry to bring this up again Does anyone have any idea what
  might be going on here?...   I'm very worried about this situation.. ;-(
 
 It looks to me like either you're not analyzing often enough, or your
 statistics target is too low to get a good sample.  Note your estimated
 versus real rows are off by a factor of 70 (28 est. versus 1943 actual
 rows). That's a pretty big difference, and where you should be looking.
 
 -  Seq Scan on jobdata  (cost=0.00..7567.88 rows=28 width=52)
(actual
  time=11.498..4800.907 rows=1943 loops=1)
 
 Yes, this is because PostgreSQL is using an index to approximate a
 sequential scan, which is not a good thing since PostgreSQL can't get
 all the information it needs from just an index, but has to visit the
 table to check visibility.
 
 
 
 All of these were after a vacuum full analyze, which I actually do
 nightly on the database.
 
 I probably confused the issue with all of my posts, this is the query
 which has me concerned.  When running it on my system here, the disk
 thrashes (and I mean *THRASHES*) the entire 12-20 seconds it takes to
 run...  WHen running on our production servers, I can't hear the disk,
 but see an equally troubling performance loss when using the index.

I'll call this query 1:

 
 database= explain analyze select id from table1 where category @ 'a.b';
 QUERY
 PLAN 
 -
 -
 -
  Index Scan using table1_category_full_gist_idx on jobdata 
 (cost=0.00..113.48 rows=28 width=4) (actual time=43.814..12201.528
 rows=1943 loops=1)
Index Cond: (category @ 'a.b'::ltree)
Filter: (category @ 'a.b'::ltree)
  Total runtime: 1.258 ms
 
 
 I can do this to speed things up (this results in very little disk
 activity, certainly not the thrashing the original query did):
 
 
 create table yuck (id integer, category ltree[]);
 insert into yuck select id, category from table1;
 create index category_idx on yuck using gist(category);
 vacuum analyze yuck;
 jobs= explain analyze select id from table1 where id in (select id from
 yuck where category @ 'a.b');
   QUERY PLAN
   
 -
 -
 -
  Nested Loop  (cost=108.64..114.28 rows=1 width=52) (actual
 time=654.645..1245.212 rows=1943 loops=1)
-  HashAggregate  (cost=108.64..108.64 rows=1 width=4) (actual
 time=654.202..690.709 rows=1943 loops=1)
  -  Index Scan using category_idx on yuck  (cost=0.00..108.57
 rows=28 width=4) (actual time=2.046..623.436 rows=1943 loops=1)
Index Cond: (category @ 'a.b'::ltree)
Filter: (category @ 'a.b'::ltree)
-  Index Scan using table1_pkey on jobdata  (cost=0.00..5.64 rows=1
 width=52) (actual time=0.219..0.235 rows=1 loops=1943)
  Index Cond: (table1.id = outer.id)
  Total runtime: 1261.551 ms
 (8 rows)


 If I drop the index table1_category_full_gist_idx, the query speeds up
 dramatically (10-15 times faster on both dev and prod uction systems).
 
 So my concern, in short: why is it so much slower when actually using an
 index and why is it trying to make mince meat out of my hard drive?

I'll explain it again, sorry if my quoting originally was a bit of a
mess.  I meant to post the last comment I made after some other comment
in your original post that I think I deleted.

Anyway, the reason it's slow is that PostgreSQL, unlike most other
databases, cannot get the answers from an index.  It can only get a
pointer to the right place in the table to look for the answer.  After
that, due to visibility issues caused by the way postgresql implements
MVCC, it then has to look IN THE TABLE to find out if the value is
visible to your transaction or not.  So it's going Index then table,
then index, then table, then index, then table, for however many rows
it's gonna grab.  In this case 1943.

In query 1, the number of rows being returned by the index scan is 1943,
but the planner only thinks it's gonna get back 28.  So, with a 70:1
ratio of incorrectness here, the planner thinks an index scan is a good
idea.  It's not, it's a terrible idea for your table.  The problem is
likely that the query planner is not getting the right numbers for this
table, and I'm not even sure how accurate statistics can be for ltrees,
as I've only ever used btree indexes in postgresql.  But, upping the
statistics target for the column producing this bad behavior and
rerunning
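
Presumably the advice that gets cut off above is to raise the column's statistics target and re-analyze; a sketch using the names from the post (the value 1000 is just an illustration):

ALTER TABLE table1 ALTER COLUMN category SET STATISTICS 1000;
ANALYZE table1;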

Re: [GENERAL] Peculiar performance observation....

2005-03-14 Thread Net Virtual Mailing Lists
Hello,


I am sorry to bring this up again Does anyone have any idea what
might be going on here?...   I'm very worried about this situation.. ;-(

- Greg


Something even more peculiar (at least it seems to me..)...


If I drop the index table1_category_gist_idx, I get this:


jobs= explain analyze select id from table1 where category @ 'a.b'
ORDER BY category;
QUERY PLAN  
  
-

--
 Sort  (cost=7568.55..7568.62 rows=28 width=52) (actual
time=4842.691..4854.468 rows=1943 loops=1)
   Sort Key: category
   -  Seq Scan on jobdata  (cost=0.00..7567.88 rows=28 width=52) (actual
time=11.498..4800.907 rows=1943 loops=1)
 Filter: (category @ 'a.b'::ltree)
 Total runtime: 4871.076 ms
(5 rows)


.. no disk thrashing all over the place.. 

I'm really perplexed about this one..;-(

- Greg

I have a rather peculiar performance observation and would welcome any
feedback on this.

First off, the main table (well, part of it.. it is quite large..):


 Table table1
   Column   |   Type   |
   Modifiers
+--
+-
 id | integer  | not null default
nextval('master.id_seq'::text)
 user_id| integer  | 
 ... (skipping about 20 columns)
 category   | ltree[]  | 
 somedata   | text | not null



Indexes:
table1_pkey primary key, btree (id)
table1_category_full_gist_idx gist (category)
table1_id_idx btree (id)
table1_fti_idx gist (fti) WHERE ((status)::text = 'open'::text)
table1_user_id_idx btree (user_id)


database= explain analyze select id from table1 where category @ 'a.b';
QUERY
PLAN 
-
-
-
 Index Scan using table1_category_full_gist_idx on jobdata 
(cost=0.00..113.48 rows=28 width=4) (actual time=43.814..12201.528
rows=1943 loops=1)
   Index Cond: (category @ 'a.b'::ltree)
   Filter: (category @ 'a.b'::ltree)
 Total runtime: 1.258 ms


If I do this:

create table yuck (id integer, category ltree[]);
insert into yuck select id, category from table1;
create index category_idx on yuck using gist(category);
vacuum analyze yuck;
jobs= explain analyze select id from table1 where id in (select id from
yuck where category @ 'a.b');
  QUERY PLAN
  
-
-
-
 Nested Loop  (cost=108.64..114.28 rows=1 width=52) (actual
time=654.645..1245.212 rows=1943 loops=1)
   -  HashAggregate  (cost=108.64..108.64 rows=1 width=4) (actual
time=654.202..690.709 rows=1943 loops=1)
 -  Index Scan using category_idx on yuck  (cost=0.00..108.57
rows=28 width=4) (actual time=2.046..623.436 rows=1943 loops=1)
   Index Cond: (category @ 'a.b'::ltree)
   Filter: (category @ 'a.b'::ltree)
   -  Index Scan using table1_pkey on jobdata  (cost=0.00..5.64 rows=1
width=52) (actual time=0.219..0.235 rows=1 loops=1943)
 Index Cond: (table1.id = outer.id)
 Total runtime: 1261.551 ms
(8 rows)


In the first query, my hard disk thrashes audibly the entire 12 seconds
(this is actually the best run I could get, it is usually closer to 20
seconds), the second query runs almost effortlessly..  I've tried
reindexing, even dropping the index and recreating it but nothing I do
helps at all.

Now keep in mind that I do all of my development on painfully slow
hardware in order to make any performance issues really stand out.  But,
I've done this on production servers too with an equal performance
improvement noticed.

I just can't figure out why this second query is so much faster, I feel
like I must have done something very wrong in my schema design or
something to be suffering this sort of a performance loss.   Any idea
what I can do about this?

Thanks as always!

- Greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [GENERAL] Peculiar performance observation....

2005-03-14 Thread Net Virtual Mailing Lists
On Mon, 2005-03-14 at 06:11, Net Virtual Mailing Lists wrote:
 Hello,
 
 
 I am sorry to bring this up again Does anyone have any idea what
 might be going on here?...   I'm very worried about this situation.. ;-(

It looks to me like either you're not analyzing often enough, or your
statistics target is too low to get a good sample.  Note your estimated
versus real rows are off by a factor of 70 (28 est. versus 1943 actual
rows). That's a pretty big difference, and where you should be looking.

-  Seq Scan on jobdata  (cost=0.00..7567.88 rows=28 width=52) (actual
 time=11.498..4800.907 rows=1943 loops=1)

Yes, this is because PostgreSQL is using an index to approximate a
sequential scan, which is not a good thing since PostgreSQL can't get
all the information it needs from just an index, but has to visit the
table to check visibility.



All of these were after a vacuum full analyze, which I actually do
nightly on the database.

I probably confused the issue with all of my posts, this is the query
which has me concerned.  When running it on my system here, the disk
thrashes (and I mean *THRASHES*) the entire 12-20 seconds it takes to
run...  WHen running on our production servers, I can't hear the disk,
but see an equally troubling performance loss when using the index.

database= explain analyze select id from table1 where category @ 'a.b';
QUERY
PLAN 
-
-
-
 Index Scan using table1_category_full_gist_idx on jobdata 
(cost=0.00..113.48 rows=28 width=4) (actual time=43.814..12201.528
rows=1943 loops=1)
   Index Cond: (category @ 'a.b'::ltree)
   Filter: (category @ 'a.b'::ltree)
 Total runtime: 1.258 ms


I can do this to speed things up (this results in very little disk
activity, certainly not the thrashing the original query did):


create table yuck (id integer, category ltree[]);
insert into yuck select id, category from table1;
create index category_idx on yuck using gist(category);
vacuum analyze yuck;
jobs= explain analyze select id from table1 where id in (select id from
yuck where category @ 'a.b');
  QUERY PLAN
  
-
-
-
 Nested Loop  (cost=108.64..114.28 rows=1 width=52) (actual
time=654.645..1245.212 rows=1943 loops=1)
   -  HashAggregate  (cost=108.64..108.64 rows=1 width=4) (actual
time=654.202..690.709 rows=1943 loops=1)
 -  Index Scan using category_idx on yuck  (cost=0.00..108.57
rows=28 width=4) (actual time=2.046..623.436 rows=1943 loops=1)
   Index Cond: (category @ 'a.b'::ltree)
   Filter: (category @ 'a.b'::ltree)
   -  Index Scan using table1_pkey on jobdata  (cost=0.00..5.64 rows=1
width=52) (actual time=0.219..0.235 rows=1 loops=1943)
 Index Cond: (table1.id = outer.id)
 Total runtime: 1261.551 ms
(8 rows)



If I drop the index table1_category_full_gist_idx, the query speeds up
dramatically (10-15 times faster on both dev and prod uction systems).

So my concern, in short: why is it so much slower when actually using an
index and why is it trying to make mince meat out of my hard drive?


- Greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[GENERAL] Peculiar performance observation....

2005-03-12 Thread Net Virtual Mailing Lists
I have a rather peculiar performance observation and would welcome any
feedback on this.

First off, the main table (well, part of it.. it is quite large..):


 Table table1
   Column   |   Type   |
   Modifiers
+--
+-
 id | integer  | not null default
nextval('master.id_seq'::text)
 user_id| integer  | 
 ... (skipping about 20 columns)
 category   | ltree[]  | 
 somedata   | text | not null



Indexes:
table1_pkey primary key, btree (id)
table1_category_full_gist_idx gist (category)
table1_id_idx btree (id)
table1_fti_idx gist (fti) WHERE ((status)::text = 'open'::text)
table1_user_id_idx btree (user_id)


database= explain analyze select id from table1 where category @ 'a.b';
QUERY
PLAN 
-
-
-
 Index Scan using table1_category_full_gist_idx on jobdata 
(cost=0.00..113.48 rows=28 width=4) (actual time=43.814..12201.528
rows=1943 loops=1)
   Index Cond: (category @ 'a.b'::ltree)
   Filter: (category @ 'a.b'::ltree)
 Total runtime: 1.258 ms


If I do this:

create table yuck (id integer, category ltree[]);
insert into yuck select id, category from table1;
create index category_idx on yuck using gist(category);
vacuum analyze yuck;
jobs= explain analyze select id from table1 where id in (select id from
yuck where category @ 'a.b');
  QUERY PLAN
  
-
-
-
 Nested Loop  (cost=108.64..114.28 rows=1 width=52) (actual
time=654.645..1245.212 rows=1943 loops=1)
   -  HashAggregate  (cost=108.64..108.64 rows=1 width=4) (actual
time=654.202..690.709 rows=1943 loops=1)
 -  Index Scan using category_idx on yuck  (cost=0.00..108.57
rows=28 width=4) (actual time=2.046..623.436 rows=1943 loops=1)
   Index Cond: (category @ 'a.b'::ltree)
   Filter: (category @ 'a.b'::ltree)
   -  Index Scan using table1_pkey on jobdata  (cost=0.00..5.64 rows=1
width=52) (actual time=0.219..0.235 rows=1 loops=1943)
 Index Cond: (table1.id = outer.id)
 Total runtime: 1261.551 ms
(8 rows)


In the first query, my hard disk thrashes audibly the entire 12 seconds
(this is actually the best run I could get, it is usually closer to 20
seconds), the second query runs almost effortlessly..  I've tried
reindexing, even dropping the index and recreating it but nothing I do
helps at all.

Now keep in mind that I do all of my development on painfully slow
hardware in order to make any performance issues really stand out.  But,
I've done this on production servers too with an equal performance
improvement noticed.

I just can't figure out why this second query is so much faster, I feel
like I must have done something very wrong in my schema design or
something to be suffering this sort of a performance loss.   Any idea
what I can do about this?

Thanks as always!

- Greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [GENERAL] Peculiar performance observation....

2005-03-12 Thread Net Virtual Mailing Lists
Something even more peculiar (at least it seems to me..)...


If I drop the index table1_category_gist_idx, I get this:


jobs= explain analyze select id from table1 where category @ 'a.b'
ORDER BY category;
QUERY PLAN  
  
-
--
 Sort  (cost=7568.55..7568.62 rows=28 width=52) (actual
time=4842.691..4854.468 rows=1943 loops=1)
   Sort Key: category
   -  Seq Scan on jobdata  (cost=0.00..7567.88 rows=28 width=52) (actual
time=11.498..4800.907 rows=1943 loops=1)
 Filter: (category @ 'a.b'::ltree)
 Total runtime: 4871.076 ms
(5 rows)


.. no disk thrashing all over the place.. 

I'm really perplexed about this one..;-(

- Greg

I have a rather peculiar performance observation and would welcome any
feedback on this.

First off, the main table (well, part of it.. it is quite large..):


 Table table1
   Column   |   Type   |
   Modifiers
+--
+-
 id | integer  | not null default
nextval('master.id_seq'::text)
 user_id| integer  | 
 ... (skipping about 20 columns)
 category   | ltree[]  | 
 somedata   | text | not null



Indexes:
table1_pkey primary key, btree (id)
table1_category_full_gist_idx gist (category)
table1_id_idx btree (id)
table1_fti_idx gist (fti) WHERE ((status)::text = 'open'::text)
table1_user_id_idx btree (user_id)


database= explain analyze select id from table1 where category @ 'a.b';
QUERY
PLAN 
-
-
-
 Index Scan using table1_category_full_gist_idx on jobdata 
(cost=0.00..113.48 rows=28 width=4) (actual time=43.814..12201.528
rows=1943 loops=1)
   Index Cond: (category @ 'a.b'::ltree)
   Filter: (category @ 'a.b'::ltree)
 Total runtime: 1.258 ms


If I do this:

create table yuck (id integer, category ltree[]);
insert into yuck select id, category from table1;
create index category_idx on yuck using gist(category);
vacuum analyze yuck;
jobs= explain analyze select id from table1 where id in (select id from
yuck where category @ 'a.b');
  QUERY PLAN
  
-
-
-
 Nested Loop  (cost=108.64..114.28 rows=1 width=52) (actual
time=654.645..1245.212 rows=1943 loops=1)
   -  HashAggregate  (cost=108.64..108.64 rows=1 width=4) (actual
time=654.202..690.709 rows=1943 loops=1)
 -  Index Scan using category_idx on yuck  (cost=0.00..108.57
rows=28 width=4) (actual time=2.046..623.436 rows=1943 loops=1)
   Index Cond: (category @ 'a.b'::ltree)
   Filter: (category @ 'a.b'::ltree)
   -  Index Scan using table1_pkey on jobdata  (cost=0.00..5.64 rows=1
width=52) (actual time=0.219..0.235 rows=1 loops=1943)
 Index Cond: (table1.id = outer.id)
 Total runtime: 1261.551 ms
(8 rows)


In the first query, my hard disk thrashes audibly the entire 12 seconds
(this is actually the best run I could get, it is usually closer to 20
seconds), the second query runs almost effortlessly..  I've tried
reindexing, even dropping the index and recreating it but nothing I do
helps at all.

Now keep in mind that I do all of my development on painfully slow
hardware in order to make any performance issues really stand out.  But,
I've done this on production servers too with an equal performance
improvement noticed.

I just can't figure out why this second query is so much faster, I feel
like I must have done something very wrong in my schema design or
something to be suffering this sort of a performance loss.   Any idea
what I can do about this?

Thanks as always!

- Greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[GENERAL] Problem with inherited table, can you help?...

2005-03-10 Thread Net Virtual Mailing Lists
I have the following three tables and my inserts are blocking each other
in a way I just can't understand  Can someone point me in the
direction as to what is causing this?


jobs= \d master.locations
Table master.locations
   Column|Type |
Modifiers  
-+-
+
 location_id | integer | not null default
nextval('master.locations_location_id_seq'::text)
 user_id | integer | 
 addr1   | character varying(50)   | 
 addr2   | character varying(50)   | 
 city| character varying(50)   | not null
 state_id| integer | 
 state_other | character varying(50)   | 
 country_id  | integer | 
 zip | character varying(35)   | not null
 loc_type| character varying(9)| 
 deleted | boolean | not null
 entered_dt  | timestamp without time zone | not null
 updated_dt  | timestamp without time zone | 
Check constraints:
locations_loc_type CHECK (loc_type::text = 'primary'::text OR
loc_type::text = 'secondary'::text)



jobs= \d jl_site1.locations
Table jl_site1.locations
   Column|Type |
Modifiers  
-+-
+
 location_id | integer | not null default
nextval('master.locations_location_id_seq'::text)
 user_id | integer | 
 addr1   | character varying(50)   | 
 addr2   | character varying(50)   | 
 city| character varying(50)   | not null
 state_id| integer | 
 state_other | character varying(50)   | 
 country_id  | integer | 
 zip | character varying(35)   | not null
 loc_type| character varying(9)| 
 deleted | boolean | not null
 entered_dt  | timestamp without time zone | not null
 updated_dt  | timestamp without time zone | 
Indexes:
locations_pkey primary key, btree (location_id)
locations_location_id_key unique, btree (location_id)
locations_country_id_idx btree (country_id)
locations_state_id_idx btree (state_id)
locations_user_id_idx btree (user_id)
locations_zip_idx btree (zip)
Check constraints:
locations_loc_type CHECK (loc_type::text = 'primary'::text OR
loc_type::text = 'secondary'::text)
Foreign-key constraints:
$3 FOREIGN KEY (user_id) REFERENCES jl_site1.customer(id) ON UPDATE
CASCADE ON DELETE CASCADE
$2 FOREIGN KEY (country_id) REFERENCES countries(country_id) ON
DELETE RESTRICT
$1 FOREIGN KEY (state_id) REFERENCES states(state_id) ON DELETE RESTRICT
Inherits: locations


jobs= \d jl_site2.locations
Table jl_site2.locations
   Column|Type |
Modifiers  
-+-
+
 location_id | integer | not null default
nextval('master.locations_location_id_seq'::text)
 user_id | integer | 
 addr1   | character varying(50)   | 
 addr2   | character varying(50)   | 
 city| character varying(50)   | not null
 state_id| integer | 
 state_other | character varying(50)   | 
 country_id  | integer | 
 zip | character varying(35)   | not null
 loc_type| character varying(9)| 
 deleted | boolean | not null
 entered_dt  | timestamp without time zone | not null
 updated_dt  | timestamp without time zone | 
Indexes:
locations_pkey primary key, btree (location_id)
locations_location_id_key unique, btree (location_id)
locations_country_id_idx btree (country_id)
locations_state_id_idx btree (state_id)
locations_user_id_idx btree (user_id)
locations_zip_idx btree (zip)
Check constraints:
locations_loc_type CHECK (loc_type::text = 'primary'::text OR
loc_type::text = 'secondary'::text)
Foreign-key constraints:
$3 FOREIGN KEY (user_id) REFERENCES jl_site2.customer(id) ON UPDATE
CASCADE ON DELETE CASCADE
$2 FOREIGN KEY (country_id) REFERENCES countries(country_id) ON
DELETE RESTRICT
$1 FOREIGN KEY (state_id) REFERENCES states(state_id) ON DELETE RESTRICT
Inherits: locations


(NOTE: at this point, hopefull it is clear that both jl_site1 and
jl_site2 inherit the master.locations table)


In connection #1, I do:

1. set search_path=jl_site1,public;
2. BEGIN;
3. INSERT INTO locations 

Re: [GENERAL] Disabling triggers in a transaction

2005-03-09 Thread Net Virtual Mailing Lists
Net Virtual Mailing Lists wrote:
 All I did was added an extra column to my table (I called it
 batch_process).  Then in
 the trigger do something like (in whichever function you are calling):
 
 IF NEW.batch_update IS NOT NULL AND NEW.batch_update = ''t'' THEN
   NEW.batch_process := NULL;
   RETURN NULL;
 END IF;
 .. whatever the rest of transaction is

Why don't you just set it to false instead of NULL? Wouldn't that reduce 
the condition to just IF NEW.batch_update THEN ...? In that case you 
should default the column to false of course, or the condition will 
always fail (the value being NULL).
Personally, I would use a more descriptive name for the column, 
'disable_triggers' or something like that.


Yeah, that's one improvement I meant to make but just haven't gotten
around to it..  It is just the way this thing got written the first time
during testing and the oh! It worked! realization.. ;-)

Also, I find it more convenient to use true and false instead of 
having to escape 't' and 'f' all the time ;)


Yeah.. ;-)

 Then when doing an insert, just:
 
 INSERT INTO table (..., batch_process) VALUES (..., 't') when you want the
 trigger not to fire...
 
 Or an update:
 
 UPDATE TABLE table SET , batch_process = 't' 
 
 
 I'm not sure how to make it work on a function called from a delete
 trigger though.. ;-(

The drawbacks of this method are that you'll have to modify all your 
queries when you want to disable triggers (though that can usually be 
solved programmatically), and that only the triggers that support this 
method of disabling will be actually disabled.


It seems like you would have to do something programmatically anyway in
order to say Okay, now I want to disable the triggers -- go do something

If you work at the same project with multiple people who all write 
triggers from time to time, or when you have to deal with legacy code 
from an older database, I think you'll run into trouble with the above 
quicker than you'd like. However, if you manage to get this into the 
design fase of a project it'll probably work just fine (the delete 
problem you mentioned aside...).


I think any of the solutions I've seen mentioned so far would present the
same problem. :-(

-- 
Alban Hertroys
MAG Productions

T: +31(0)53 4346874
F: +31(0)53 4346876
E: [EMAIL PROTECTED]
W: http://www.magproductions.nl




---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [GENERAL] Disabling triggers in a transaction

2005-03-08 Thread Net Virtual Mailing Lists
It is the only known way to control triggers though it isn't regularly
tested by the developers.


I think I've come up with another way.. I posted this recently, but did
not get any feedback on it so I'm not sure how dumb it is...  It is
working really great for me though


All I did was added an extra column to my table (I called it
batch_process).  Then in
the trigger do something like (in whichever function you are calling):

IF NEW.batch_update IS NOT NULL AND NEW.batch_update = ''t'' THEN
  NEW.batch_process := NULL;
  RETURN NULL;
END IF;
.. whatever the rest of transaction is


Then when doing an insert, just:

INSERT INTO table (..., batch_process) VALUES (..., 't') when you want the
trigger not to fire...

Or an update:

UPDATE TABLE table SET , batch_process = 't' 


I'm not sure how to make it work on a function called from a delete
trigger though.. ;-(
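
A minimal sketch of the flag-column pattern described above, using the boolean-default-false refinement mentioned elsewhere in the thread (table, column and function names here are illustrative, and it assumes a BEFORE trigger so the flag can be cleared before the row is stored):

ALTER TABLE mytable ADD COLUMN batch_process boolean NOT NULL DEFAULT false;

CREATE OR REPLACE FUNCTION mytable_guarded_trigger() RETURNS trigger AS '
BEGIN
    -- rows flagged as batch work are stored as-is, skipping the usual logic
    IF NEW.batch_process THEN
        NEW.batch_process := false;  -- reset the flag before the row is stored
        RETURN NEW;
    END IF;
    -- ... the normal trigger logic would go here ...
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER mytable_guarded
    BEFORE INSERT OR UPDATE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE mytable_guarded_trigger();

-- callers that want to bypass the trigger then write, for example:
-- UPDATE mytable SET some_column = 1, batch_process = true;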

- Greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


  1   2   >