Re: [HACKERS][OT] somebody could explain this?

2005-11-04 Thread Csaba Nagy
[snip]
 Floating point numbers are accurate but not precise.

OK, now this one beats me... what's the difference between accurate
and precise ? I thought both mean something like correct, but precise
refers to some action and accurate applies to a situation or
description...

I'm actually curious what it means. Merriam-Webster lists correct as a
synonym for both.

Cheers,
Csaba.





Re: [HACKERS] somebody could explain this?

2005-11-04 Thread Csaba Nagy
Cristian,

I bet it's related to a rounding issue and the fact that floating-point
formats are approximate, so a value that should be exactly 12 may not be.
Probably 12 ends up being slightly less in the floating-point format
(something like 11.999...), and the cast to integer truncates it.
Not 100% sure though... read up on your API, I'm not a C programmer :-)
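
A quick way to convince yourself from psql (just a sketch; the exact
digits, and whether truncation gives 11 or 12, can vary by platform and
compiler):

-- float8 cannot store 0.12 exactly, so the product may land just below 12;
-- truncating and rounding then give different integers (output may vary)
SELECT 0.12::float8 * 100        AS product,
       trunc(0.12::float8 * 100) AS truncated,
       round(0.12::float8 * 100) AS rounded;

If the function needs the nearest integer rather than the floor, rounding
(e.g. with rint() in C) before the cast avoids the surprise.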

HTH,
Csaba.


On Fri, 2005-11-04 at 17:16, Cristian Prieto wrote:
 Hello, I'm using PostgreSQL 8.0.4 in Fedora Core 3, right now I'm learning a
 little about the postgresql internals and the way some kind of SPs could be
 written in c language; I found something really weird and I cannot explain
 to me this behavior:
 
 #include "postgres.h"
 #include "fmgr.h"
 
 PG_FUNCTION_INFO_V1(test);
 
 Datum
 test(PG_FUNCTION_ARGS)
 {
   int num_times;
   
   num_times = PG_GETARG_FLOAT8(0) * 100;
   PG_RETURN_INT32(num_times);
 }
 
 
 Inside psql this happens:
 
 # Create or replace function test(float) returns integer as 'test.so'
 language 'c' stable;
 
 select test(0.1);
 Returns 10
 
 Select test(0.11);
 Returns 11
 
 Select test(0.12)
 Returns 11
 
 Select test(0.13)
 Returns 13
 
 Select test(0.14)
 Returns 14
 
 Select test(0.15)
 Returns 14
 
 
 What is happening here?
 
 




Re: [HACKERS] parameterized limit statements

2005-11-08 Thread Csaba Nagy
On Mon, 2005-11-07 at 18:43, Tom Lane wrote:
[snip]
 If it doesn't have a value for the parameter, it'll assume 10% of table
 rows, which is what it's done for a long time if the LIMIT isn't
 reducible to a constant.

Is 10% a reasonable guess here ?

Here we use limit in combination with prepared statements to get
something like less than 1% of the table. There are no exceptions to
that in our code... even if the limit amount is a parameter.
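
For illustration, the sort of statement I mean (hypothetical table and
column names); the planner only sees LIMIT $2 at PREPARE time, with no
idea that it will always be a small number:

PREPARE recent_tasks (int, int) AS
    SELECT * FROM tasks
     WHERE customer_id = $1
     ORDER BY created DESC
     LIMIT $2;

EXECUTE recent_tasks(42, 100);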

Furthermore, the limit amount is always a small number, usually ~ 100,
but never more than 1000. So in my case, we could live with a suboptimal
plan when the percentage would be more than 10%, because then the table
would be small enough not to matter that much. On the other hand, wrongly
guessing 10% for a huge table has a huge impact...

I think the best would be to guess 5%, but with a maximum of, say, 5000.
That could work well with both small and huge tables. Maybe those values
could be made configurable... just ideas, not like I could implement this...

[snip]

Cheers,
Csaba.





Re: [HACKERS] someone working to add merge?

2005-11-11 Thread Csaba Nagy
On Fri, 2005-11-11 at 18:15, Jaime Casanova wrote:
 On 11/11/05, Josh Berkus josh@agliodbs.com wrote:
  Jaime,
 
   so i suppose we can reuse many of the code breaking the merge in 3
   pieces... for now they are just thougths, i will think more in this
   and try to implement it...
  
   comments? ideas? suggestions?
 
  Funny, we were just discussing this at OpenDBCon.   Seems that you can't do a
  full implementation of MERGE without Predicate Locking (the ability to say
  lock this table against inserts or updates of any row with key=5).
 
 it isn't what select for update does?

Select for update only works if the row is already there. If there's no
row, you can't lock it. So then you want to insert it, but it is
possible that somebody else inserted it before you, immediately after your
update attempt... so the solution would be more like:

  - try the insert;
  - if the insert fails, do the update;

You can already do that, but you have to place a savepoint before the
insert, so you can continue your transaction even if the insert fails.
Without knowledge of the postgres internals, the simplest would seem to be
to make continuing the transaction after a failed insert as cheap as
possible. That would mostly mean wrapping up existing code, except for the
continue-the-transaction-after-a-failed-insert part.
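
For what it's worth, in PL/pgSQL (8.0 and later) the savepoint comes for
free with an EXCEPTION block. A minimal sketch, assuming a hypothetical
counters(id int primary key, val int) table:

CREATE OR REPLACE FUNCTION insert_or_update(p_id int, p_delta int)
RETURNS void AS $$
BEGIN
    BEGIN
        -- the EXCEPTION clause makes this block run under an implicit savepoint
        INSERT INTO counters (id, val) VALUES (p_id, p_delta);
        RETURN;
    EXCEPTION WHEN unique_violation THEN
        NULL;  -- somebody got there first; fall through to the update
    END;
    UPDATE counters SET val = val + p_delta WHERE id = p_id;
END;
$$ LANGUAGE plpgsql;

There is still a small window if the conflicting row gets deleted again
before the UPDATE runs, so a really paranoid version would loop, but it
shows the idea.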

All this might be completely bull*it of course, I don't know too much
about postgres internals.

[snip]

Cheers,
Csaba.






Re: [HACKERS] someone working to add merge?

2005-11-11 Thread Csaba Nagy
OK, I'm relatively new on this list, and I might have missed a few
discussions on this topic.
I wonder if doing it this way would not be better than using a table
lock:

 - set a savepoint;
 - insert the row;
 - on error:
   - roll back to the savepoint;
   - update the row;
 - on success, release the savepoint;

This would provide less contention while paying the price for the
savepoint. In low contention scenarios the table lock would be better, and I
wonder which is better in high contention scenarios: the table lock, or
the savepoint version...

Of course the table lock version is the future if predicate locking is
going to be implemented later.

Cheers,
Csaba.


On Fri, 2005-11-11 at 18:37, Peter Eisentraut wrote:
 Josh Berkus wrote:
  Funny, we were just discussing this at OpenDBCon.   Seems that you
  can't do a full implementation of MERGE without Predicate Locking
  (the ability to say lock this table against inserts or updates of
  any row with key=5). However, Peter suggested that we could do a
  proof-of-concept implementation, working out syntax and trigger
  issues, based on a full table lock and do the hard work once it was
  proved to be feasible.
 
 Yes, I've started to work on this.  Realizing that the current way to 
 manually do an UPDATE-else-INSERT or DELETE-then-INSERT involves a 
 table lock anyway, a MERGE implementation using a table lock would at 
 least give some convenience benefit to users. (And possibly some 
 performance, too, if the decision logic is currently run in the 
 client.)
 
 A predicate locking implementation for MERGE might actually not be all 
 that complicated, because you only need to look on pk = constant, not 
 on arbitrary expressions.  Nevertheless, I think it's best to write the 
 MERGE command first and then optimize the locking.




Re: [HACKERS] someone working to add merge?

2005-11-14 Thread Csaba Nagy
On Fri, 2005-11-11 at 20:22, Bruno Wolff III wrote:
 On Fri, Nov 11, 2005 at 18:48:33 +0100,
   Csaba Nagy [EMAIL PROTECTED] wrote:
  OK, I'm relatively new on this list, and I might have missed a few
  discussions on this topic.
  I wonder if doing it this way would not be better than using a table
  lock:
  
   - set a save point;
   - insert the row;
   - on error:
  - roll back to the save point;
  - update the row;
   - on success release the save point;
  
  This would provide less contention while paying the prise for the save
  point. In low contention scenarios the table lock would be better, and I
  wonder for high contention scenarios which is better, the table lock, or
  the save point version...
 
 You may not be able to update the row after the insert fails. If there is
 an insert occurring in another transaction, the row may not be visible to
 the current transaction, in which case you can neither insert nor update the
 row. You need to wait for the other transaction to commit or roll back.

Are you sure ? From what I understand, the insert will only fail when
the other transaction commits; it actively waits for the commit or
rollback. Look at this:


session_1= create table test (col smallint primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"test_pkey" for table "test"
CREATE TABLE
session_1= begin;
BEGIN
cnagy= insert into test values (1);
INSERT 165068987 1

session_2= begin;
BEGIN
session_2= insert into test values (1);

[session_2 is now waiting]

session_1= commit;
COMMIT

[session_2 wakes up]

ERROR:  duplicate key violates unique constraint "test_pkey"


So it looks like predicate locking is already in place for primary key
conditions...

Cheers,
Csaba.









Re: [HACKERS] someone working to add merge?

2005-11-15 Thread Csaba Nagy
Well, from my point of view it is a special case of predicate locking
that is supported well by existing code, in this case the unique index (you
said that, I'm not familiar with the code). I don't see why this cannot be
capitalized on to implement a subset of what predicate locking is,
based on the already existing mechanism. I guess most of the users who
need some kind of merge, replace, insert-or-update feature (or other
reasons to block inserts/updates on a specific row) would be happy for
now with the restriction that the condition must be backed by a unique
index.

So basically the only thing I'm trying to say is that a partial
implementation which might be easy to implement (I might be wrong
here), has no big performance penalties, and covers a very valid
problem is better than chasing the complete solution, which is either too
complex to implement easily or easy to implement but with performance
disadvantages. The only requirement is that it be labeled
correctly so people don't expect something it isn't.

Of course all this is hand-waving; I can't really help even if I wanted to,
my C skills are less than mediocre.

Cheers,
Csaba.

On Tue, 2005-11-15 at 12:14, Alvaro Herrera wrote:
 Csaba Nagy wrote:
 
  session_1= create table test (col smallint primary key);
  NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
  test_pkey for table test
  CREATE TABLE
  session_1= begin;
  BEGIN
  cnagy= insert into test values (1);
  INSERT 165068987 1
  
  session_2= begin;
  BEGIN
  session_2= insert into test values (1);
  
  [session_2 is now waiting]
 
 This only happens because of the unique index.  There's no predicate
 locking involved.  The btree code goes some lengths to make this work;
 it would be probably simple to modify this to support MERGE or REPLACE
 on the limited cases where there's a UNIQUE index.  Tom has already said
 this twice (on this thread only; he has already said it before IIRC.)





Re: [HACKERS] MERGE vs REPLACE

2005-11-17 Thread Csaba Nagy
Well, from my point of view it is more than delete and insert. That I
can do right now with the existing infrastructure. The problem I'm trying
to solve is something along these lines: a bunch of clients try to update a
count, and ONE of them must initialize the count if it does not exist...
this can't be done with the current infrastructure without race conditions.
Our current solution is to initialize all the possible counts
beforehand, but that's suboptimal, as only a few of them will actually have
data coming in later...

And then there is the import problem... we can have multiple concurrent
imports, which must insert just once per some unique criteria, and
update if the record is already there, and all this in a batch. This is
also not possible without race conditions or aggressive locking.

So for me an atomic, consistent update-or-insert based on a unique
criterion, without performance penalties, does have a value, and
that value comes exactly from the locking of the unique index, which
eliminates the race condition from this operation. I don't care about
syntactic sugar, just about things I could do more efficiently if this
mechanism were in place...
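
To make the race concrete, the naive pattern (hypothetical table and
column names) looks like this; two clients can both find no row to update
and both attempt the INSERT:

-- hypothetical counters table; the race is between the UPDATE and the INSERT
UPDATE counters SET val = val + 1 WHERE id = $1;
-- if no row was updated:
INSERT INTO counters (id, val) VALUES ($1, 1);  -- may fail with a duplicate key error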

Cheers,
Csaba.

On Wed, 2005-11-16 at 20:33, Martijn van Oosterhout wrote:
 On Wed, Nov 16, 2005 at 11:06:15AM -0800, Dann Corbit wrote:
   -Original Message-
   From: [EMAIL PROTECTED] [mailto:pgsql-hackers-
   [EMAIL PROTECTED] On Behalf Of Simon Riggs
Conceptually, a MERGE statement is just a long string of INSERTs and
UPDATEs in the same transaction and I think we should treat it as
such.
  
  Merge could also be considered as a long string of deletes and inserts.
  I guess that deleting those records that already exist and then
  inserting all of the records is faster because it could be done like a
  single join to perform the delete and then a single batch insert.
 
 And for us it makes no difference because in MVCC, UPDATE == DELETE +
 INSERT. IMHO it's just a nice construct to specify UPDATEs and INSERTs
 in the same statement.
 
 Have a nice day,




Re: [HACKERS] MERGE vs REPLACE

2005-11-17 Thread Csaba Nagy
Yes, these algorithms are clear to me, but they don't work for batch
updates in postgres without savepoints before each row insert/update,
which is not good for performance (not to mention that on older postgres
versions without savepoint support it won't work at all). If there were a
way with no race condition and no performance penalty, that would be
something new and useful. I just guessed that MERGE would provide that.

Cheers,
Csaba.

On Thu, 2005-11-17 at 12:34, Zeugswetter Andreas DCP SD wrote:
  The problem I try to solve is something along: a bunch of clients try
 to update a count, and ONE of them must initialize  the count if it
 does not exist... this can't be done with current infrastructure without
 race conditions.
 
 The solution without merge but a unique key in other db's is:
 
 update
 if no rows updated
   insert
   if duplicate key
     update
     if no rows updated goto insert
 
 note that the counter updates need to be of the form set x = x + ?
 where key=y 
 do you see a potential race condition with this ?
 In pg you also need a savepoint before the insert for this to work.
 
 Depending on the ratio of insert vs update we also start with insert when
 the insert succeeds more than 50% (I would use a higher percentage with
 pg though):
 
 insert
   if duplicate key
     update
     if no rows updated goto insert
 
 Andreas




Re: [HACKERS] MERGE vs REPLACE

2005-11-17 Thread Csaba Nagy
OK, in this case I don't care about either MERGE or REPLACE, but for an
UPSERT which does the locking :-)

Cheers,
Csaba.

On Thu, 2005-11-17 at 13:32, Martijn van Oosterhout wrote:
 On Thu, Nov 17, 2005 at 12:52:53PM +0100, Csaba Nagy wrote:
  Yes, these algorithms are clear to me, but they don't work for batch
  updates in postgres without savepoints before each row insert/update,
  which is not good for performance (not to mention on older postgres
  versions without savepoint support it won't work at all). If there is a
  way of no race condition, no performance penalty, that would be
  something new and useful. I just guess the MERGE would provide that.
 
 Well, then you guess wrong. This isn't what MERGE is for. MERGE is just
 a neat way of specifying the UPDATE and INSERT cases in the same
 statement. It doesn't remove the possibility of duplicate inserts and thus
 primary key violations.
 
 If someone wants to make extensions to MERGE so that it can avoid the
 race condition and avoid the duplicate key violations, that's fine. But
 be aware that this is outside of the spec. It may be a useful addition,
 but perhaps we should consider MERGE and REPLACE as completely separate
 targets.
 
 MERGE has a whole join construction with subqueries that would be a
 pain to make work in a way that is truly serialisable. REPLACE deals
 with only one row and tries to solve the race for that case only. Much
 easier to consider them separately, no?
 
 I guess what's really irritating is that this clearly exposes the case
 listed in the docs as to why SERIALIZABLE isn't serializable in all cases.
 If we could solve that for MERGE, we could probably solve it in the general
 case too.
 
 Have a nice day,




Re: [HACKERS][OT] Doubt

2005-11-28 Thread Csaba Nagy
http://acronymfinder.com/

Cheers,
Csaba.

On Fri, 2005-11-25 at 19:24, Gustavo Tonini wrote:
 What is ISTM?
 
 Sorry,
 Gustavo.




Re: [pgsql-www] [HACKERS] Upcoming PG re-releases

2005-12-01 Thread Csaba Nagy
Maybe mausoleum would be an even better name :-D

Cheers,
Csaba.

On Thu, 2005-12-01 at 11:35, Euler Taveira de Oliveira wrote:
 --- Richard Huxton dev@archonet.com escreveu:
 
  If it's practical to keep them, I'd like to suggest doing so. If it's
  not practical, could we have a where_to_find_old_versions.txt file
  and 
  open a project on sourceforge to keep them?
  
 What about an museum.postgresql.org to keep the old releases?
 
 
 Euler Taveira de Oliveira
 euler[at]yahoo_com_br
 
 
   
 
 
 
   
   
 
 




Re: [HACKERS] generalizing the planner knobs

2005-12-02 Thread Csaba Nagy
On Thu, 2005-12-01 at 22:01, Tom Lane wrote:
 Greg Stark [EMAIL PROTECTED] writes:
  On the other hand the type I would prefer to see are hints that feed 
  directly
  into filling in information the planner lacks. This only requires that the
  user understand his own data and still lets the planner pick the best plan
  based on the provided information.
 
 This would avoid some issues, but it still is vulnerable to the problem
 that the hint you put in your code today will fail to track changes in
 your data tomorrow.

Tom, I have to disagree here. At least in our application, we must
provide for an acceptable worst case scenario, and sometimes a slightly
wrong estimate can lead to a plan which is very fast 99% of the time but
completely wrong in 1% of the cases. Sometimes the percentage is 50/50.

I've had this situation with some limit plans where the planner had
chosen the wrong index. The problem there was that the planner had
estimated that the query would return 20 rows, but it returned fewer,
which resulted in a complete scan of the index... as opposed to the much
smaller scan that would have resulted from scanning the other index, as
that one would have provided an end condition orders of magnitude
sooner. Now the statistics will always be only an estimation, and +/- a
few rows can really make a big difference in some situations. In this
particular situation the planner's index choice would have been
faster for all cases where there really were 20 rows returned, but I
forced it to always choose the other plan (by adding the proper order
by) because I can't risk a bad result in any of the cases.

In this particular case I was able to force the planner to choose a
specific plan, but that might not always be possible, so I guess it
really would make sense to be able to tell the planner how selective
some conditions are. And yes, sometimes I would like to freeze a
specific safe plan for a specific query, even if it is not optimal.
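
For illustration, this is the kind of rewrite that forced the safer plan
(hypothetical table and index names); the ORDER BY matches the desired
index, so the planner walks that index to satisfy the LIMIT:

-- steer the LIMIT query toward an index on (customer_id, created)
SELECT *
  FROM tasks
 WHERE customer_id = $1
   AND status = 'open'
 ORDER BY customer_id, created
 LIMIT 20;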

So for me the hint mechanism is good for telling the server that I'm
not interested in the BEST plan, which risks getting very bad on
occasion, but in a good enough plan which is safe.

And as for selectivity changing over time, the hints will change
along with it. In most of the situations where the selectivity changes, the
SQL has to change too, sometimes even the complete workflow. I find that if
changed hints help on some occasions, then having them would mean less
maintenance than the code rewriting that would otherwise be involved...
and I'm completely sure the server can't compensate for the changing
dynamics of the data all the time. And it definitely can't keep up
with highly dynamic data, where the statistics change constantly in big
tables... 

Our application for example has a kind of batch processing, where we
insert smaller or larger batches of data into a HUGE table (~200 million
rows), and then that data is immediately used for different
operations and then reports, and furthermore it is heavily updated. I
can't think of any reasonable statistics target and ANALYZE strategy
which could satisfy both small batches and large batches without running
ANALYZE permanently with high statistics targets on the key fields...
and even that would not be specific enough when limit 20 is involved.
For queries involving this table I really would like to freeze plans, as
any misplanning has bad consequences.

Cheers,
Csaba.





Re: [HACKERS] generalizing the planner knobs

2005-12-05 Thread Csaba Nagy
[snip]
 I want to be sure my existing queries keep using the plans they've been using
 until I allow them to change.
 
 I don't want to sit down and type select count(*) from users and have it not
 work correctly (ie, use a sequential scan) because the system is so single
 mindedly tuned for the OLTP application.
 

Now this is exactly what I've had in mind... it would be nice to
freeze a plan for some of the queries, and let the planner choose the
best for all the rest. I think some other databases have something like
an optimizer plan stability feature, providing outlines of query
plan bundles. This is maybe too much, but being able to specify that for a
certain query I definitely want to use one index and not the other would
be nice...

On another note, it might be interesting to have some kind of prepare
analyze, where the planner is allowed to go and get a more detailed
estimation from the actual table data, based on the hard-coded parameter
values, and produce more detailed statistics for the parameterized
values, so it can then produce hot-shot plans for the actual parameter
values on each execution... I wonder if this makes any sense. This way
we could have some very detailed statistics directly supporting the
queries we actually use. I would run this kind of prepare for the most
used/problematic queries from time to time, and the planner should
decide what statistics it needs to support them and go and get it...
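
Purely hypothetical syntax for the idea, just to make it concrete
(nothing like this exists in PostgreSQL):

-- hypothetical: sample the table using these representative parameter
-- values and keep per-parameter statistics for later EXECUTEs
PREPARE ANALYZE top_tasks (int, int) AS
    SELECT * FROM tasks
     WHERE customer_id = $1 AND status = $2
     ORDER BY created DESC
     LIMIT 20;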

Cheers,
Csaba.






Re: Concurrent CREATE INDEX, try 2 (was Re: [HACKERS] Reducing

2005-12-08 Thread Csaba Nagy
On Wed, 2005-12-07 at 19:36, Greg Stark wrote:
[snip]
 We periodically ran into problems with load spikes or other performance
 problems causing things to get very slow and stay slow for a while. Letting
 things settle out usually worked but occasionally we had to restart the whole
 system to clear out the queue of requests.

Just as a personal opinion: I would love a REINDEX which does not block
reads/writes, even if writes will be more expensive while it's running.
There's always a period of time I can schedule the REINDEX so there's
very low write activity, but it is impossible to find a time slot when
there's NO write activity... and I think most of the real world
applications are like this. I think it's very rare that an application
is constantly getting high load, but most of them are constantly getting
SOME important activity which makes downtime hard to schedule. Now if
the slowdown of writes is not more than the acceptable service level,
then it is a very workable solution to schedule the REINDEX on a not so
busy time slot.

Cheers,
Csaba.





Re: [HACKERS] Reducing relation locking overhead

2005-12-08 Thread Csaba Nagy
On Thu, 2005-12-08 at 16:05, Tom Lane wrote:
[SNIP]
 There isn't any way for it to do that (ie, be sure everyone else has
 adjusted to the new state of affairs), short of acquiring some sort of
 short-term exclusive lock on the table, which is a really bad idea.
 The pending lock would block other incoming requests on the table until
 all the current users exited their transactions.
 

But it is an acceptable compromise to lock the table until all current
transactions are over... the alternative for reindexing a big table is
usually to schedule a down-time, which is even worse...

REINDEX is usually used to fix a big table's big index bloat, and that
won't fly without a downtime, or, with this short-term full table lock,
in a low-traffic time slot. 

For my usage patterns I would vote for the table lock if it is just a
means of blocking new transactions until the running ones finish. I'll
just make sure there are no long-running ones when I issue the REINDEX...

Cheers,
Csaba.







Re: [HACKERS] Tab completion of SET TRANSACTION ISOLATION

2006-01-31 Thread Csaba Nagy
Is there any chance of psql opening a new session if it's inside a
transaction, and using that to do whatever querying is needed ? Just
something like the control connection in ftp (the analogy is not very good).
That could cause other surprises though (it could fail for example due to
too many connections being open), and I have no idea about psql internals,
so it might be completely against its philosophy...

Cheers,
Csaba.

On Tue, 2006-01-31 at 15:29, Tom Lane wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  Some time ago, the tab completion code for the SET command was changed
  to read the list of available settings from the pg_settings table.
  This means that by the time you're done completing SET TRANSACTION
  ISOLATION, you've already sent a query and the command will be
  disallowed.  It's not a major issue, but I figured I'd mention it
  since it confused me a while ago.  If someone has an ingenious plan
  for working around this, let me know.
 
 Hm, that's a bit nasty.
 
 The only plan I can think of involves reading the list of available
 variable names in advance and keeping it around.  However, I'm not
 sure I want psql issuing such a query at connection startup whether
 or not the info will ever be used :-(
 
 We also have the ability to check the current in-transaction status,
 so one possibility is to read the variable list only if not within
 a transaction (and we didn't do it already in the current session).
 Making the behavior of tab completion be state-dependent may seem
 like a non-starter, but really it is anyway --- anything involving
 a query will stop working in a failed transaction.
 
   regards, tom lane
 




[HACKERS] streamlined standby process

2006-02-02 Thread Csaba Nagy
Hi all,

After spending some time (~2 days to fully(?) understand the process)
setting up a standby machine using WAL shipping, and experimenting to be
fairly sure it is working reliably, I started thinking about how I would
like it to be done.

My dream-process of setting up a standby would be:

  - start up the postmaster on an empty directory (except maybe the
postgresql.conf and pg_hba.conf to be used after switching to active
mode), with a build standby option;
  - the postmaster would start in a special standby mode, where it
connects to the primary server using a normal connection (which in turn
must be configured, e.g. on the command line, just as with psql);
  - the standby would issue a STANDBY command to the primary (possibly
implemented at low level, i.e. it doesn't have to be a SQL command but
could be at protocol level);
  - the primary would mark the start of the backup, just as today with
the 'pg_start_backup' function;
  - the primary would start to stream over the connection all the files of
the data base, interleaved with all the committed WAL records just as
they commit (of course buffered somehow so it won't affect performance
too much);
  - the standby would save the received files and store the WAL records
for later replay;
  - when all the db files were transmitted, the primary would mark the
end of the backup, just as today with 'pg_stop_backup', and signal this
to the standby;
  - the standby can now start replaying the WAL records;
  - when all the saved WAL records up to the 'pg_stop_backup' mark were
replayed, the standby is operational in the sense that it can be
switched on in normal operation mode;
  - in the meantime the server keeps sending the WAL records as they
become available, and the standby replays them as they arrive. This
would make sure the standby is always as up to date as possible;

  - bringing the standby up would be a matter of losing the connection
to the master server, or some kind of local signalling;
  - the standby then would continue to work as a normal postmaster,
using the configuration file which must be existing in the standby
directory when starting the process;

This process would only require the user to know where he wants the
standby and where the master is... no more scripting and worrying about
partial WAL files. And it could provide some degree of automatic
failover, if relying on the connection failure is a good enough sign of
the primary server being down, but of course this is not mandatory.

Is this scenario viable ? I realize it needs a lot of coding in the
postmaster, and it must get somehow all WAL records as they commit. I'm
mostly ignorant regarding the postgres code structure, so I have no idea
how much work that would involve... so can you gurus please comment ? 

I really have no experience with C, but I would start doing some coding
if it's deemed viable.

Thanks,
Csaba.





Re: [HACKERS] [GENERAL] Logging statements and parameter values

2006-02-06 Thread Csaba Nagy
Simon,

For me the usage pattern would be: log all params, with bind-time values,
on the same log line as the log_min_duration entries. That's what I need
to identify the non-performant queries, and it also helps on occasion to
identify application problems. 

In any case all your plans sound very good, can't wait to have it
working :-)

Thanks,
Csaba.

On Mon, 2006-02-06 at 12:43, Simon Riggs wrote:
 On Mon, 2006-01-30 at 17:19 -0500, Bruce Momjian wrote:
  Ted Powell wrote:
   On Mon, Jan 30, 2006 at 04:31:29PM -0500, Bruce Momjian wrote:

I assume it is this TODO:

* Allow protocol-level BIND parameter values to be logged


---

Ted Powell wrote:
 Our development group needs to have the option of logging all SQL
 statements including substituted parameter values. [...]
   
   That's it! (I should have thought to look in the TODO.)
   
   Has any design work been done on this?
  
  No.  I am with Simon Riggs today at my house and I asked him, hoping he
  can get it done for 8.2.  I don't think it is very hard.
 
 Some more detailed thoughts:
 
 1. Do we want to log parameters at Bind time or at Execution time? Bind
 is easier and more correct, but might look a little strange in the log
 since the parameters would be logged before the execution appears. IMHO
 Bind time is more correct. That would mean we have a separate line for
 logged parameters, e.g.
 parameters: p1=111 p2=hshssh p3=47000.5
 
 2. Should we save them until end of execution, so we can output them on
 the same line as log_min_duration_statement queries? Sounds easier but
 the meaning might be more confused.
 
 3. Do we want to log parameters that are used for planning, but no
 others? Sometimes yes, sometimes no, I think.
 
 Sounds like we need:
 - a log_parameters GUC with settings of: none, plan and all. 
 - output log messages at Bind time on a separate log line, which would
 replace the existing statement: [protocol] BIND message with
 (portalname) parameters: p1=111 p2=hshssh p3=47000.5
 - portalname would be blank if we aren't using named portals
 
 While we're discussing logging, I also want to be able to set
 log_min_duration_statement on a user by user basis (i,e, for individual
 applications). We set this to superuser-only for valid security reasons,
 but I'd like to have the ability for the superuser to relax that
 restriction for short periods, or even permanently on development
 servers. That sounds like another GUC: log_security = on
 which would enforce SUSET/USERSET control (and would need to be a SIGHUP
 parameter).
 
 Best Regards, Simon Riggs
 
 




[HACKERS] streamlined standby procedure

2006-02-07 Thread Csaba Nagy
Hi all,

I decided to start implementing a streamlined WAL shipping based standby
building procedure. My aim is fairly simple: to be able to build a
standby as automated as possible. 

The ultimate simplicity would be for me: 
 - install postgres on the standby machine;
 - create a directory for the data base files, containing
postgresql.conf and pg_hba.conf, and a standby.conf file;
 - start up the postmaster with a --build-standby option;

All the rest should be done automatically by postgres.

The procedure should be something similar to the one available today if
you do it manually. The main difference would be that the standby
postmaster should connect to the primary server, and get all table data
and WAL record stream through normal data base connections...

To facilitate this process, I thought about why not expose the WAL files
through a system view ? Something along the lines of:

pg_wal (
  name text,
  walrecords blob,
  iscurrent boolean
)

Then anybody interested in the WAL record stream could easily find out
which is the current WAL file, and get any of the existing WAL files
by streaming the blob. Closed WAL files would be streamed completely,
and the current WAL file could be streamed in real time as it is
written... this would facilitate an always-as-up-to-date-as-possible
standby, as it could get the WAL records in real time.
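
Usage from the standby side would then be something like this (purely
hypothetical, of course; neither the view nor the file name below exist):

-- find the WAL segment currently being written
SELECT name FROM pg_wal WHERE iscurrent;

-- stream the contents of one (closed or current) segment
SELECT walrecords FROM pg_wal WHERE name = '000000010000000000000042';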

To make it possible to reliably get closed WAL files, a WAL
subscription system could be created, where a subscriber (the standby)
could signal which is the oldest WAL file it did not get yet. The
primary machine would keep all the WAL files extending back to the
oldest subscribed one. Then each time the subscriber finishes processing
a WAL file, it can signal its interest in the next one. This could be
implemented by a table like:

pg_wal_subscription (
  subscriber text,
  name text
)

The subscribers would insert a record into this table, and update it to
the next WAL file after they have processed one. The subscriber names
should be unique across subscribers; this should be managed by the admin
who sets up the subscribers. When a subscriber is not interested anymore,
it can delete its subscription record. That could also be done by the DBA
if things go haywire...
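
The whole workflow against the proposed table would be plain SQL
(hypothetical subscriber and file names, of course):

-- subscribe, starting from the oldest segment still needed
INSERT INTO pg_wal_subscription (subscriber, name)
VALUES ('standby1', '000000010000000000000042');

-- after fully processing a segment, advance to the next one
UPDATE pg_wal_subscription
   SET name = '000000010000000000000043'
 WHERE subscriber = 'standby1';

-- not interested anymore (or cleaned up by the DBA)
DELETE FROM pg_wal_subscription WHERE subscriber = 'standby1';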

To build a standby based on log shipping it is also necessary to transfer
all the data base files. That could also be done by exposing them
through some view, which in turn might take advantage of knowledge of
the table structure to compress the data to be transferred. The main
idea is to do all transfers through normal DB connections, so the only
configuration to be done is to point the standby to the master
machine...

So, all this said, I'm not too familiar with either C programming or the
postgres sources, but I'm willing to learn. And the project as a whole
seems a bit too much to do it in one piece, so my first aim is to expose
the WAL records in a system view.

I would really appreciate any comments you have...

Thanks,
Csaba.





Re: [HACKERS] streamlined standby procedure

2006-02-07 Thread Csaba Nagy
You obviously did not read further down :-)

I was proposing a subscription system, where the slave can specify the
oldest WAL file it is interested in, and keep that up to date as it
processes them.

That could of course cause trouble if a slave dies and doesn't update
its subscription, but that's no different from the current setup when
the archive_command starts failing constantly because the archive site
is down. In either case human intervention is needed. The DB connection
based approach has the advantage that you don't have to restart the
server if your slave location changes, and you can have multiple slaves
at the same time if you like, e.g. if you want to smoothly move the
slave over to another machine.

Cheers,
Csaba.

On Tue, 2006-02-07 at 16:18, Tom Lane wrote:
 Csaba Nagy [EMAIL PROTECTED] writes:
  The procedure should be something similar to the one available today if
  you do it manually. The main difference would be that the standby
  postmaster should connect to the primary server, and get all table data
  and WAL record stream through normal data base connections...
 
 This is pie-in-the-sky really.  A design like that would mean that the
 master could *never* recycle WAL files, because it could never know when
 some slave would pop up demanding a copy of ancient history.
 
   regards, tom lane




Re: [HACKERS] streamlined standby procedure

2006-02-07 Thread Csaba Nagy
On Tue, 2006-02-07 at 16:45, Tom Lane wrote:
 Csaba Nagy [EMAIL PROTECTED] writes:
  You obviously did not read further down :-)
  I was proposing a subscription system, where the slave can specify the
  oldest WAL file it is interested in, and keep that up to date as it
  processes them.
 
 And how is that system view going to handle subscriptions?

It's not THAT system view that would handle the subscription... the
view which exposes the WAL files would do exactly that: expose the
existing WAL files, and only those which exist. Of course it must place
some kind of lock on the WAL file it is currently streaming so it is not
recycled, but other than that this view should not be concerned
with subscription issues.

The subscription system would be a table in which you can insert
(subscriber_id, oldest_WAL_file_name_i'm_interested_in) tuples. When
recycling WAL files, this table will be consulted and only WAL files
older than the oldest entry in the subscription table are allowed to be
recycled.

Slaves will update their subscription line after processing each WAL
file, setting it to the next WAL file name they need. So the oldest WAL
to be kept will actually be in sync with what the slaves really need.

OK, now I start to see what you mean, i.e. if there's no subscription
then all WAL files are immediately recycled, and the view can only show
one entry, the current WAL. But actually that's OK, you still can see
what's the current WAL file, and can subscribe starting with it.

Cheers,
Csaba.





Re: [HACKERS] streamlined standby procedure

2006-02-07 Thread Csaba Nagy
On Tue, 2006-02-07 at 16:58, Andrew Rawnsley wrote:
 IMHO the #1 priority in the current PITR/WAL shipping system is to make the
 standby able to tolerate being shut down and restarted, i.e. actually having
 a true standby mode and not the current method of doing it only on startup.

This fits nicely in what I would like to achieve, and it might be
actually a better start.

 While it is a trivial thing to fool postgres into staying in startup/restore
 mode with a restore_command that blocks until more files are available, if
 the machine needs to be shut down for whatever reason you have to go back to
 the last image and replay to the present, which isn't always convenient. Nor
 are you able to shut down the standby, copy it to a second instance to use
 for testing/development/whatever, and restart the standby.

Why would you shut down the standby to copy it ? It would be nicer to be
able to build a standby of the standby ;-)
Even nicer would be to have a stand-by which allows read-only access,
but I guess that's a tough call.

Cheers,
Csaba.





Re: [HACKERS] sql row constructor...works!

2006-02-08 Thread Csaba Nagy
Well, I've tested it a bit:

db=# select version();
  version

 PostgreSQL 8.1.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.5
(Debian 1:3.3.5-13)
(1 row)
 
db=# select (1,3) > (2,3);
 ?column?
--
 f
(1 row)
 
db=# select (3,3) > (2,3);
 ?column?
--
 f
(1 row)
 
db=# select (3,4) > (2,3);
 ?column?
--
 t
(1 row)
 

It seems to me that (x,y) > (a,b) means (x > a AND y > b) ... which is
not exactly what you wanted... or am I missing something ?
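
For reference, what your construct seems to be after is the lexicographic
comparison, which on 8.1 has to be spelled out by hand; it is equivalent
to the expansion you quote below:

-- lexicographic "(a,b,c) greater than (a1,b1,c1)", written out explicitly
SELECT * FROM t
 WHERE a > a1
    OR (a = a1 AND b > b1)
    OR (a = a1 AND b = b1 AND c > c1)
 ORDER BY a, b, c
 LIMIT 1;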

Cheers,
Csaba.


On Wed, 2006-02-08 at 03:35, Merlin Moncure wrote:
  On Feb 8, 2006, at 11:17 , Merlin Moncure wrote:
 
   The proper SQL construct without row constructor is:
  
    select * from t where
      a >= a1 and
      (a > a1 or b >= b1) and
      (a > a1 or b > b1 or c > c1)
    order by a,b,c limit 1
                         ^ no offset necessary
  
   confused yet?
 
  This is interesting! Could you also provide the equivalent *with* a
  row constructor? (or did I miss that somewhere?)
 
 select * from t where (a,b,c) > (a1, b1, c1) order by a,b,c limit 1;
 [plus full usage of key on t(a,b,c)]
 




Re: [HACKERS] streamlined standby procedure

2006-02-08 Thread Csaba Nagy
 What do you find difficult about the current method? That's got to be
 the first discussion point.

The main problem I have is the complexity of the setup.

It involves a lot of additional scripting which you have to get right
for it to be actually reliable. The documentation gives a rough idea of
how to do it, but it is quite some work to make it work, and you can't
really tell whether it is reliable...

Another issue is that unless you got the archive_command right on the
master server from the beginning, you will have to restart the server
once you decide to build your standby... the archive_command is a
start-up time parameter. This could of course be alleviated by always
using a stub script as the archive command, letting it do nothing if you
don't have a standby, and then modifying it to start archiving to the
right place once you start building one, or if you want to move it to
another machine. But this is also not documented, and you have to figure
it out for yourself.

And responding to Tom's other post regarding a real standby mode, where
you could stop the standby and then later resume it still in standby
mode: I would actually have a good use for it tonight :-)
We will migrate our application to a new version, which involves some
changes in the data base. Now it would be nice to stop the standby
BEFORE doing these changes, and if the migration fails for some reason,
start up the standby and use it with our old application version. But if
the migration succeeds, I want to start up the standby still as a standby,
and make it resume standby operation... rebuilding it will take half a day
at least.
So a standby which can be isolated for a while would actually be useful.
OK, now that I'm thinking a bit more about this, I could achieve this by
fiddling with the restore_command so it stops delivering the logs for a
while. But again, it is not straightforward.

The whole point of this is that starting up a standby should be as
simple as pointing the standby machine to the primary server, without
shell scripting gimmicks (which are OS specific and therefore hard to
document in a generic way), without the need of fiddling with the
primary's configuration (see archive command), without the need to
restart the primary if the archive command was not right in the first
place. And to make it easy to start up one more standby if needed, or
isolate it for a while when doing some risky work on the primary.

It's about user friendliness and flexibility. It's not that it can't do
the work right now, but it's really hard to do it...

Cheers,
Csaba.






Re: [HACKERS] pg_hba.conf alternative

2006-02-08 Thread Csaba Nagy
I think this was discussed many times on this list, and the main
conclusion was: if you don't trust your DB machine's admin, any security
measure against him will be only illusory. The sysadmin can in any case
access the data, you can just make this harder, you can't prevent that.

So you better get admins who you trust...

On Wed, 2006-02-08 at 13:34, Q Beukes wrote:
 Hello,
 
 Is there not some other alternative to pg_hba.conf?
 
 I have the problem where the system administrators at our company
 obviously have access to the whole filesystem, and our database records
 need to be hidden even from them.
 
 With pg_hba.conf that is not possible, as they just change all the conf
 lines to trust auth and voila, they have access to the database without
 passwords.
 
 Is there a more secure alternative to this? The perfect scenario would be
 to deny everyone, including root, access to a database without a password.
 
 regards,
 Quintin Beukes
 




Re: [HACKERS] streamlined standby procedure

2006-02-09 Thread Csaba Nagy
OK, this is news to me; I recall that when I last looked at the
configuration docs it was a start-up time parameter, but I might be wrong.

[looking up the docs]

OK, citing the 8.1 online docs:

17.5.3. Archiving

archive_command (string)

The shell command to execute to archive a completed segment of
the WAL file series. If this is an empty string (the default),
WAL archiving is disabled. Any %p in the string is replaced by
the absolute path of the file to archive, and any %f is replaced
by the file name only. Use %% to embed an actual % character in
the command. For more information see Section 23.3.1. This
option can only be set at server start or in the postgresql.conf
   ^^^
file. 

It is important for the command to return a zero exit status if
and only if it succeeds. Examples:

archive_command = 'cp %p /mnt/server/archivedir/%f'
archive_command = 'copy %p /mnt/server/archivedir/%f'  # Windows

It's at least confusing... it does say "or in the postgresql.conf file"
too, but I must have overlooked that... and the word "only" is really
confusing there.

[looking at: http://www.powerpostgresql.com/Downloads/annotated_conf_80.html]

OK, this is what confused me. The annotated conf file states it's a startup 
time parameter.

Well, good to know it's not...

Actually, my needs for PITR/standby building are mostly solved by now, but
it's certainly not an easy ride, and I really wonder if there is any
readily available script bundle to do it for a Windows server...

Maybe a standby-building tutorial is all that is needed...

Cheers,
Csaba.


 Much of your difficulty seems to come from your thinking that this
 parameter requires a restart. It doesn't - check it out.
 
 The script need not be complex, you only need to put a wait loop in the
 restore script so that it waits for the next log file.
 
 Best Regards, Simon Riggs
 




Re: [HACKERS] fsutil ideas

2006-02-24 Thread Csaba Nagy
On Fri, 2006-02-24 at 19:12, Rod Taylor wrote:
 On Fri, 2006-02-24 at 12:48 -0500, Tom Lane wrote:
  Rod Taylor [EMAIL PROTECTED] writes:
   I watch for table bloat but I haven't figured out a nice way of tracking
   down the postgresql process with the oldest transaction running short of
   patching PostgreSQL to report the XID for a connection in
   pg_stat_activity.

But I'm afraid that a long running transaction issuing many short queries
will not show up as long-running in pg_stat_activity. So that's not a
completely reliable way of catching long running transactions... but it's
true that most of the time a long running query is the problem, and that
is catchable.
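
A sketch of what is catchable today on 8.1 (assuming stats_command_string
is enabled); note it orders by query start, so an idle-in-transaction
backend whose last query was short will not stand out:

-- longest-running current queries; says nothing about old but idle transactions
SELECT procpid, usename, query_start, current_query
  FROM pg_stat_activity
 ORDER BY query_start
 LIMIT 5;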

Cheers,
Csaba.





Re: [HACKERS] Automatic free space map filling

2006-03-02 Thread Csaba Nagy
 What bothers me about the TODO item is that if we have to sequentially
 scan indexes, are we really gaining much by not having to sequentially
 scan the heap?  If the heap is large enough to gain from a bitmap, the
 index is going to be large too.  Is disabling per-index cleanout for
 expression indexes the answer?

I guess you're saying that a full index scan should only be done when the
index is a functional one, and index lookups used for safe indexes ? That
would be a huge win for most of my vacuum-problematic tables, as I don't
have any functional indexes. But I guess a full index scan would still be
faster if the percentage of pages changed is more than some threshold.
On the other hand it would allow very frequent vacuuming even for huge
tables, so that situation should not occur. Autovacuum thresholds could
be lowered drastically in that case...

 Anyway, as I remember, if you have a 20gig table, a vacuum / sequential
 scan is painful, but if we have to sequential scan the all indexes, that
 is probably just as painful.  If we can't make headway there and we
 can't cleanout indexes without an sequential index scan, I think we
 should just remove the TODO item and give up on improving vacuum
 performance.

From my POV, there must be a way to speed up vacuums on huge tables with a
small percentage of to-be-vacuumed tuples... a 200 million row table
with frequent updates of the _same_ record is causing me some pain right
now. I would like to have that table vacuumed as often as possible, but
right now it only works once per week, due to the load problems caused by
such a long-running transaction preventing the vacuuming of other tables.

Cheers,
Csaba.





Re: [HACKERS] Automatic free space map filling

2006-03-03 Thread Csaba Nagy
 Are you running 8.1?  If so, you can use autovacuum and set per table 
 thresholds (read vacuum aggressivly) and per table cost delay settings 
 so that the performance impact is minimal.  If you have tried 8.1 
 autovacuum and found it unhelpful, I would be curious to find out why.

Yes, I'm running 8.1, and I've set up per-table autovacuum settings :-)
And I lowered the general thresholds too. Generally autovacuum is very
useful from my POV, and the per-table settings in particular are so.
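
For reference, on 8.1 the per-table settings go into the pg_autovacuum
system catalog; roughly like the following (the values are made up for a
busy queue table, and the exact column list should be double-checked
against the 8.1 docs):

-- vacuum the hypothetical task_queue table very aggressively
INSERT INTO pg_autovacuum
       (vacrelid, enabled, vac_base_thresh, vac_scale_factor,
        anl_base_thresh, anl_scale_factor, vac_cost_delay, vac_cost_limit)
VALUES ('task_queue'::regclass, true, 200, 0.01,
        200, 0.05, 10, 200);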

But the problem I have is not the performance impact of the vacuum
itself, but the impact of the long running transaction of vacuuming big
tables. I do have big tables which are frequently updated, and small
tables which are basically queue tables, so each inserted row will be
updated a few times and then deleted. Those queue tables tend to
accumulate huge amounts of unvacuumable dead space during any long running
transaction, and a vacuum on the big tables is such a long running
transaction. And I have a few of them, and one in particular is very busy
(a task table, all activities go through that one).

Now when the queue tables get to 1000 times their normal size in dead
space, I get performance problems. So tweaking the vacuum cost delay
doesn't buy me anything, as vacuum per se is not the performance
problem; its long run time on big tables is.

Cheers,
Csaba.





Re: [HACKERS] PostgreSQL Anniversary Summit, Call for Contributions

2006-03-03 Thread Csaba Nagy
 Wow, this is a great deal of burden that for sure I didn't have to do
 last time :-(  Not sure why, maybe the laws changed or something.  It is
 crystal clear that I have to do it this time however.

I think you're overreacting, guys... I would first go to the
nearest Canadian embassy and try to get the visa. I bet in most
cases they will just issue it without any invitation letter and the
like... if not, only then worry about it ;-)

I'm also a citizen of one of the countries (Romania) whose citizens need
visas for most of the world (or needed them; the situation is relaxing in
this respect), and I never had any problems getting one. Or maybe that
changed after 9/11 ?

Cheers,
Csaba.






Re: [HACKERS] Automatic free space map filling

2006-03-03 Thread Csaba Nagy
 Ewe.  How expensive is scanning an index compared to the heap?  Does
 anyone have figure on that in terms of I/O and time?

See this post for an example:
http://archives.postgresql.org/pgsql-performance/2006-02/msg00416.php

For my 200 million row table, scanning the pk index took ~ 4 hours. And
then there are some more indexes...

So if the index has to be scanned completely, that's still too much.

Cheers,
Csaba.





Re: [HACKERS] Automatic free space map filling

2006-03-03 Thread Csaba Nagy
  I got the impression that Csaba is looking more for multiple 
  simultaneous vacuum more than the partial vacuum.
 
 So he rather needs Hannu Krosing's patch for simultaneous vacuum ...

Well, I guess that would be a good solution to the queue table
problem. The problem is that I can't deploy that patch on our production
systems without being fairly sure it won't corrupt any data... and I
can't rely on non-production testing either. Basically I'm waiting to
see Tom saying it will fly :-)

Cheers,
Csaba.





Re: [HACKERS] Where Can I Find The Code Segment For WAL Control?

2006-03-10 Thread Csaba Nagy
Charlie,

I'm currently working on a similar solution (though admittedly I'm only in
the thinking phase). I don't have too much time to spend on it (a few hours
per week, on the train while commuting), so it's not progressing too
fast...

Nevertheless, I would like to do a first proof-of-concept version in a
contrib module with minimal coding involved. I don't have time right now
to explain all the details of how I would like to do it, but in essence I
want to:

 - create a few functions which expose the WAL files and the data base
files as streams;

 - use the archive_command framework to control when the WAL files can
be recycled, in combination with a WAL subscription system to make sure
the standby won't be missing WAL files (with safeguards so the primary
system will not run out of WAL file system space if the standby can't
keep up with the WAL stream);

 - create a standby manager program which only needs to know how to
access the primary server in order to create the standby (by connecting
to it through normal data base connections and using the above mentioned
functions to stream the files);

These are just the main ideas, the devil is in the details.

To answer your question, a possible starting point would be to
take a look at:

src/backend/postmaster/pgarch.c

This one deals with archiving WAL files, so it could give you some
insight into how you can deal with the WAL files.

Or just search the source tree for 'WAL' ... I guess there's quite a few
places in the code where WAL is involved.

Next week I'll post a more detailed design (some charts I did to better
understand the idea, and some more text to add details), if anybody is
interested in co-working on it; otherwise I'll just keep working at my
own pace until it gets done...

Cheers,
Csaba.


On Fri, 2006-03-10 at 07:34, 王宝兵 wrote:
  
 I am now trying to develop the PG to support real-time backup.My
 architecture is somehow similar to the Database Mirroring technology of
 SQL Server 2005.The server end of the system is consisted of two DB servers
 one is the Principal server,the other is the Mirror server.whenever the
 Principal flushes its log buffer to the local log file,it must send the
 content of its buffer to the Mirror simultaneously.After the mirror receives
 the buffer,it write to its own log file and send a response message to the
 Principal,then the Mirror redo/undo its log.By this way, we can guarantee
 the database instances of the two servers identical. 
 
 But now I encounter a problem.I don't know where the functions to control
 log buffer are.Which code segment may I refer to? 
 
 I have just participated the PG project for a short time,and I will
 appreciate your help very much! 
 
 Look forward to you all! 
 
  
 
 Charlie Wang 
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Where Can I Find The Code Segment For WAL Control?

2006-03-13 Thread Csaba Nagy
[Please use reply to all so the list is CC-d]

Charlie,

I guess what you're after is to make sure the WAL buffers are shipped to
the stand-by at the same time as they are committed to disk. Otherwise
your desire to have the stand-by EXACTLY in sync with the primary server
is not going to work.

But that would mean that the communication to the stand-by becomes a
performance bottleneck, as all transactions are only finished after the
WAL records for them are synced to disk. So if you want your stand-by
completely in sync with your primary, you will want transactions to
finish only after their WAL records are pushed to the stand-by too... and
then if the communication to the stand-by fails, all your transactions
will wait on it, possibly causing the primary to stop working properly.
So now you have another point of failure, and instead of making the setup
safer, you make it less safe. What I want to say is that it is likely not
feasible to keep the stand-by completely in sync.

In practice it is enough to keep the standby NEARLY in sync with the
primary server. That means you will ship the WAL records asynchronously,
i.e. after they are written to the disk, and in a separate thread.

What I'm after is to have a thread which starts streaming the current
WAL file, and keeps streaming it as it grows. I'm not completely sure
how I'll implement that, but I guess it will need to do a loop and
transfer whatever records are available, and then sleep a few seconds if
it reaches the end. It must be prepared to stumble upon partially
written WAL records, and sleep on those too. On the stand-by end, the
current partial WAL will not be used unless the stand-by is fired up...
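
To make that loop concrete, here is a rough sketch in Java of the
follow-the-segment idea (the file name, buffer size and the ship() step
are just placeholders for illustration, not code from an actual
implementation):

import java.io.IOException;
import java.io.RandomAccessFile;

public class WalTail {

    // Ship the first 'len' bytes of 'buf' to the standby; placeholder,
    // e.g. write them to a socket connected to the standby machine.
    static void ship(byte[] buf, int len) {
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String currentSegment = args[0];   // e.g. a file under pg_xlog/
        long offset = 0;                   // how far we have shipped already
        byte[] buf = new byte[8192];

        try (RandomAccessFile wal = new RandomAccessFile(currentSegment, "r")) {
            while (true) {
                long available = wal.length() - offset;
                if (available <= 0) {
                    // reached the end: the tail may still be a partially
                    // written record, so just wait and retry; a real
                    // implementation would also detect the switch to the
                    // next segment here
                    Thread.sleep(2000);
                    continue;
                }
                wal.seek(offset);
                int read = wal.read(buf, 0, (int) Math.min(buf.length, available));
                if (read > 0) {
                    ship(buf, read);
                    offset += read;
                }
            }
        }
    }
}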

So I'm after a solution which makes sure the stand-by is as up to date
as possible, with a few seconds of allowed gap in normal operation, and
possibly more if the communication channel has bandwidth problems and
the server is very busy. Usually if the server crashes, then there are
worse problems than the few seconds'/minutes' worth of lost transactions.
To name one, if the server crashes you will have for sure at least a few
minutes of downtime. At least for our application, downtime in a busy
period is actually worse than the lost data (which we can recover from
other logs)...

Cheers,
Csaba.

On Sun, 2006-03-12 at 02:50, 王宝兵 wrote:
  
 Csaba: 
 
  
 
 Firstly I must thank you for your help.Some of our designs are identical
 except the following: 
 
  
 
 - create a standby manager program which only needs to know how to 
 
 Access the primary server in order to create the standby (by connecting 
 
 To it through normal data base connections and using the above mentioned 
 
 Functions to stream the files); 
 
  
 
 In my opinion,if we create a standby manager program and run it as a daemon
 process,it will check the state of the WAL files of the Principal every few
 seconds.But there is a risk for data lost.For an instance,if the Principal
 has flushed its log buffer to the disk and the dirty data are also flushed
 immediately,but the standby manager program is running in its interval.Then
 the Principal fails.In this situation,the Principal has updated its database
 but the log segment hasn't been sent to the Mirror,because the time point
 for the standby manager program to check the WAL files hasn't come.And then
 these data are lost. 
 
  
 
 I think this situation will happen very probably in a big cooperation and it
 s very serious. 
 
  
 
 Perhaps I have misunderstood your opinion.If that,I apologize. 
 
  
 
 Charlie Wang 
 
  
 
 
 
 
 
 
 
 
 
 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Automatically setting work_mem

2006-03-17 Thread Csaba Nagy
 My own thoughts about the problems with our work_mem arrangement are
 that the real problem is the rule that we can allocate work_mem per sort
 or hash operation; this makes the actual total memory use per backend
 pretty unpredictable for nontrivial queries.  I don't know how to fix
 this though.  The planner needs to know the work_mem that will be used
 for any one of these operations in order to estimate costs, so simply
 trying to divide up work_mem among the operations of a completed plan
 tree is not going to improve matters.

I know this is not exactly on point for what is discussed in this
thread, and that it would need some serious work, but how about a
mechanism to allow plans some flexibility at run-time ? What I mean is
not to make all the decisions at plan time, but to include some branches
in the plan, and execute one branch or the other depending on actual
parameter values, current statistics, current memory available, ...
(name here other run-time resources).

This would also make it a lot more feasible to cache query plans
long-term. For example, you wouldn't have to worry too much about
changing statistics if at runtime you can check them again... and you
could put in decision points based on current memory resources. Of
course it still must be a balance between the number of decision points
(which ultimately means the size of the plan) and robustness against
changing conditions, i.e. branches should only go in for conditions
likely to change.

Is this completely not feasible with current postgres architecture ? I
have no idea how the planning/runtime works internally.

It's worth a look at how Apache Derby does query planning, where a
planned query is actually a compiled Java class, i.e. the executable
byte code which will run to fetch the results, created and compiled by
the planner... interesting approach, allows for lots of flexibility at
run-time, but probably won't work with C :-)

Cheers,
Csaba.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] 8.2 planning features

2006-03-27 Thread Csaba Nagy
 - Postgres intrinsic log-shipping replication (we have one to contribute)

Are you saying you already have a working, portable (meaning it works
well on all platforms) WAL-shipping based replication? Because I was
looking into implementing just that :-)

Do you have some details how it works ?

Cheers,
Csaba.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Summary of coverity bugs

2006-04-25 Thread Csaba Nagy
 I wanted to make a link to the cvsweb interface but my XSLT skills
 wern't up to that apparently.

The following XSL fragment will do the trick for the cvsweb link:

  <tr><th align="left">File</th><td>
  <xsl:element name="a">
  <xsl:attribute name="href">http://developer.postgresql.org/cvsweb.cgi/<xsl:value-of select="file"/></xsl:attribute>
  <xsl:value-of select="file"/>
  </xsl:element>
  </td></tr>

Cheers,
Csaba.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] error-free disabling of individual child partition

2006-05-23 Thread Csaba Nagy

 ALTER TABLE childN ALTER INHERITS DROP (parent);
 ALTER TABLE childN ALTER INHERITS ADD (parent);

Wouldn't it be possible to allow the ADD/DROP to happen in the same
statement, like:

ALTER TABLE childN ALTER INHERITS DROP crt_parent ADD new_parent;

or:

ALTER TABLE childN ALTER INHERITS DROP crt_parent, ALTER INHERITS ADD
new_parent;

That would still make it one statement, but more explicit. And it would
eliminate the need for parentheses (I assume they were needed for
supplying more than one table to inherit/disinherit).

Cheers,
Csaba.




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-22 Thread Csaba Nagy
  [...]
  There has to be a more linear way of handling this scenario.
 
 So vacuum the table often.

Good advice, except if the table is huge :-)

Here we have for example some tables which are frequently updated but
contain 100 million rows. Vacuuming that takes hours. And the dead row
candidates are the ones which are updated again and again and looked up
frequently...

A good solution would be a new type of vacuum which does not need to do
a full table scan but can clean the pending dead rows without that... I
guess then I could vacuum really frequently those tables.

Cheers,
Csaba.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
  Good advice, except if the table is huge :-)
 
 ... Then the table shouldn't be designed to be huge.  That represents
 a design error.
[snip]
 This demonstrates that archival material and active data should be
 kept separately.
 
 They have different access patterns; kludging them into the same table
 turns out badly.

Well, then please help me find a better design, because I can't see one...
what we have here is a big membership table for email lists. When
there's a sendout, the memberships of the affected group are heavily
read/updated; otherwise they are idle. None of the memberships is
archive data, they are all active data... the only problem is that there
are so many of them. Is it so hard to believe that 100 million rows are
all active data, but only used in bursts once per week (that's an
example, some groups are more active, others less) ?

Cheers,
Csaba.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
   I suppose you have a table memberships (user_id, group_id) or something 
  
 like it ; it should have as few columns as possible ; then try regularly  
 clustering on group_id (maybe once a week) so that all the records for a  
 particular group are close together. Getting the members of a group to  
 send them an email should be faster (less random seeks).

It is like this, plus some more bookkeeping data which must be there...
we could split the table into smaller records or into updatable/stable
fields, but at the end of the day it doesn't make much sense; usually
all the data is needed, and I wonder if several big/shallow tables
instead of one big/wide one make sense...

Regular clustering is out of the question, as it would render the system
unusable for hours. There's no zero-activity hour we could use for such
stuff. There's always something happening, only the overall load is
smaller at night...

Thanks,
Csaba.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
 Let me ask a question, you have this hundred million row table. OK, how
 much of that table is read/write? Would it be posible to divide the
 table into two (or more) tables where one is basically static, only
 infrequent inserts and deletes, and the other is highly updated?

Well, all of it is read/write... some of the data might be updated less
frequently, but there's no way I would know which part of the data that
is. Logically it's just the same type of data... so unless I find a way
to continuously move the data back and forth between an archive table
and the live table, based on how active the groups are, I can't imagine
any other way of partitioning it. And that would also mean quite a big
load given the pretty high dynamics of the groups.

 The big thing in performance is the amount of disk I/O, if you have a
 smaller active table with only a single index, then you may be able to cut
 your disk I/O time really down. The smaller the row size, the more rows
 fit into a block. The fewer blocks the less dissk I/O. The less disk I/O
 the bbetter the performance.

I agree, but it is quite hard to achieve that when the data set is both
big AND the partitioning criteria are highly dynamic. Not to mention that
deleting from that table is also a PITA performance-wise, so I wonder
how well the continuous back and forth between the active and inactive
table would do.

 Also, and anyone listening correct me if I'm wrong, you NEED to vacuum
 frequently because the indexes grow and vacuuming them doesnt remove
 everything, sometimes a REINDEX or a drop/recreate is the only way to get
 performance back. So if you wait too long between vacuums, your indexes
 grow  and spread across more disk blocks than they should and thus use
 more disk I/O to search and/or shared memory to cache.

This is nice in theory, but kills performance. I vacuum the big tables
only overnight, otherwise the server is sluggish.

Cheers,
Csaba.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
 You said the activity comes in bursts per group, so the obvious
 partitioning would be per group.
 If you have too many groups to have one partition per group you could
 try to find some modulo or other rule to spread them into separate
 partitions.

This could be a solution... but then I'm not sure how well queries would
do which need the first 10 records based on some criteria that does not
include the group id. I guess limit queries across the union of
the partitions don't work too well for now, and we do have such queries.
I'm pretty sure we could work this out, but it would need some big
refactoring of our current code, which is not that simple... and it must
work well with Oracle too. We do have systems on Oracle too.

Cheers,
Csaba.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
 Well, the only thing left is to cluster the database. There are a couple
 ways to do this, one switch to a platform that supports clustering or
 create an API to wrap multiple databases. If your queries are simple and
 limited, you could create an HTTP/XML service that wraps a number of
 postgresql databases, issues a query across all databases, merges multiple
 query sets, and returns one homoginous stream.
 
Our queries are not simple nor limited :-)

We have a big variety of users, with a big variety of needs and talented
user story writers who have imaginative minds... so the same code must
work in quite a few ways, and some of the resulting queries are
dynamically created. It's a tough job to optimize all the queries we
have.

 Inserts would be handled by hash to machine weighted by number of records
 on each machine.
 
 Updates and deletes would have two keys, machine and ID.
 
Such a setup might work for us, but I fear it would be a major PITA to
make it reliable and to maintain it. It's not like I can say let's
allot a month of work to trying out a clustering solution when I'm not
sure it will work out in the end. We still have enough features to
develop; the DB is something to solve part of the problem, not to keep
us busy... the Oracle systems were there first, the application works
more or less fine on them (with occasional need to optimize here and
there). Supporting Postgres was a side-project to see if it works, and
it works decently, so we deployed some systems on it. Both of the DBs
have their quirks, and I can voice here the ones I don't like in
Postgres... and then some developer might do something about it or not,
and I find that OK. If my mind didn't refuse so categorically to learn
C-style programming (tried that and gave up), I would probably scratch
my own itches. I did it for smaller-scale ones, like truncating
timestamp(0) instead of rounding so that it is consistent with what
Oracle does, but that was just a one-file modification... I simply don't
find it fun to browse C code, compared to how easy it is to understand
the Java code I work with here. So unless somebody ports Postgres to
Java, I'll keep voicing my itches here in the hope that they'll be
solved by others... sorry for the long rant.

 It sounds like you have a big problem and you need a big solution.

Well, Postgres does a decent job as it is. The problem is that under peak
load it sometimes gets bogged down, and the usual things like vacuum
will not help immediately. I think a few more features like the dead
space map for quick vacuum, and even something like the original post's
proposal, would make postgres fly under heavy load too...

Cheers,
Csaba.





---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
  These sort of solutions, IMHO, don't show how good PostgreSQL is, but show
  where it is very lacking.
 
 We all know Postgres is lacking; some of us try to improve it (some with
 more success than others).  People who know the current limitations but
 like the capabilities, try to find workarounds to the problems. What
 surprises me is that, if you have such a low opinion of Postgres, you
 still use it.

Alvaro, I understand your reaction, you're on the developer side... but
please try to understand us mortals who can't write good C code too:
some of us like what postgres already offers, but we would also like
not to need those workarounds. And the only thing we can do is make big
noise so somebody from the other side (the source side) will notice it
and at some point do something about it... the noise here by no means
implies we have a low opinion of postgres. On the contrary, we appreciate
postgres and its developers enough to ask for what we would like to see,
and we are sure there are some gifted developers out there who can
program those features we scream about...

Cheers,
Csaba.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
First of all, I base my assumptions on what I recall having read on
this list, as I haven't tried partitioning using inheritance yet. It's
not trivial to set up and I didn't have the time to play with it yet. So
I don't know for sure that it won't work fine with our application, and
that will only change when I get a few days to experiment. The
experimentation will include the migration of existing data to the
partitioned schema, which will probably be the most difficult part of it
due to the size of the tables which need partitioning...

 You would query the parent (no union). Do you need order by's ?
 Without order by it is currently no problem.

It's clear to me that partitioning by inheritance is transparent to the
application; what worries me is that our application likely has a few
queries which will be equivalent to a union when planning, and I fear
bad performance there.

And I need order by on all queries with limit. The few exceptions where I
wouldn't need order by are when I want to delete/update chunk-wise, but
that's not supported right now... another feature I made noise about ;-)

[snip]

Cheers,
Csaba.



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
   Usually it gets really bad if you *don't* run vacuum continuously, maybe
   hopeing to do it in slower times at night. For high-update db you have
   to run it continuously, maybe having some 5-15 sec pauses between runs.
  
  And how much I/O does this take?
 
 Surprisingly its mostly WAL traffic, the heap/index pages themselves are
 often not yet synced to disk by time of vacuum, so no additional traffic
 there. If you had made 5 updates per page and then vacuum it, then you
 make effectively 1 extra WAL write meaning 20% increase in WAL traffic. 

Does this also hold for read traffic ? I thought vacuum does a full
table scan... and for big tables a full table scan always hurts the
performance of the box. If the full table scan could be avoided, then I
wouldn't mind running vacuum in a loop...

In fact I think it would make sense to replace the whole current
vacuum machinery with a background thread which does this continuously
using a dead space map. That could be a heap sorted by tuple deletion
time, always cleaned up up to the oldest running transaction's start
time... there would be no need for any other autovacuum then.
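
To make the dead-space-map idea a bit more concrete, here is a toy
sketch in Java (all names and structures are made up for illustration;
the real thing would of course live inside the backend):

import java.util.Comparator;
import java.util.PriorityQueue;

public class DeadSpaceMap {

    static final class DeadTuple {
        final long deletedAt;   // commit time (or xid) of the deleting transaction
        final long pageId;      // where the dead tuple lives
        final int itemId;
        DeadTuple(long deletedAt, long pageId, int itemId) {
            this.deletedAt = deletedAt;
            this.pageId = pageId;
            this.itemId = itemId;
        }
    }

    // heap ordered by deletion time, oldest dead tuples first
    private final PriorityQueue<DeadTuple> heap =
            new PriorityQueue<>(Comparator.comparingLong((DeadTuple t) -> t.deletedAt));

    // Called whenever an UPDATE/DELETE produces a dead row version.
    public synchronized void recordDeadTuple(long deletedAt, long pageId, int itemId) {
        heap.add(new DeadTuple(deletedAt, pageId, itemId));
    }

    // Called by the background worker: everything deleted before the oldest
    // running transaction started is safe to reclaim, page by page.
    public synchronized void reclaimUpTo(long oldestRunningXactStart) {
        while (!heap.isEmpty() && heap.peek().deletedAt < oldestRunningXactStart) {
            DeadTuple t = heap.poll();
            // placeholder: prune the tuple and its index entries on t.pageId
            System.out.println("reclaim page=" + t.pageId + " item=" + t.itemId);
        }
    }
}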

Cheers,
Csaba.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum row?

2006-06-26 Thread Csaba Nagy
On Sun, 2006-06-25 at 05:29, Alvaro Herrera wrote:
 Mark Woodward wrote:
  I originally suggested a methodology for preserving MVCC and everyone is
  confusing it as update in place, this isnot what I intended.
 
 It doesn't make sense, but maybe vacuuming a page would.  Naturally, it
 would need to wholly scan all the indexes to clean'em up, so it's
 probably not a good idea in general.

But a version of vacuum which does normal index scans when vacuuming
just a small percentage of a huge table would make sense, wouldn't it ?
That way you don't need full scans of the vacuumed indexes, but can look
up the entries based on the vacuumed keys.

There were discussions about this I think, and the objection was that an
index scan might miss index entries, in particular when badly behaved
user defined functions are involved.

Cheers,
Csaba.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Table clustering idea

2006-06-27 Thread Csaba Nagy
 I think one of the issues might have been: how will you handle other
 indexes on the table when you can no longer point them at an item (since
 items will need to move to maintain an IOT).

I guess you shouldn't allow any other indexes. That's a perfectly
acceptable compromise I think... it would still be very useful for big
and narrow tables which would benefit from being clustered.

The other concern is how you would do sequential scans on the table if
items are allowed to move. I think some other DBs have a facility for a
fast index scan which is essentially a sequential scan of the
index file; something like that would be needed here too.

Cheers,
Csaba.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Three weeks left until feature freeze

2006-07-11 Thread Csaba Nagy
  Does our CVS setup cater for seggregated rights like this? Or 
  would that 
  be done on a trust basis?
 
 No, I don't believe you can do this with CVS at all. We'd need something
 like SVN/WebDAV to be able to grant write access just to specific parts
 of the tree to different people.

It is possible using CVS, by carefully managing file system permissions
and assigning different permissions to the OS users of the different
committers. I guess it's also possible using commit scripts... but I
don't think it's worth the effort as long as there is a regular backup of
the CVS tree...

Cheers,
Csaba.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Three weeks left until feature freeze

2006-07-13 Thread Csaba Nagy
On Thu, 2006-07-13 at 15:29, Stephen Frost wrote:
 It's not the PostgreSQL project's problem, that's true, but it certainly
 becomes an issue for distributions.  Java as a PL ends up being a pretty
 odd case..  If there isn't anything in the PL code itself which forces a
 dependency beyond gcj then it might be possible to distribute it.  Also
 allowing the PL to use a different JVM shouldn't be a problem so long as
 nothing is distributed which depends on the alternate JVM.  The GPL is
 all about distribution and so I'm not sure that it would actually be a
 problem for an end-user to use Sun's JVM with GPL'd Java code.

Now I'm completely confused... what GPL code ? Is PL/Java licensed under
the GPL ? Or what GPL code are you talking about ?

The PL/Java code is likely only dependent on the JVM specification,
which does not put any restriction on how you must license your code, so
PL/Java can be licensed in any way the author wants, including BSD.

The distribution part is also no problem as I see it, as only the build
tools are not BSD, and they are available for free (including the Sun
JDK) and they don't restrict how you license the code you compile. This
can only be a problem for purists like GPL zealots or perhaps Debian;
otherwise it's not that hard to download and install the Sun JDK on a
build machine... you don't need to distribute the JDK, only the runtime
JVM, which you actually can do (including again the Sun runtime). So I
can't see problems from the packager point of view either... except
purists might put a separate PL/Java module in some non-free repository
given the dependency on a non-free runtime...

Cheers,
Csaba.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Three weeks left until feature freeze

2006-07-13 Thread Csaba Nagy
On Thu, 2006-07-13 at 17:03, Tom Lane wrote:
 [...] I don't know what
 other people who do core development feel about that --- but I dislike
 the idea that when someone changes such an API, the buildfarm will go
 all red because there's only one person with the ability to fix PL/Java.

But the alternative is that nothing goes red, and the PL stays
broken until somebody notices it, which might be too late to easily know
which change broke it.

Wouldn't it be possible to separate the red/green lights for the core
and for the PLs ? So the core stays green and the PLs go red... and stay
red until the PL maintainer fixes things. 

And I don't believe there's only one man who knows good Java around...
once PL/Java gets in the core I'm pretty sure there will be a lot of
people using it and caring about it.

Cheers,
Csaba.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Progress bar updates

2006-07-20 Thread Csaba Nagy
 We already have EXPLAIN ANALYZE. Perhaps the right way to do this is
 something that provides similar output. I could see something that
 looks like EXPLAIN for the parts that have not yet executed, something
 reasonable to show progress of the currently active part of the plan
 (current time, rows, loops), and EXPLAIN ANALYZE output for the parts
 which have been completed.

Now this is something that would really help testing a system, by
dynamically seeing the plans of queries which run too long. That
combined with the ability to see the values of bind parameters would be
a useful debug aid.

Cheers,
Csaba.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Transaction Speed and real time database

2006-07-21 Thread Csaba Nagy
 [snip] Suppose that every
 body say me that POStgres is to slow for real time databases, then I will be
 very full trying to resolve this problems with postgres, don't think that?

I think you didn't understand correctly: postgres is not slow, it is
just not suitable for real RT applications for a few reasons, which in
fact make other databases also unsuitable for this purpose.

The main concern is that an RT application usually needs predictable
response times, possibly with guarantees for upper bounds on response
times... and most databases which are transactional and offer
concurrent access won't give you such guarantees, due to locking issues.

The question is whether your application is really RT in the proper
sense of the word, or just an OLTP application which needs to be fast
but won't cause a nuclear explosion if one response in 100 is slower
than expected... in that case postgres might be good for you.

Cheers,
Csaba.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Transaction Speed and real time database

2006-07-24 Thread Csaba Nagy
[please use reply to all, otherwise you'll have what you just had: the
guy you wrote to goes home for the weekend, and all the rest of the
people on the list who would answer you don't know there is something to
answer...]

On Fri, 2006-07-21 at 13:39, moises wrote:
 Sorry if I can't explain me clear.
 I want to use an extension of postgres, (because I'm very clear that
 postgres is not a real time database.)
 I want to modify some modules of postgres for this convenience, for example,
 add some time constrains to transactions, I know that is a very hard work, 
 Our Applications are 100% real time works, controlling industrial process,
 and not OLAP or OLTP applications.
 
 My question is how many fast are postgres in transaction actually? Because
 maybe we don't need to work, in looking for more speed, just constrains and
 planners.

I have to admit RT is for me just a vague idea I still remember from
some courses, but I'm not sure if RT and transactional are compatible at
all... do you really need a transactional database for RT applications
? Because postgres is just that, an excellent transactional DB, which BTW
is fast enough for transactional work (where you expect that sometimes
operations fail due to others working in parallel). I'm not sure that in
industrial RT applications you could afford failures due to
concurrency... and in that case you would be better off using something
non-transactional - but this is just my limited understanding of the
problem. Others on this list might know better...

Cheers,
Csaba.




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Transaction Speed and real time database

2006-07-24 Thread Csaba Nagy
[snip]
 OTOH, one has to be very careful to not mix terms here. In industrial 
 (production floor) applications, the term 'real time database' refers to 
 soemthing completely different than a relational, transactional DB.

But relational and transactional are orthogonal, they don't
imply/require each other... most of the roadblocks you mentioned
(including vacuum) are part of postgres' transactional design, and a
non-transactional DB won't have that overhead. Your input reinforces my
thinking that the transactionality of the DB is the real roadblock...
which means postgres will never really be an RT database in the
proper sense of the word.

 Because of the features of a full-fledged relational database engine, 
 engineers often wish they had one of those instead ;-). Usually, we solve 
 this with some sort of streaming 'frontend', which buffers the data flow 
 (usually to disk) until it's inserted into the database. This lowers the 
 real-time requirement to 'has to be fast enough on average'. We have several 
 of these types of applications in production at various customers, some for 
 6+ years continuously (using PostgreSQL 7.0!).

This sounds the most reasonable approach :-)

Cheers,
Csaba.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Better name/syntax for online index creation

2006-07-25 Thread Csaba Nagy
 Semantically, NOLOCK (or whatever word) seems most closely associated with 
 CREATE INDEX, so having it as third word would make sense, and it would be 
 quite easy to parse for psql.
 
 Strictly speaking, however, it would have to be NOLOCKLY in that case. :-)

In this case CREATE INDEX CONCURRENTLY ... sounds better to me, although
the whole feature sounds nice whatever you finally call it ;-)

Cheers,
Csaba.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Forcing current WAL file to be archived

2006-07-25 Thread Csaba Nagy
 OK, offset added to TODO item.  What would the offset give us?

The last offset could be remembered by the external program, and it only
has to transfer from the last offset to the new one. It allows
incremental streaming of the WAL files... of course the external program
will be a lot more complex than the current shell scripts which can be
used for WAL archiving...

The problem I see with this is whether the normal postgres WAL
archiving won't conflict with this streaming. And if it does, wouldn't it
be better to have a separate mechanism for the stream-based archiving ?
I mean, what will happen if postgres successfully archives the WAL file
and then moves it out of the way before the streaming process finishes
with it ? The streaming process will have a problem...

A few months ago I spent some time thinking about a solution where a WAL
based standby could be built using only normal database connections to
the master server, and one of the ideas was to create a WAL subscription
mechanism where the standby subscribes to the WAL files and
updates its subscription status with the last processed WAL file after
each processed file. The master can then recycle the WAL files only
after they were confirmed by all current subscriptions... and to avoid
excessive WAL file bloat if a slave goes offline, the subscription could
be canceled automatically if it falls too far behind.
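
Just to illustrate the recycling rule of such a subscription mechanism,
a small sketch (purely hypothetical, relying on the fact that WAL
segment file names sort in creation order, and ignoring timeline
switches):

import java.util.HashMap;
import java.util.Map;

public class WalSubscriptions {

    // subscriber name -> last WAL segment it has confirmed as processed
    private final Map<String, String> lastConfirmed = new HashMap<>();

    // Register a new standby; the empty string means "nothing confirmed yet",
    // which blocks recycling of every segment until the first confirmation.
    public synchronized void subscribe(String subscriber) {
        lastConfirmed.put(subscriber, "");
    }

    // Called by the standby after it has safely processed a segment.
    public synchronized void confirm(String subscriber, String walSegment) {
        lastConfirmed.put(subscriber, walSegment);
    }

    // The archiver may recycle 'walSegment' only if every subscriber has
    // already confirmed it (or a later one). A subscriber that lags too far
    // behind would simply be removed from the map (its subscription
    // cancelled), so it cannot block recycling forever.
    public synchronized boolean canRecycle(String walSegment) {
        for (String confirmed : lastConfirmed.values()) {
            if (confirmed.compareTo(walSegment) < 0) {
                return false;
            }
        }
        return true;
    }
}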

If this mechanism is in place, it would also be nice if the slave could
ask for the WAL records to be streamed over a normal database connection.
The function which does it could be smart enough to stream the
current WAL file too, up to the current offset, and then wait for new
records. The slave would invoke the function for each WAL file it needs
to transfer, and when finished it would update its subscription
status and continue with the next one. The streaming function should not
update the subscription status, as this way the slave can ask for the
file again if something goes wrong with the transfer.

The third thing needed to create a facility for one-connection-standby
building is to be able to stream the OS files of the DB through a DB
connection - I guess that can be done with a relatively simple C
function...

With all these things in place, a program could be written which would
run on the standby machine and completely automatically set up the
standby, only needing a simple connection string to the master...

Cheers,
Csaba.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Forcing current WAL file to be archived

2006-07-25 Thread Csaba Nagy
  The problems I see with this is if in this case the normal postgres
  WAL
  archiving won't conflict with this streaming ? 
 
 You are not forced to use it if your shell scripts do conflict.
 
 What I envisioned, was that the current WAL archiving shell script would
 just do some CRC check over the WAL's already shipped, or as we
 currently use rsync to do the actual shipping this is what happens
 automatically.

Hmm, that sounds pretty smart... the archive process rsyncing over the
file which was previously streamed... I guess this will mean very little
overhead (in fact it only means the WAL archive being read once more
than absolutely necessary, plus the CPU power to compute the CRCs).

  And if yes, wouldn't it
  be better to have a separate mechanism for the stream based
  archiving ?
 
 why separate ? I'm a great believer in doing the minimum useful change,
 at least in systems used in production. We already have a working
 solution for full file shipping, so why not just augment it with
 streaming the currently-written-to file.

That's fine, I also have a working script, so I'm also not very
motivated to do anything more complicated... but 6 months ago I would
have been really glad to have a stand-alone program which I could
install alongside postgres on the slave, point at the master, and get a
working WAL shipping based stand-by. Instead I spent a few days setting
up our standby scripts and testing them under load... and I'm never
certain it really works and won't break exactly when I need it
most...

  I mean what will happen if postgres successfully archives the WAL file
  and then moves it out of way before the streaming process finishes
  with
  it, the streaming process will have a problem... 
 
 This should not happen. your streaming process should be smart enought
 to guarantee that.

OK, true, the streaming script should always stream only the current
file. If the last offset was from a previous WAL, it can be safely reset
to 0 and the new WAL streamed from the beginning. So the streaming script
needs to remember the last WAL and offset, not just the offset.
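
To make that bookkeeping concrete, a tiny illustrative sketch (all names
made up): the streamer keeps the pair (segment, offset) and resets the
offset whenever the segment it is asked to stream changes.

public class StreamPosition {
    private String lastSegment;   // WAL segment name last streamed
    private long offset;          // bytes of lastSegment already shipped

    // Returns the offset to resume from for the segment we are asked to stream.
    public long resumeOffsetFor(String currentSegment) {
        if (!currentSegment.equals(lastSegment)) {
            // new WAL file: the previous offset is meaningless, start from 0
            lastSegment = currentSegment;
            offset = 0;
        }
        return offset;
    }

    // Called after successfully shipping 'bytes' more bytes of lastSegment.
    public void advance(long bytes) {
        offset += bytes;
    }
}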

Cheers,
Csaba.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PATCHES] Resurrecting per-page cleaner for btree

2006-07-26 Thread Csaba Nagy
 [snip] (In fact, it's
 trivial to see how user-defined functions that are mislabeled immutable
 could make this fail.)  So retail vacuum without any cross-check that
 you got all the index tuples is a scary proposition IMHO.

Wouldn't it work to restrict that kind of vacuum to tables which have
no indexes using user defined functions ? That would be a very small
restriction I guess; probably 99.9% of the indexes don't use user
defined functions...

I actually wonder if such a vacuum would be useful for my scenario,
where I have some pretty big tables and update a relatively small
percentage of them. Would such a vacuum be faster than the current one ?
One example would be a ~100 million row table where I have 1-4 million
updates per day. Could I run vacuum multiple times a day for this table
and expect that individual runs are relatively fast ?

Cheers,
Csaba.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] GUC with units, details

2006-07-27 Thread Csaba Nagy
[snip]
 Forcing people to use a specific casing scheme is just going to lead  
 to confusion and user frustration. If there's not a very solid

I guess nobody will force people to use the units at all. 

 *functional* argument for it, we shouldn't do it. Wanting to enforce  
 a convention that people rarely use isn't a good reason.

But if you implement a new feature, history shows that it will stay like
that forever. So if in 5 years everybody uses the ISO stuff, and
postgres wants to do the same, then the users you don't want to
confuse now will be forced to change their config files or be completely
confused. Or it will be as with everything else: an early arbitrary
decision sets everything in stone.

And I do find all these ambiguous meanings of K, G etc. confusing, and I
think ISO is the right way to clear up the confusion, at the cost of
some inconvenience until the users get used to it. For postgres that
would mean no user resistance anyway, as the possibility of specifying
the unit is new, so whoever knows about it must have read the docs first,
and the docs must specify the units you can use.

Just my 2c.

Cheers,
Csaba.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] GUC with units, details

2006-07-27 Thread Csaba Nagy
On Thu, 2006-07-27 at 17:57, Peter Eisentraut wrote:
 Florian G. Pflug wrote:
  Rounding up would have the advantage that you could just specify 0
  in the config file, and have postgres use the smallest value
  possible.
 
 In most algebras, dividing zero by something is still zero, so there'd 
 be no need to round anything.

I think he was referring to silently applying the minimum allowed value
if the given value is less than that... a lot of the settings have a
minimum allowable value. The question is whether this can qualify as
rounding :-)

Cheers,
Csaba.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [CORE] Attack against postgresql.org ...

2006-07-28 Thread Csaba Nagy
On Fri, 2006-07-28 at 17:37, Tom Lane wrote:
 Marc G. Fournier [EMAIL PROTECTED] writes:
  The attacking IP, from the logs, appears to be 87.230.6.96 ...
 
 Perhaps a complaint to their ISP is in order --- RIPE suggests
 [EMAIL PROTECTED]

That looks one level too high; the immediate source seems to be
http://www.ehost.pl/onas.php

They could probably act faster and closer to the source... further down
the page linked above you can find [EMAIL PROTECTED] for complaints.

Cheers,
Csaba.


$ nslookup 87.230.6.96
Server: 192.168.1.4
Address:192.168.1.4#53
 
Non-authoritative answer:
96.6.230.87.in-addr.arpa    name = vpsdws.xip.pl.
 
Authoritative answers can be found from:
6.230.87.in-addr.arpa   nameserver = dns.hosteurope.de.
6.230.87.in-addr.arpa   nameserver = dns2.hosteurope.de.
dns.hosteurope.de   internet address = 80.237.128.156
dns2.hosteurope.de  internet address = 80.237.129.61



$ whois xip.pl
[Querying whois.dns.pl]
[whois.dns.pl]
% This is the NASK WHOIS Server.
% This server provides information only for PL domains.
% For more info please see http://www.dns.pl/english/whois.html
 
Domain object:
domain:   xip.pl
registrant's handle: dinz5du40 (CORPORATE)
nservers: ns1.ehost.pl.[80.237.184.22]
  ns2.ehost.pl.[83.149.119.142]
created:2003.10.06
last modified:  2005.09.19
registrar: Dinfo Systemy Internetowe
ul. Mostowa 5
43-300 Bielsko-Biala
Polska/Poland
+48.33 8225471
[EMAIL PROTECTED]
 
option: the domain name has not option
 
Subscribers Contact object:
company:  eHost s.c.
organization: eHost.pl
street:   Cichockiego 13/6
city: 24-100 Pulawy
location: PL
handle: dinz5du40
phone:  +48.50253
last modified: 2004.11.03
registrar: Dinfo Systemy Internetowe
ul. Mostowa 5
43-300 Bielsko-Biala
Polska/Poland
+48.33 8225471
[EMAIL PROTECTED]
 
Technical Contact:
company:  eHost s.c.
organization: eHost.pl
street:   Cichockiego 13/6
city: 24-100 Pulawy
location: PL
handle: dinz5du40
phone:  +48.50253
last modified: 2004.11.03
registrar: Dinfo Systemy Internetowe
ul. Mostowa 5
43-300 Bielsko-Biala
Polska/Poland
+48.33 8225471
[EMAIL PROTECTED]




---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Connection limit and Superuser

2006-07-31 Thread Csaba Nagy
On Mon, 2006-07-31 at 15:00, Martijn van Oosterhout wrote:
 On Mon, Jul 31, 2006 at 08:47:38AM -0400, Rod Taylor wrote:
  It appears that the superuser does not have connection limit
  enforcement. I think this should be changed.
 
 So if some admin process goes awry and uses up all the connection
 slots, how does the admin get in to see what's happening? If there's a
 limit you're not really superuser, are you?

I thought there is a limit for super-users too... citation from:
http://www.postgresql.org/docs/8.1/static/runtime-config-connection.html#RUNTIME-CONFIG-CONNECTION-SETTINGS

max_connections (integer)

Determines the maximum number of concurrent connections to the
database server. The default is typically 100, but may be less
if your kernel settings will not support it (as determined
during initdb). This parameter can only be set at server start. 

Increasing this parameter may cause PostgreSQL to request more
System V shared memory or semaphores than your operating
system's default configuration allows. See Section 16.4.1 for
information on how to adjust those parameters, if necessary. 


superuser_reserved_connections (integer)

Determines the number of connection slots that are reserved
for connections by PostgreSQL superusers. At most
max_connections connections can ever be active simultaneously.
Whenever the number of active concurrent connections is at least
max_connections minus superuser_reserved_connections, new
connections will be accepted only for superusers. 

The default value is 2. The value must be less than the value of
max_connections. This parameter can only be set at server start.


Cheers,
Csaba.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Connection limit and Superuser

2006-07-31 Thread Csaba Nagy
Nevermind, I realized now that you're talking about a different setting.

 I thought there is a limit for super-users too... citation from:
 http://www.postgresql.org/docs/8.1/static/runtime-config-connection.html#RUNTIME-CONFIG-CONNECTION-SETTINGS

Cheers,
Csaba.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] pg_terminate_backend

2006-08-03 Thread Csaba Nagy
 What I'm looking for is some concentrated testing.  The fact that some
 people once in a while SIGTERM a backend doesn't give me any confidence
 in it.

Now wait a minute, is there some risk of lockup if I kill a backend ?
Cause I do that relatively often (say 20 times a day, when some web
users time out but their query keeps running). Should I rather not do it
?

Thanks,
Csaba.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] pg_terminate_backend

2006-08-03 Thread Csaba Nagy
You didn't answer the original question: is killing SIGTERM a backend
known/suspected to be dangerous ? And if yes, what's the risk (pointers
to discussions would be nice too).

 statement_timeout is your friend.

I know, but unfortunately I can't use it. I did try to use
statement_timeout and it worked out quite badly (due to our usage
scenario).

Some of the web requests which time out on the web should still go
through... and we have activities which should not observe statement
timeout at all, i.e. they must finish however long that takes.

I know it would be possible to use a different user with its own
statement timeout for those requests, but that means we would have to
rewrite a lot of code, which is not possible immediately, and our admins
would resist adding even more configuration (additional users =
additional connection pool + caches and all to be configured). We could
also fix the queries so no timeout happens in the first place, but that
will take us even more time.

Cheers,
Csaba.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] pg_terminate_backend

2006-08-03 Thread Csaba Nagy
On Thu, 2006-08-03 at 18:10, Csaba Nagy wrote:
 You didn't answer the original question: is killing SIGTERM a backend
  ^^^
Nevermind, I don't do that. I do 'kill backend_pid' without specifying
the signal, and I'm sufficiently unfamiliar with the unix signal names
to have confused them. Is a plain kill still dangerous ?

Thanks,
Csaba.




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] pg_terminate_backend

2006-08-03 Thread Csaba Nagy
 Stuck?  You have not shown us a case where SIGTERM rather than SIGINT
 is necessary or appropriate.  It seems to me the above is assuming the
 existence of unknown backend bugs, exactly the same thing you think
 I shouldn't be assuming ...

I do know a case where a plain kill will seem to be stuck: on vacuum
of a big table. I guess when it starts an index's cleanup scan it
insists on finishing it before stopping. I'm not sure if that's the
cause, but I have seen delays of 30 minutes when killing a vacuum... it's
true that in the end it always did die... but it's also true that I have
'kill -9'-ed it before because I thought it was stuck.

Cheers,
Csaba.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] pg_terminate_backend

2006-08-03 Thread Csaba Nagy
 man kill says the default is SIGTERM.

OK, so that means I do use it... is it known to be dangerous ? Until now
I thought it was safe to use. What about select pg_cancel_backend()
?

Thanks,
Csaba.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] PostgreSQL performance enhancement when query

2006-08-08 Thread Csaba Nagy
Constantin,

What binding are you using ? We use Java+JDBC here, and we were able to
get stable query plans by forcing server-side prepared statements (using
PGStatement#setPrepareThreshold with 1 as the threshold), where the
query is prepared without knowing the parameter values. This can
backfire too, but for our purposes it was the right thing (probably
sacrificing some performance, but getting a stable system). The plans in
this case are made to work with guessed mean values for the estimates,
and that usually results in a stable plan, so once you get it right
it will stay like that.
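
For reference, a minimal example of forcing the server-side prepare with
the postgres JDBC driver (the connection details, table and query are
made up, only the setPrepareThreshold(1) call is the point here):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import org.postgresql.PGStatement;

public class StablePlanExample {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "app", "secret");

        PreparedStatement ps = con.prepareStatement(
                "SELECT id FROM memberships WHERE group_id = ? ORDER BY id LIMIT 100");

        // force a named server-side prepared statement from the first
        // execution on (the driver's default threshold is 5), so the plan
        // is built once without the parameter values and then reused
        ((PGStatement) ps).setPrepareThreshold(1);

        ps.setLong(1, 42L);
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            System.out.println(rs.getLong("id"));
        }
        rs.close();
        ps.close();
        con.close();
    }
}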

Cheers,
Csaba.

On Mon, 2006-08-07 at 22:02, Constantin Teodorescu wrote:
 Hello all, hope you are remembering me, some years ago I've designed the 
 PgAccess , the Tcl/Tk visual interface to PostgreSQL.
 
 Thought you haven't received any news from me, I continued working with 
 PostgreSQL, being involved in very big projects in Romania.
 Right now, the national identification of the cows, sheep, goats and 
 pigs in Romania runs on PostgreSQL on a very big database.
 Once again , I had to thank you all for keeping up maintaining and 
 improving PostgreSQL.
 
 My message to all of you is related to this big project (a government 
 sustained project) and some performance issues.
 
 Very few words about the database: approx. 60 tables, 30 of them 
 containing 10 millions to 50 millions records , the whole database is 
 approx 40 Gb size !
 
 In order to get a good performance, the database is operated on a dual 
 XEON with 6 Gb RAM IBM x235 server, the database with tables and indexes 
 carefully distributed on 6 different SCSI disks, in different 
 tablespaces in such a manner to allow parallelizing reads and HDD head 
 movements on different devices when joining those big tables.
 
 We have tuned every possible parameter in config file, we have 
 reorganized queries, analyzing explains in order to get the best results 
 for all big queries and we succeeded most of the time.
 But we have encountered some problems. Due to constant updates and 
 inserts into the database, it's size is growing continuously.
 Of course we are doing DAILY the needed maintaince, vacuums, analyzes 
 and backups.
 Due to permanent changes in database size and statistics there are 
 queries that sometimes change their execution plan, badly choosing 
 another plan and executing those queries in 2,3 minutes instead of 10 
 seconds, the usual execution time since the query plan is switched. We 
 have done any effort in changing subselects and the query sentence in 
 order to force using some indexes, continuously watching the explain 
 results.
 
 We have faced yesterday with such a problem with a query that switched 
 the query plan to a very bad one, almost putting the whole system down.
 The only way that we have succeeded to make it work again was by using 
 the SET ENABLE_MERGE_JOIN to OFF.
 For the moment it works but in our opinion this is NOT the best approach 
 to guide the planner to a better query-plan variant.
 
 Our suggestion would be : extending the EXPLAIN and SELECT commands like 
 that:
 
 EXPLAIN VARIANTS SELECT .. (and so on) that will display the 
 different query plans analyzed by the planner and their estimated time 
 values , not just the best guess .
 
 assuming that the EXPLAIN VARIANTS will show 3 or 4 different query 
 plans, the database manager will be able to experiment, to test, and to 
 decide by himself what is THE BEST PLAN FOR ME, instead of letting 
 postgresql planner to to that. Doing this, we would be able to clearly 
 specify then in the SELECT statement the version of the query-plan 
 that would be used in execution like in the following example:
 
 SELECT  (very big and complex query) ... USING PLAN 3;
 
 Specifying the desired plan could be of course, different.
 I realise that it would be probably better that the query-plan will 
 guess the right and optimal plan. I agree that this can be done be 
 tweaking parameters and costs BUT THIS CAN TAKE A LOT OF TIME, much more 
 than a couple of tests on the real database. An experimented database 
 admin can detect much easier the appropriate plan and force the 
 executor to select that one that he desires.
 
 In our opinion, this would be the simplest and the most non-intrusive 
 method of manual choosing another query plan rather than indirectly 
 setting ON or OFFS various parameters that could affect badly other 
 queries.
 First of all, it's assumed that the query planner HAS ALREADY evaluated 
 different variants and it decides to use one based upon the statistics 
 informations of the involved tables and costs for various types of 
 access.
 Unfortunately, due to a very difficult adjustment of those costs and 
 timings of the HDD performance, IO transfer speeds, PostgreSQL is 
 choosing sometimes a wrong plan.
 If we would have the power of choosing and experimenting different plans 
 with SELECT  USING PLAN that-one we can select than the 

Re: [HACKERS] PostgreSQL performance enhancement when query

2006-08-08 Thread Csaba Nagy
On Tue, 2006-08-08 at 12:36, Constantin Teodorescu wrote:
 We have tried PGStatement#setPrepareThreshold with 1 as the threshold 
 but it's not a good solution.
 Actually is worst. Considering that you have 5 different query plans, 
 you are selecting approx. random one of them, not taking into account 
 the statistics.

Wrong, you'll select _the same_ plan, and that's what matters. If it's not
the plan you wanted, you have to rewrite the query and try again, but
once you get the plan you wanted, you'll pretty much always get the
same plan. So you only need to test until you get the right query
to trigger the right plan... but of course this requires that your
queries are constructed to always be OK with that plan, regardless of
the parameter values. Usually this means a suboptimal plan, but stable
execution times.

If you need to give hints to the DB based on the parameter values and
choose different plans for different parameter values, then you
basically do the job of the planner in your application, and I guess
sooner or later you'll make wrong choices too.

Some hinting mechanism would be good for cases where the developer
really knows better how the data is laid out (e.g. forcing the use of a
specific access method for one table in a complex join), but forcing a
complete plan is probably not good. Even the hinting is only a
workaround for planner fixes which cannot make it into the stable
version...

On the daydreaming part, how about a 2 phase planner ? 

Modus operandi:

Phase 1: compile and cache plan decision tree:
  - collect all reasonable plans without taking into account the
parameter values;
  - check the parameter bounds where each plan is the fastest;
  - compile a decision tree which based on the parameter values chooses
one plan or the other;
  - cache this plan decision tree;
  - there's no need to cache plans which will always lose to some other
plan no matter what parameter values you give (to limit the size of the
decision tree);

Phase 2: run the decision tree to choose the best cached plan for the
parameter values;

You could use variables coming from the statistics system in the
decision tree so it doesn't have to be recalculated too often on
statistics changes.

With a system like this, you could build the decision tree for all your
frequently used queries at system startup and have fast planning at
runtime which is still optimized for the parameter values (take the
decision tree from the cache, run it with the current parameters). Or
just store the whole thing in a system table... or tweak the decision
tree manually...

This is actually not addressing the plan stability issue, but if manual
tweaking were allowed, it would...
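
Purely to make the data structure concrete, the cached decision tree
could look roughly like this if it were exposed as a table (all names
invented here, nothing like it exists):

create table plan_decision_tree (
    query_fingerprint text,  -- the normalized query the tree belongs to
    node_id           int,
    parent_node       int,   -- null for the root node
    predicate         text,  -- condition over the bind parameters and/or
                             -- current statistics, e.g. '$1 < 1000'
    cached_plan_id    int    -- set on leaf nodes only: the plan to run
);

Phase 2 would then be just a walk from the root down to a leaf,
evaluating the predicates against the actual parameter values.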

Cheers,
Csaba.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] An Idea for planner hints

2006-08-09 Thread Csaba Nagy
On Tue, 2006-08-08 at 22:14, Tom Lane wrote:
 So some kind of override for statistical guesses doesn't seem completely
 silly to me.  But it needs to be declarative information that's stored
 somewhere out of view of the actual SQL queries.  IMHO anyway.

The real problem is that sometimes there's no way to get a better plan
without some code change in the planner. And given the postgres release
policy, that might be as much as a year away for a normal user... of
course it's open source, you can patch, but would I trust a patch which
is not tested by the community ? So mostly I can't wait for code
changes, and a generic tool to fix _now_ the one bad query which
brings my system down would be nice. This is why hints would be nice, to
quick-fix immediate problems. Of course they can and would be abused,
like anything else.

On the planner improvements part, would it be possible to save
statistics about join criteria between tables ? I'm not sure where that
would belong, but I guess it would be possible to have a special kind of
ANALYZE which analyzes multiple tables and their correlations... this
way the user would not need to hard-code the statistics hints, but the
system could generate them.

Cheers,
Csaba.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] V3 protocol vs INSERT/UPDATE RETURNING

2006-08-11 Thread Csaba Nagy
 1. Define it as a feature not a bug.  People do occasionally ask for
 UPDATE foo ... LIMIT 1 after all.  But this is a pretty klugy way of
 getting that, and the arguments that say allowing LIMIT on updating
 queries would be a bad idea haven't lost their force.

Being one of those who was asking for an UPDATE/DELETE with limit, I
would be very glad if this were implemented... it would be a big
help for batch-processing data in an OLTP environment (no long running
queries allowed). I still don't see why nondeterminism would be
generally a bad thing when there are applications which don't care
about it...
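
For reference, the workaround we use today looks something like this (a
sketch only, table and column names made up): delete in small batches
via a subselect, so that no single statement runs long:

delete from job_queue
 where id in (select id
                from job_queue
               where processed
               order by id
               limit 100);

A DELETE ... LIMIT 100 would express the same thing directly, without
the extra subselect and without needing an index to make the subselect
cheap.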

Cheers,
Csaba.




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Prepared statements considered harmful

2006-08-31 Thread Csaba Nagy
 - Redefine prepared to mean parsed rather than parsed and planned.

How about prepared means really prepared... in the sense of parsed,
with all sensible plans analyzed, and a meta-plan saved which, based on
current statistics and parameter values, chooses one of the considered
(and cached) plans ?

That would be immune both to statistics changes and parameter value
changes, within certain limits. It would also be a lot more complex than
a simple plan...

Cheers,
Csaba.





---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Prepared statements considered harmful

2006-08-31 Thread Csaba Nagy
On Thu, 2006-08-31 at 14:32, Peter Eisentraut wrote:
 On Thursday, 31 August 2006 14:11, Csaba Nagy wrote:
  How about prepared means really prepared... in the sense of parsed,
  analyzed all sensible plans, and save a meta-plan which based on current
  statistics and parameter values chooses one of the considered (and
  cached) plans ?
 
 I don't think this could solve one particularly frequent problem which is 
 that 
 pattern matching queries don't get along with prepared plans if the search 
 pattern isn't known at planning time.

Why not ? I specifically said you would prepare a few sensible plans
based on statistics/expected variations of the statistics, and parameter
value ranges which would trigger different plans. 

So for the like query case you could save 2 plans, one for the indexable
case, one for the non-indexable case. Then at runtime you choose the
proper one based on the pattern value. The meta-plan I mentioned would
be a collection of plans with rules to choose the right one at run time
based on parameter values and perhaps the current statistics.

This of course would need a lot more preparation time than just
preparing one plan, but that's why you want to do it upfront and then
cache the results. The central plan repository mentioned in other posts
would fit nicely here... and you could use prepared plans for
non-parameterized queries too by simply considering the constants as
parameters, to increase the chances of prepared plan reuse - for
complex enough queries, of course.

Cheers,
Csaba.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Prepared statements considered harmful

2006-08-31 Thread Csaba Nagy
On Thu, 2006-08-31 at 15:19, Peter Eisentraut wrote:
 OK, why don't you work out an example.  Let's look at this query:
 
 SELECT * FROM t1 WHERE a LIKE $1;
 
 What two plans would you prepare?

if substring($1 from 1 for 1) != '%' then
  use plan 1 (see below);
else
  use plan 2 (see below);
end if;

Save both plans from below with the meta-plan from above, and call it a
prepared plan.

cnagy=# create table t1 (a text);
CREATE TABLE
cnagy=# insert into t1 select round(1000 * random()) from
generate_series(1,10000);
INSERT 0 10000
cnagy=# create index idx_t1_a on t1 (a);
CREATE INDEX
cnagy=# analyze verbose t1;
INFO:  analyzing "public.t1"
INFO:  "t1": scanned 55 of 55 pages, containing 10000 live rows and 0
dead rows; 3000 rows in sample, 10000 estimated total rows
ANALYZE
cnagy=# explain select a from t1 where a like '121%';
                              QUERY PLAN
---------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=2.06..27.63 rows=10 width=10)
   Filter: (a ~~ '121%'::text)
   ->  Bitmap Index Scan on idx_t1_a  (cost=0.00..2.06 rows=10 width=0)
         Index Cond: ((a >= '121'::text) AND (a < '122'::text))
(4 rows)
 
cnagy=# explain select a from t1 where a like '%121';
                      QUERY PLAN
-------------------------------------------------------
 Seq Scan on t1  (cost=0.00..180.00 rows=80 width=10)
   Filter: (a ~~ '%121'::text)
(2 rows)
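
Just to illustrate the dispatch part, here is how it could be
hand-rolled today in PL/pgSQL (only a sketch, using the same t1 table as
above; a real implementation would of course live in the backend and
switch between cached plans instead of re-planning):

create or replace function t1_like(pattern text) returns setof text as $$
declare
  r record;
begin
  if substring(pattern from 1 for 1) != '%' then
    -- anchored pattern: inline the value as a constant so the planner
    -- can derive the index range condition, as in plan 1 above
    for r in execute 'select a from t1 where a like '
                     || quote_literal(pattern) loop
      return next r.a;
    end loop;
  else
    -- unanchored pattern: the generic parameterized statement; the
    -- seq scan plan is the right one here anyway
    for r in select a from t1 where a like pattern loop
      return next r.a;
    end loop;
  end if;
  return;
end;
$$ language plpgsql stable;

Calling t1_like('121%') then goes through the index-friendly branch and
t1_like('%121') through the generic one - which is roughly what the
meta-plan would do transparently, minus the per-call re-planning that
EXECUTE implies.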


Cheers,
Csaba.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Prepared statements considered harmful

2006-08-31 Thread Csaba Nagy
On Thu, 2006-08-31 at 15:49, Peter Eisentraut wrote:
 Note that plan 1 can only be created if you know the actual value for $1.

Why would that be so ? The plan can contain functions of $1 (both
constants in plan 1 are a function of $1).

Cheers,
Csaba



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Prepared statements considered harmful

2006-08-31 Thread Csaba Nagy
 Neither of these problems have anything to do with statistics getting
 stale.

... and the second one would benefit from a meta-plan facility which
puts some meta-plan nodes on top of specific plans to dispatch based
on parameter values at runtime.

Incidentally, the dispatch could check the statistics assumptions too.
If you don't need to do the planning for each execution, you could
afford to check the assumptions for each execution instead...

Cheers,
Csaba.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] FE/BE protocol vs. parameterized queries

2006-09-07 Thread Csaba Nagy
 Although I don't have a clear opinion myself, I sometimes read on this list 
 that people are using prepared statements to get safe, stable plans, i.e. 
 plans that don't depend on the specific parameter input.

I definitely want the possibility of getting stable plans. That's only
possible if the planner does NOT take into account any parameter values.
If the statistics get out of date quicker than it's practical to run
analyze, but the plans stay stable, it's better not to have
parameter values taken into account.
 
 If you change that, I don't think they will be happy at all. I suggest 
 leaving it as-is for 8.2. I think the user (i.e. driver) should be able to 
 tell the backend, if they want planning for the first bind, or right at 
 prepare.

That would be nice. We would probably use all 3 forms:
  - unnamed statement: prepare based on constant parameters;
  - named statement: prepare based on the first set of parameter values;
  - named statement: prepare generic plan without considering parameter
values;

Cheers,
Csaba.



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Opinion wanted on UUID/GUID datatype output formats.

2006-09-14 Thread Csaba Nagy
 select format_uuid(mypk,'format2') from tbluuid;
 and then get: 6b13c5a1-afb4-dcf5-ce8f-8b4656b6c93c

How about instead of fixed formats, you allow a format string using the
various parts of the GUID, a la the time formatting functions ? Then
everybody can format it as they want.

Just an idea.
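
Something along these lines, purely as a sketch in PL/pgSQL (the
function name and the 'x' placeholder convention are invented here, and
the raw value is assumed to arrive as 32 hex digits without separators):

create or replace function format_uuid(raw text, fmt text)
returns text as $$
declare
  result text := '';
  pos    int  := 1;
  c      text;
begin
  for i in 1 .. length(fmt) loop
    c := substr(fmt, i, 1);
    if c = 'x' then
      -- consume the next hex digit of the raw value
      result := result || substr(raw, pos, 1);
      pos := pos + 1;
    else
      -- anything else is copied literally (dashes, braces, ...)
      result := result || c;
    end if;
  end loop;
  return result;
end;
$$ language plpgsql immutable;

-- e.g. format_uuid('6b13c5a1afb4dcf5ce8f8b4656b6c93c',
--                  'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx')
-- gives 6b13c5a1-afb4-dcf5-ce8f-8b4656b6c93c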

Cheers,
Csaba.



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] [HACKERS] Incrementally Updated Backup

2006-09-21 Thread Csaba Nagy
 True, but running several dozen instances on a single machine will
 require a lot more memory (or, conversely, each individual database gets
 a lot less memory to use).
 
 Of course, this is all hand-waving right now... it'd be interesting to
 see which approach was actually better.

I'm running 4 WAL logging standby clusters on a single machine. While
the load on the master servers occasionally goes up to 60, the load on
the standby machine has never climbed above 5.

Of course when the master servers are all loaded, the standby gets
behind with the recovery... but eventually it gets up to date again.

I would be very surprised if it got less behind if I used it
in the 1 by 1 scenario.

Cheers,
Csaba.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] DROP FUNCTION IF EXISTS

2006-09-25 Thread Csaba Nagy
 Not being an expert, but to me it looks like the client_encoding being set to 
 UTF8 but the data being sent is something other than UTF8. I've seen this 
 happen on Linux when connecting with PuTTY from Windows (and then psql from 
 the linux machine) and having the wrong encoding set in PuTTY. I'd double and 
 triple-check the client-side stuff first ;-)

I have seen another strange occurrence of such errors... I'm using linux
with UTF8 client encoding, and psql gives me such errors:

dbval=# select 1;
ERROR:  column " 1" does not exist

The full story is that I typed 'ü' (u-umlaut if it won't render
correctly) and backspace before the '1'. I guess backspace deletes
byte-wise and so fails to properly delete multi-byte
characters. I have no idea if this is a problem with psql or some other
problem, and it was not annoying enough to report it...

BTW, the space in " 1" is something I was not able to copy-paste from
the psql command line into the mail, so I'm pretty sure it's some byte
that is invalid UTF8.

Cheers,
Csaba.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Block B-Tree concept

2006-09-26 Thread Csaba Nagy
 And we're back to routine REINDEX I guess :-(.  This doesn't seem like a
 satisfactory answer.

If the reindex works online, it could be a satisfactory solution.

Cheers,
Csaba.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Block B-Tree concept

2006-09-27 Thread Csaba Nagy
  I think you build a whole new index named something like .temp-reindex and
  then as the last step of the second transaction delete the old idnex and
  rename the new index.
 
 That would require getting exclusive lock on the table.

Just out of curiosity, doesn't creating a new index concurrently (or
online, whatever you call it) require taking an exclusive lock on the
table ? I thought it would, at least briefly at the end of the
operation; after all it's modifying the table...

Cheers,
Csaba.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [OT][HACKERS] DROP FUNCTION IF EXISTS

2006-09-28 Thread Csaba Nagy
 Backspace deletes character-wise, as long as you have LANG set
 correctly. Check LANG and the LC_* environment variables.

OK, you're right:

$ echo $LANG
[EMAIL PROTECTED]

# show client_encoding ;
 client_encoding
-
 UTF8
(1 row)


But then I wonder why the client encoding is set to UTF-8 ? I did not
fiddle at all with this AFAIK, and I guess psql is the one setting
this...

OTOH, I'm accessing the DB box via ssh, and my local box has:

cnagy echo $LANG
en_US.UTF-8

So it might be some strange interaction between my local locale, the
server's locale and ssh...

BTW, I tried whether the same problem happens on the server command line
(via ssh), and it does (while, for completeness, it works fine on my
local box):

[EMAIL PROTECTED]:~$ date
-sh:  date: command not found
 ^
  this here is the result of 'ü + backspace'

So I guess this is not postgres related, and therefore off topic for
this list...

Cheers,
Csaba.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-06 Thread Csaba Nagy
[Snip explanations]
 Comments?

Would it be reasonable to include one more switch: 'include
dependencies' ?

That would work like this:

* first consider all objects to be included (possibly limited by the
include switches);
* if dependencies are included, add all dependent objects, plus
non-schema objects (which arguably can be considered dependencies of
the whole database);
* remove all objects targeted by exclude switches;

This way you won't have any dependency on the ordering, and you could
consider all non-schema objects as dependencies, so they will only be
included if dependencies are to be included. 

Excluding dependencies would be the default if any switches are
specified, and including them otherwise (not sure how well this follows
the principle of least surprise, but it would be backwards compatible).

The scenario I most care about is to be able to make a complete
database dump (including non-schema objects) while excluding only a few
tables. If I understood your explanations correctly, this would not be
easily possible with the current implementation. Note that I have a
patch (kludge ?) on the 8.1 pg_dump which does exactly this; it would be
a pity if I needed to patch the 8.2 one again to do that...

Cheers,
Csaba.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-09 Thread Csaba Nagy
 We are two months past feature freeze ... adding entirely new features
 to pg_dump is *not* on the table for 8.2. 

Ok, clear.

  The scenario I most care about is to be able to make a complete data
  base dump (including non-schema objects) while excluding only a few
  tables.
 
 Isn't this the same as Kris' complaint?  Why do you need additional
 dependency analysis to do the above?

Well, I obviously didn't fully understand the feature as it is
implemented. Now, is what I want (see above) possible with the new
feature, or if I exclude some tables do I implicitly lose some other
things too from the dump which would normally be there ? This is my only
concern...

Cheers,
Csaba.



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] pg_dump exclusion switches and functions/types

2006-10-09 Thread Csaba Nagy
On Mon, 2006-10-09 at 16:24, Tom Lane wrote:
 I think we've agreed that if you use some exclusion switches, but not
 any inclusion switches, then only the specific objects matching your
 switches are excluded.  CVS HEAD gets this wrong, but I'm going to work
 on it today.

Cool, that makes it cover my use case and some more.

Thanks,
Csaba.



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Index Tuning Features

2006-10-11 Thread Csaba Nagy
 The above process can be performed without tool support, but its clear
 that further automation will help greatly here. I foresee that the
 development of both server-side and tools will take more than one
 release. Discussion of tool support can begin once we have agreed
 server-side capability.

If it came to automated tools, wouldn't it fit in this discussion to
give some performance requirement limits to the RECOMMEND tool ? In a
workload not all queries are real-time or high priority, and a
lesser-impact index can sometimes help enough to meet the requirements,
compared to a high-impact index which would make the query fly.

Example: inserting into a table must be real-time, reporting can be
taken offline...

So it would be nice to have a recommendation tool which can take into
account the performance requirements of the individual queries, possibly
making the right compromises to meet all requirements for all queries.
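
To make that a bit more concrete, the extra input I have in mind could
be as simple as a table like this (purely illustrative, all names made
up):

create table workload_requirements (
    query_text     text,    -- the statement as captured from the workload
    calls_per_hour int,     -- how frequently it runs
    max_latency_ms int,     -- hard requirement, e.g. real-time inserts
    allow_offline  boolean  -- reporting queries that may run off-hours
);

The tool could then weigh the index maintenance cost against the
max_latency_ms of the insert statements, instead of optimizing every
query as if it were equally important.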

Cheers,
Csaba.



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-10 Thread Csaba Nagy
On Tue, 2008-09-09 at 20:59 +0200, Zeugswetter Andreas OSB sIT wrote:
 All in all a useful streamer seems like a lot of work.

I mentioned some time ago an alternative idea of having the slave
connect through a normal SQL connection and call a function which
streams the WAL file from the point requested by the slave... wouldn't
that be feasible ? All the connection infrastructure would already be
there, only the streaming function would have to be implemented. It
could even use SSL connections if needed. Then you would have one normal
backend per slave, and they would access either the files directly or
possibly some shared area where the WAL is buffered for this purpose...
the streaming function could also take care of signaling the
up-to-dateness of the slaves in the case of synchronous replication.

There could also be some system table infrastructure to track the
slaves, and some functions to stream the files of the DB through normal
backends, so a slave could be bootstrapped all the way from copying the
files through a simple postgres backend connection... that would make
for the easiest possible setup of a slave: configure a connection to the
master, and hit run... and last but not least the same interface could
be used by a PITR repository client for archiving the WAL stream and
occasional file system snapshots.
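
Just to sketch the shape of such an interface (all names invented, and
the stub bodies below only show the signatures, they don't do anything
useful):

-- stream WAL data to the slave, starting at the requested position
create or replace function stream_wal_from(start_pos text)
returns setof bytea as $$
begin
  raise notice 'stub: would stream WAL starting at %', start_pos;
  return;
end;
$$ language plpgsql;

-- let the slave report how far it has applied, for the synchronous case
create or replace function report_applied_pos(slave_name text,
                                              applied_pos text)
returns void as $$
begin
  raise notice 'stub: % has applied up to %', slave_name, applied_pos;
end;
$$ language plpgsql;

-- stream a data file, so a new slave can be bootstrapped over the same
-- kind of connection before it starts consuming WAL
create or replace function stream_relation_file(file_path text)
returns setof bytea as $$
begin
  raise notice 'stub: would stream file %', file_path;
  return;
end;
$$ language plpgsql;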

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-11 Thread Csaba Nagy
On Thu, 2008-09-11 at 15:23 +0300, Heikki Linnakangas wrote:
 I'd imagine that even if applying the WAL on the slave is blocked, it's 
 still streamed from the master to the slave, and in case of failover the 
 slave will fast-forward before starting up as the new master.

Which begs the question: what happens to a query which is running on
the slave at the moment when the slave switches out of recovery mode and
starts up ? Should the running queries be canceled if they are blocking
the application of WAL, to allow start-up, or be left to finish ?

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-11 Thread Csaba Nagy
On Thu, 2008-09-11 at 15:42 +0300, Heikki Linnakangas wrote:
 One problem with this, BTW, is that if there's a continuous stream of 
 medium-length transaction in the slave, each new snapshot taken will 
 prevent progress in the WAL replay, so the WAL replay will advance in 
 baby steps, and can fall behind indefinitely.

Why would it fall behind indefinitely ? It should only fall behind to
the blocking horizon, which should be the start of the longest
currently running transaction... which should be continually advancing
and not too far in the past if there are only medium-length transactions
involved. Isn't normal WAL recovery also doing baby steps, one WAL
record at a time ? ;-)

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-11 Thread Csaba Nagy
On Thu, 2008-09-11 at 16:19 +0300, Heikki Linnakangas wrote:
 Well, yes, but you can fall behind indefinitely that way. Imagine that 
 each transaction on the slave lasts, say 10 minutes, with a new 
 transaction starting every 5 minutes. On the master, there's a table 
 that's being vacuumed (or HOT-updated) frequently, say after each 
 transaction for simplicity. What can happen is that every transaction 
 that finishes on the slave will only let the WAL replay advance by one 
 XID before blocking on the snapshot of the next slave transaction. The 
 WAL replay will advance at a rate of 0.2 TPM, while the master is 
 generating 1.0 TPM.

Aha, now I see where I was mistaken... I thought in terms of time and
not transaction IDs. So the time distance between the slave transactions
does not matter at all, only the distance in recovered XIDs matters for
the blocking horizon... and if the WAL recovery is blocked, the
blocking horizon is stalled as well, so the next transaction on the
slave will in fact require the same blocking horizon as all currently
running ones. Now I get it... and that means in fact that if you have
continuously overlapping small transactions, the blocking horizon
could even be blocked forever, as there'll always be a query running,
and the new queries will always have the snapshot of the currently
running ones because WAL recovery is stalled... or at least that's what
I understand from the whole thing...

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-11 Thread Csaba Nagy
On Thu, 2008-09-11 at 15:33 +0200, Dimitri Fontaine wrote:
 What would forbid the slave to choose to replay all currently lagging WALs 
 each time it's given the choice to advance a little?

Well, now that I think I understand what Heikki meant, I also think the
problem is that there's no choice at all to advance, because the new
queries will simply have the same snapshot as the currently running ones
as long as WAL replay is blocked... further blocking the WAL replay. In
saying this I assume that the snapshot is in fact based on the last
recovered XID, and not on any slave-local XID. In that case, once WAL
recovery is blocked, the snapshot is stalled too, further blocking WAL
recovery, and so on...

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Csaba Nagy
On Fri, 2008-09-12 at 09:38 +0100, Simon Riggs wrote:
 If you request a block, we check to see whether there is a lookaside
 copy of it prior to the tuple removals. We then redirect the block
 request to a viewpoint relation's block. Each viewpoint gets a separate
 relfilenode. We do the switcheroo while holding cleanup lock on block.

Wouldn't it make sense to also have a hint bit on the pages which are
copied away ? Then instead of looking up a hash table, you would first
check that bit, and if it is not set you won't look up the hash table at
all. Then when you clean up the lookaside copies you clear those bits
too...

That would probably perform somewhat better for reading than always
looking up a potentially big hash table, and the cost of setting the
hint is probably a lot less than copying away the page in the first
place. Resetting the hint bit might be a bit more expensive.

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Csaba Nagy
On Fri, 2008-09-12 at 12:31 +0100, Richard Huxton wrote:
 There was a suggestion (Simon - from you?) of a transaction voluntarily
 restricting itself to a set of tables.

While thinking about how easy it would be for the DBA to specify the set
of tables a single query is accessing, at first I thought it should be
straightforward enough to look at the query itself for that. Then I
thought: what about views, rules, triggers, user functions etc. ? All of
those have the potential to access more than you see in the query
itself. And then the really interesting question: what will the slave do
with views, rules, triggers ? I guess executing triggers is out of the
question; what about rules ? They probably must be ignored too... user
functions will probably get errors if they try to update something...
Views should probably work correctly.

So in any case the functionality available for querying slaves would be
less than for the primary. This is probably good enough for most
purposes...

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Csaba Nagy
 I think that enabling long-running queries this way is both
 low-hanging
 fruit (or at least medium-height-hanging ;) ) and also consistent with
 the PostgreSQL philosophy of not replicating effort. As an example we
 trust the OS's file system cache and don't try to write our own.

I have again questions (unfortunately I only have questions usually):

* how will the buffers keep 2 different versions of the same page ?
* how will you handle the creation of snapshots ? I guess there's no
portable and universal API for that (just guessing), or is there some
POSIX thing which is supported or not by the specific FS ? So if the FS
does not support it, you skip the snapshot step ? And if there's no
universal API, will it be handled by plugins providing a specified API
for snapshotting the FS ?

I hope my continuous questioning is not too annoying...

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Transaction Snapshots and Hot Standby

2008-09-12 Thread Csaba Nagy
On Fri, 2008-09-12 at 15:08 +0300, Hannu Krosing wrote:
  * how will the buffers keep 2 different versions of the same page ?
 
 As the FS snapshot is mounted as a different directory, it will have
 it's own buffer pages.

Not knowing that shows my ignorance about the implementation
of the page buffers...

  * how will you handle the creation of snapshots ? 
 
 probably an external command, possibly shell script. 
 similar to current archive_command for wal copying
 
 maybe 'create_fs_snapshot_command' and 'destroy_fs_snapshot_command'
[snip]
 Yes, the simplest one being external command. As FS snapshots are
 supposed to happen not-too often, at least not every second, just having
 external commands may be enough.

You could restrict the creation of snapshots to some minimum amount of
time between them, and maybe also restrict the maximum number of
concurrent snapshots possible. Then if the time limit (as calculated
from the last open snapshot) is not yet met, any new query could
reuse that last snapshot. The time intervals do not need to be evenly
distributed, BTW; the minimum could be a function of the already opened
snapshots, e.g. increase the minimum interval exponentially with the
number of already opened snapshots. That would help to map more
long-running queries onto just a few snapshots.
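
A sketch of what I mean, just to show the policy (the function and the
base interval are made up, and in reality this would sit inside the
snapshot management code rather than in SQL):

-- decide whether a new query gets a fresh FS snapshot or reuses the
-- most recent one
create or replace function want_new_snapshot(open_snapshots  int,
                                             secs_since_last float8)
returns boolean as $$
declare
  base_interval float8 := 60;  -- assumed minimum spacing in seconds
begin
  -- the required spacing doubles with every snapshot already held open,
  -- so long running queries end up sharing a few snapshots instead of
  -- each getting its own
  return secs_since_last >= base_interval * power(2, open_snapshots);
end;
$$ language plpgsql immutable;

With a 60 second base, a second concurrent snapshot is only created two
minutes after the first, a third only four minutes after the second, and
so on, which puts a natural cap on how many snapshots can pile up.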

  I hope my continuous questioning is not too annoying...
 
 On the contrary, much appreciated. :)

Ok, then I'll continue :-) I would like to see this feature succeed, but
there's a slim chance I'll ever code well in C...

Cheers,
Csaba.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

