Re: [PATCHES] [HACKERS] Function structure in formatting.c

2007-08-13 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Brendan Jurd wrote:
> Hello,
> 
> As discussed on -hackers, I've done some refactoring work on
> backend/utils/adt/formatting.c, in an attempt to make the code a bit
> more intelligible before improving handling of bogus formats.
> 
> This is purely a refactor.  The functionality of the file hasn't
> changed; it does the same job as before, but it does it in ~200 fewer
> lines and ~3.5k fewer characters.  The clarity of code is greatly
> improved.  Sadly, performance appears to be unchanged.
> 
> Summary of changes:
> 
>  * Did away with dch_global, dch_date and dch_time.
>  * Replaced DCH_processor with two new functions DCH_to_char and
> DCH_from_char, which now do all the work previously done by
> dch_{global,date,time}.
>  * Removed the 'action' field from the KeyWord struct as it is no longer 
> useful.
>  * Changed the type of the 'character' field in the FormatNode struct
> to char, because ... that's what it is.  The original choice of 'int'
> seems to have been an error.
>  * Removed commented-out function declaration for is_acdc.  According
> to CVS annotate, this hasn't been in use since sometime in the early
> Cretaceous period, and in any case I don't know why you'd want to
> check whether a string was the rock band AC/DC. =)
>  * Reworded some of the comments for clarity.
>  * Didn't touch any of the number formatting routines.
> 
> This compiles cleanly on x86 gentoo and passes check, installcheck and
> installcheck-parallel.
> 
> Thanks for your time,
> BJ

[ Attachment, skipping... ]

> 
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
> 
>http://archives.postgresql.org

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] default_text_search_config and expression indexes

2007-08-13 Thread Bruce Momjian
Heikki Linnakangas wrote:
> Bruce Momjian wrote:
> > Heikki Linnakangas wrote:
> >> Removing the default configuration setting altogether removes the 2nd
> >> problem, but that's not good from a usability point of view. And it
> >> doesn't solve the general issue, you can still do things like:
> >> SELECT * FROM foo WHERE to_tsvector('confA', textcol) @@
> >> to_tsquery('confB', 'query');
> > 
> > True, but in that case you are specifically naming different
> > configurations, so it is hopefully obvious you have a mismatch.
> 
> There's many more subtle ways to do that. For example, filling a
> tsvector column using a DEFAULT clause. But then you sometimes fill it
> in the application instead, with a different configuration. Or if one of
> the function calls is buried in another user defined function.
> 
> I don't think explicitly naming the configuration gives enough protection.

Oh, wow, OK, well in that case the text search API isn't ready and we
will have to hold this for 8.4.
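For concreteness, a hedged SQL sketch of the kind of silent mismatch under discussion (table and column names are invented; behavior assumes the stock 'english' and 'simple' configurations):

```sql
-- tsvector built with 'english', which stems "running" down to "run"
CREATE TABLE docs (body text, body_tsv tsvector);
INSERT INTO docs
    VALUES ('running dogs', to_tsvector('english', 'running dogs'));

-- same configuration on both sides: the stemmed query matches
SELECT count(*) FROM docs
WHERE body_tsv @@ to_tsquery('english', 'running');

-- 'simple' does no stemming, so the lexeme "running" never matches "run":
-- the query silently returns nothing rather than raising an error
SELECT count(*) FROM docs
WHERE body_tsv @@ to_tsquery('simple', 'running');
```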

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] default_text_search_config and expression indexes

2007-08-13 Thread Bruce Momjian
Heikki Linnakangas wrote:
> Oleg Bartunov wrote:
> > On Wed, 8 Aug 2007, Bruce Momjian wrote:
> >> Heikki Linnakangas wrote:
> >>> If I understood correctly, the basic issue is that a tsvector datum
> >>> created using configuration A is incompatible with a tsquery datum
> >>> created using configuration B, in the sense that you won't get
> >>> reasonable results if you use the tsquery to search the tsvector, or do
> >>> ranking or highlighting. If the configurations happen to be similar
> >>> enough, it can work, but not in general.
> >>
> >> Right.
> > 
> > not fair. There are many cases when one can intentionally use different
> > configurations. But I agree, this is not for beginners.
> 
> Can you give an example of that?
> 
> I certainly can see the need to use different configurations in one
> database, but what's the use case for comparing a tsvector created with
> configuration A against a tsquery created with configuration B?

I assume you could have configurations with different stop words or
synonyms and compare them.

> >>> - using an expression index instead of a tsvector-field, and always
> >>> explicitly specifying the configuration, you can avoid that problem (a
> >>> query with a different configuration won't use the index). But an
> >>> expression index, without explicitly specifying the configuration, will
> >>> get corrupted if you change the default configuration.
> >>
> >> Right.
> > 
> > the same problem occurs if you drop a constraint from a table
> > (accidentally) and then get surprised by select results.
> 
> The difference is that if you change the default configuration, you
> won't expect that your queries start to return funny results. It looks
> harmless, like changing the date style. If you drop a constraint, it's
> much more obvious what the consequences are.
> 
> > We should agree that all you describe is only for DUMMY users. From
> > the author's point of view I dislike your approach of treating text
> > searching as a very limited tool. But I understand that we should
> > protect people from stupid errors.
> > 
> > I want beginners to have an easy setup and error-proof functionality,
> > while leaving experienced users free to develop complex search engines.
> > Can we have a separate safe interface for text searching and explicitly
> > recommend it for beginners?
> 
> I don't see how any of the suggestions limits what you can do with it.
> If we remove the default configuration parameter, you just have to be
> explicit. If we go with the type-system I suggested, you could still add
> casts and conversion functions between different tsvector types, where
> it makes sense.

I don't think the type system is workable given the ability to create
new configurations on the fly.  I think the configuration must be
specified each time.
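As a hedged sketch of what "specified each time" looks like in practice (table and column names are invented), both the expression index and the query spell out the configuration, so changing any default cannot silently break the agreement between them:

```sql
CREATE TABLE docs (body text);

-- the index expression names its configuration explicitly
CREATE INDEX docs_body_idx ON docs
    USING gin (to_tsvector('english', body));

-- a query only uses the index when it repeats the same expression, so a
-- configuration mismatch shows up as a slow plan rather than wrong rows
SELECT * FROM docs
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'example');
```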

At this point, if we keep discussing the tsearch2 API we are not going
to have this in 8.3.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] 2D partitioning of VLDB - sane or not?

2007-08-13 Thread Josh Berkus
Jason,

> Which brings us back to the original issue. If I decide to stick with
> the current implementation and not "improve our existing partitioning
> mechanisms to scale to 100,000 partitions", I could do something like:
>
> Maintain 2 copies of the parent table (partitioned by 256).
> Inherit from both to a relation table.
>
> Does this get me out of the woods with the query analyzer? Doesn't seem
> like it would, necessarily, at least.

You don't get a table's partitions when you inherit.  Just the schema of 
the master.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] [HACKERS] Proposal: Pluggable Optimizer Interface

2007-08-13 Thread Tom Lane
Josh Berkus <[EMAIL PROTECTED]> writes:
> Tom,
>> Also, while we might accept
>> a small hook-function patch for 8.3, there's zero chance of any of that
>> other stuff making it into this release cycle.

> I don't think anyone was thinking about 8.3.  This is pretty much 8.4
> stuff; Julius is just raising it now because they don't want to go down
> the wrong path and waste everyone's time.

Well, if they get the hook in now, then in six months or so when they
have something to play with, people would be able to play with it.
If not, there'll be zero uptake till after 8.4 is released...

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Wrote a connect-by feature

2007-08-13 Thread Andrej Ricnik-Bay
On 8/14/07, Bertram Scharpf <[EMAIL PROTECTED]> wrote:
> I just forgot to refer to--of course--the German knight of
> the iron hand.
Prima donna?  Not quite done yet?

Think about it.  Criticism can well be justified.  And what the guys
said was completely appropriate, both in content and in delivery.

Oh well.  Pity.  I thought it was an interesting contribution.


> Bertram
Cheers,
Andrej

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] [HACKERS] Proposal: Pluggable Optimizer Interface

2007-08-13 Thread Josh Berkus
Tom,

> Also, while we might accept
> a small hook-function patch for 8.3, there's zero chance of any of that
> other stuff making it into this release cycle.

I don't think anyone was thinking about 8.3.  This is pretty much 8.4
stuff; Julius is just raising it now because they don't want to go down
the wrong path and waste everyone's time.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Autovacuum and toast tables

2007-08-13 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> I think there is some special code in tqual.c that skips some checks
> assuming that the toast table is vacuumed at the same time as the main
> heap.

I don't believe there is any such connection.  We do use a special
snapshot for toast tables, but it's only needed to make sure VACUUM FULL
on a toast table will work (ie, we have to respect MOVED_IN/MOVED_OFF).

> We go certain lengths in autovacuum to make sure tables are vacuumed
> when their toast table needs vacuuming and the main table does not,
> which is all quite kludgy.

Yeah --- getting rid of that mess would be a plus.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Autovacuum and toast tables

2007-08-13 Thread Alvaro Herrera
Gregory Stark wrote:

> When autovacuum vacuums a table it always vacuums the toast table as well. Is
> there any particular reason to consider the two together? I think this may
> just be a holdover from the interactive command which considers them together
> because the toast table is supposed to be invisible to users. 
> 
> But autovacuum could look at the stats for the toast tables and make decisions
> about them independently from the main tables, no?
> 
> The reason I ask is because looking in the logs from benchmarks I see lots of
> useless vacuuming of empty toast tables which have no dead tuples. Now in this
> case it doesn't cost anything because they're empty. But I could easily see
> situations where the toast tables could be quite large but not be receiving
> any updates when the main table is receiving a large volume of updates on
> other columns.

I think there is some special code in tqual.c that skips some checks
assuming that the toast table is vacuumed at the same time as the main
heap.  I am not sure how vital that is.  Maybe it can be fixed, or maybe
it is not a problem at all.

We go certain lengths in autovacuum to make sure tables are vacuumed
when their toast table needs vacuuming and the main table does not,
which is all quite kludgy.  So we already look at their stats and make
decisions about them.  But what we do after that is force a vacuum to
the main table, even if that one does not need any vacuuming, which is
dumb.

We could certainly fix that, mainly pending analysis of the above
problem.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Proposal: Pluggable Optimizer Interface

2007-08-13 Thread Tom Lane
Julius Stroffek <[EMAIL PROTECTED]> writes:
> I understood that if the user creates his own implementation of the
> planner, which can be stored in some external library, he has to provide
> some C language function as a "hook activator" which will assign the
> desired value to the planner_hook variable. Both the activator function
> and the new planner implementation have to be located in the same
> dynamic library, which will be loaded when the CREATE FUNCTION statement
> is used on the "hook activator" function.

You could do it that way if you wanted, but a minimalistic solution is
just to install the hook from the _PG_init function of a loadable
library, and then LOAD is sufficient for a user to execute the thing.
There's a small example at
http://archives.postgresql.org/pgsql-patches/2007-05/msg00421.php

Also, having the loadable module add a custom GUC variable would likely
be a preferable solution for control purposes than making specialized
functions.  I attach another small hack I made recently, which simply
scales all the planner's relation size estimates by a scale_factor GUC;
this is handy for investigating how a plan will change with relation
size, without having to actually create gigabytes of test data.

> There are more things in the proposal, such as a new pg_optimizer catalog
> and a different way of configuring the hooks. However, these things are
> not mandatory for the functionality but are more user friendly.

Granted, but at this point we are talking about infrastructure for
planner-hackers to play with, not something that's intended to be a
long-term API for end users.  It may or may not happen that we ever
need a user API for this at all.  I think a planner that just "does the
right thing" is far preferable to one with a lot of knobs that users
have to know how to twiddle, so I see this more as scaffolding on which
someone can build and test the replacement for GEQO; which ultimately
would go in without any user-visible API additions.

regards, tom lane

#include "postgres.h"

#include "fmgr.h"
#include "commands/explain.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
#include "utils/guc.h"


PG_MODULE_MAGIC;

void	_PG_init(void);
void	_PG_fini(void);

static double scale_factor = 1.0;

static void my_get_relation_info(PlannerInfo *root, Oid relationObjectId,
								 bool inhparent, RelOptInfo *rel);


/*
 * Get control during planner's get_relation_info() function, which sets up
 * a RelOptInfo struct based on the system catalog contents.  We can modify
 * the struct contents to cause the planner to work with a hypothetical
 * situation rather than what's actually in the catalogs.
 *
 * This simplistic example just scales all relation size estimates by a
 * user-settable factor.
 */
static void
my_get_relation_info(PlannerInfo *root, Oid relationObjectId, bool inhparent,
 RelOptInfo *rel)
{
ListCell   *ilist;

/* Do nothing for an inheritance parent RelOptInfo */
if (inhparent)
return;

rel->pages = (BlockNumber) ceil(rel->pages * scale_factor);
rel->tuples = ceil(rel->tuples * scale_factor);

foreach(ilist, rel->indexlist)
{
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ilist);

ind->pages = (BlockNumber) ceil(ind->pages * scale_factor);
ind->tuples = ceil(ind->tuples * scale_factor);
}
}


/*
 * _PG_init()   - library load-time initialization
 *
 * DO NOT make this static nor change its name!
 */
void
_PG_init(void)
{
/* Get into the hooks we need to be in all the time */
get_relation_info_hook = my_get_relation_info;
/* Make scale_factor accessible through GUC */
DefineCustomRealVariable("scale_factor",
 "",
 "",
 &scale_factor,
 0.0001,
 1e9,
 PGC_USERSET,
 NULL,
 NULL);
}


/*
 * _PG_fini()   - library unload-time finalization
 *
 * DO NOT make this static nor change its name!
 */
void
_PG_fini(void)
{
/* Get out of all the hooks (just to be sure) */
get_relation_info_hook = NULL;
}

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Proposal: Pluggable Optimizer Interface

2007-08-13 Thread Julius Stroffek

Stefan,

thanks for pointing this out. I missed this change.

We would like to place the hooks to a different place in the planner and 
we would like to just replace the non-deterministic algorithm searching 
for the best order of joins and keep the rest of the planner untouched.


I am not quite sure, from the user's point of view, about the usage of
what got implemented. I read just the code of the patch. Are there more
explanations somewhere else?


I understood that if the user creates his own implementation of the
planner, which can be stored in some external library, he has to provide
some C language function as a "hook activator" which will assign the
desired value to the planner_hook variable. Both the activator function
and the new planner implementation have to be located in the same
dynamic library, which will be loaded when the CREATE FUNCTION statement
is used on the "hook activator" function.


Am I correct? Have I missed something?

If the above is the case, then it is exactly what we wanted, except we
would like to have the hook also in a different place.


There are more things in the proposal, such as a new pg_optimizer catalog
and a different way of configuring the hooks. However, these things are
not mandatory for the functionality but are more user friendly.


Thanks

Julo

Stefan Kaltenbrunner wrote:

Julius Stroffek wrote:
  

Hi All,

Tomas Kovarik and I have presented at PGCon 2007 in Ottawa
the ideas about other possible optimizer algorithms to be used
in PostgreSQL.

We are quite new to the PostgreSQL project, so it took us some
time to go through the sources and explore the possibilities
of how things could be implemented.

There is a proposal attached to this mail about the interface
we would like to implement for switching between different
optimizers. Please review it and provide a feedback to us.
Thank You.



hmm - how is that proposal different from what got implemented with:

http://archives.postgresql.org/pgsql-committers/2007-05/msg00315.php


Stefan
  


Re: [HACKERS] Proposal: Pluggable Optimizer Interface

2007-08-13 Thread Tom Lane
Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes:
> Julius Stroffek wrote:
>> There is a proposal attached to this mail about the interface
>> we would like to implement for switching between different
>> optimizers. Please review it and provide a feedback to us.

> hmm - how is that proposal different from what got implemented with:
> http://archives.postgresql.org/pgsql-committers/2007-05/msg00315.php

Well, it's a very different level of abstraction.  The planner_hook
would allow you to replace the *entire* planner, but if you only want to
replace GEQO (that is, only substitute some other heuristics for partial
search of a large join-order space), doing it from planner_hook will
probably require duplicating a great deal of code.  A hook right at the
place where we currently choose "geqo or regular" would be a lot easier
to experiment with.

Replacing GEQO sounds like a fine area for investigation to me; I've
always been dubious about whether it's doing a good job.  But I'd prefer
a simple hook function pointer designed in the same style as
planner_hook (ie, intended to be overridden by a loadable module).
The proposed addition of a system catalog and SQL-level management
commands sounds like a great way to waste a lot of effort on mere
decoration, before ever getting to the point of being able to
demonstrate that there's any value in it.  Also, while we might accept
a small hook-function patch for 8.3, there's zero chance of any of that
other stuff making it into this release cycle.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Testing the async-commit patch

2007-08-13 Thread Andrew Dunstan



Tom Lane wrote:

Andrew Dunstan <[EMAIL PROTECTED]> writes:
  
I have some ideas about testing configuration items. Doing all our tests 
with the default config is not ideal, I think. Essentially we'd put up a 
server that would have sets of . The 
client would connect to the server if it could and get the set(s) of 
> lines for the branch in question, and for each set it would try another 
run of installcheck  (I'm also wondering if we should switch to doing 
installcheck-parallel). Anyway, this would be a config option on the 
buildfarm, so we wouldn't overburden hosts with limited run windows 
(e.g. the Solaris boxes Sun has on the farm) or slow run times (e.g. 
some of the old and/or tiny hardware we have).



  

If this seems worth it I'll put it on my TODO.



Sounds like a good plan, except that an extra server seems unnecessary
mechanism (and perhaps an unnecessary security risk).  We can just put
a file into CVS src/test/regress showing what we'd like tested.


  


That could work.

Let's say that this file looks just like a postgresql.conf file, except 
that any line beginning with '[' is a config set name for 
the lines that follow. So we might have:


   [asynch_commit]
   synchronous_commit = off

   [no_fsync]
   fsync = off

   [csvlogs]
   start_log_collector = true
   log_destination = 'stderr, csvlog'

Then there would be an extra installcheck-parallel run for each set. If 
the file isn't there we do nothing.
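A minimal sketch of how a client could recognize the sets in such a file. This is illustrative only, not actual buildfarm code, and the format details (a '[name]' line opens a set, possibly indented, following lines belong to it) are assumptions taken from the example above; the helper just counts the sets, to keep the sketch short:

```c
/* Count the config sets in a buffer using the proposed format:
 * a line whose first non-whitespace character is '[' names a set. */
int count_config_sets(const char *text)
{
    int sets = 0;
    int only_ws = 1;          /* nothing but whitespace so far on this line */

    for (const char *p = text; *p != '\0'; p++)
    {
        if (*p == '\n')
            only_ws = 1;      /* a new line begins after this character */
        else if (only_ws && (*p == ' ' || *p == '\t'))
            ;                 /* still in the line's leading whitespace */
        else
        {
            if (only_ws && *p == '[')
                sets++;       /* '[' is the first real character: a set name */
            only_ws = 0;
        }
    }
    return sets;
}
```

A real client would additionally collect the lines of each set and run installcheck once per set, but that part depends on buildfarm internals not described in this thread.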


cheers

andrew




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Proposal: Pluggable Optimizer Interface

2007-08-13 Thread Stefan Kaltenbrunner
Julius Stroffek wrote:
> Hi All,
> 
> Tomas Kovarik and I have presented at PGCon 2007 in Ottawa
> the ideas about other possible optimizer algorithms to be used
> in PostgreSQL.
> 
> We are quite new to the PostgreSQL project, so it took us some
> time to go through the sources and explore the possibilities
> of how things could be implemented.
> 
> There is a proposal attached to this mail about the interface
> we would like to implement for switching between different
> optimizers. Please review it and provide a feedback to us.
> Thank You.

hmm - how is that proposal different from what got implemented with:

http://archives.postgresql.org/pgsql-committers/2007-05/msg00315.php


Stefan

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Proposal: Pluggable Optimizer Interface

2007-08-13 Thread Julius Stroffek

Hi All,

Tomas Kovarik and I have presented at PGCon 2007 in Ottawa
the ideas about other possible optimizer algorithms to be used
in PostgreSQL.

We are quite new to the PostgreSQL project, so it took us some
time to go through the sources and explore the possibilities
of how things could be implemented.

There is a proposal attached to this mail about the interface
we would like to implement for switching between different
optimizers. Please review it and provide a feedback to us.
Thank You.

Regards

Julius Stroffek

Proposal for Pluggable Optimizer Interface
==========================================

Overview
========

We have presented at PGCon 2007 in Ottawa a couple of other
approaches and algorithms that can be used for query optimization;
see
http://www.pgcon.org/2008/papers/Execution_Plan_Optimization_Techniques_Julius_Stroffek.pdf
We have focused on algorithms for searching the space of
all possible orders of joins including bushy trees.
The possible algorithms include

 * Dynamic Programming (already implemented
   in PostgreSQL)
 * Genetic Algorithm (already implemented
   in PostgreSQL)
 * Dijkstra's Algorithm
 * A* Search Algorithm
 * Greedy “Nearest Neighbor” Algorithm
 * Hill-Climbing
 * Simulated Annealing
 * Iterative Improvement
 * Two Phase Optimization
 * Toured Simulated Annealing

Choosing the best algorithm from the above list is
difficult. We have to consider the length of the optimizer
computation vs. the quality of the solution which we would
like to achieve. We may want to do some hybrid optimization
– run a couple of the algorithms from the above and choose
the best solution found. We might run some very fast
algorithm at the beginning and depending on the solution
cost we may decide whether it is worthwhile to try
to optimize the plan even more (using other algorithm
with longer running time but producing better solutions).
Therefore we would like to propose an interface which
can be used to switch between different optimizer algorithms
and/or allow a user to write and use his own implementation.
It would allow the community to ship more optimizer
algorithms as contrib modules and users may then decide
which of those algorithms should be used for their queries.

Creating an optimizer
=====================

We would propose to create a catalog holding the available
optimizers in the system called pg_optimizer. Users could
then use a SET command to switch between different
optimizers.

postgres=# select * from pg_optimizer;
 optname |      optproc
---------+-------------------
 geqo    | geqo_optimizer
 dynamic | dynamic_optimizer
 greedy  | greedy_optimizer
(3 rows)

postgres=# set optimizer=greedy;
SET


Optimizer Invocation Point
==========================
There is code in the function make_rel_from_joinlist that
decides whether to invoke the dynamic programming or the genetic
algorithm for query optimization. We would propose to place
the invocation of the plugged optimizer at the same place,
and with the same parameters, as the functions geqo and
make_one_rel_by_joins are currently invoked.
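The invocation point can be sketched generically in the hook style discussed elsewhere in this thread: a global function pointer consulted at the decision point, with the built-in algorithm as the fallback. Everything below is illustrative only — the names mimic the proposal, and the toy "search" functions just return distinguishable numbers; this is not actual PostgreSQL code:

```c
#include <stddef.h>

/* A loadable module's _PG_init() would set this; NULL means "use built-in". */
typedef int (*join_search_hook_type)(int levels_needed);
join_search_hook_type join_search_hook = NULL;

/* stand-in for the built-in dynamic-programming search */
int standard_join_search(int levels_needed)
{
    return levels_needed * 2;
}

/* stand-in for a plugged-in greedy algorithm */
int greedy_join_search(int levels_needed)
{
    return levels_needed + 1;
}

/* the invocation point: prefer the hook if one is installed */
int make_rel_from_joinlist(int levels_needed)
{
    if (join_search_hook != NULL)
        return join_search_hook(levels_needed);
    return standard_join_search(levels_needed);
}
```

The appeal of this shape is that the rest of the planner is untouched: only the single decision point needs to know a hook exists.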

Creating and dropping an optimizer
==================================
The optimizer function has to be implemented as a C-Language
Function using “Version 1 Calling Conventions”. The return
type of the function is RelOptInfo * and the arguments
passed to the function are

 1. PlannerInfo *root
 2. int levels_needed
 3. List *initial_rels

The proper “CREATE FUNCTION” statement has to be used
to create the optimizer function.

> CREATE FUNCTION greedyoptimizer(internal, int, internal)
> RETURNS internal
> AS 'mylib', 'greedy_optimizer'
> LANGUAGE C
> ;

Once the optimizer function is created, the user may create
an optimizer using the function with the statement

> CREATE OPTIMIZER greedy (
> function = greedyoptimizer,
> comment = 'Greedy Nearest Neighbor Optimizer'
> );

If the user decides not to use the optimizer anymore,
he can invoke

> DROP OPTIMIZER greedy;

The user also has to drop the optimizer function with

> DROP FUNCTION greedyoptimizer;

Project TODO List
=================
1. Create a pg_optimizer catalog to hold available
   optimizers.
2. Create wrappers above the current dynamic
   programming and genetic algorithm optimizers
   to be used to call those implementations.
3. Modify the parser and add the functions to handle
   and execute the CREATE/DROP OPTIMIZER statements.
4. Modify GUC so that it is possible to switch
   between optimizers.
5. Change the code at the optimizer invocation
   point so that the appropriate optimizer function
   is called.
6. Handle object dependencies – make an entry
   in pg_depend that the optimizer depends on its
   function.
7. Implement a '\dO' command that will list the
   available optimizers.
8. Create a contrib module and ship some other
   optimizer algorithms.
9. Any other suggestions, comments and changes
   that come out of the review of this
   proposal.

Things to Decide
================
1. Rights. Who can create/drop optimizers?
   Who 

Re: [HACKERS] Testing the async-commit patch

2007-08-13 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> "Tom Lane" <[EMAIL PROTECTED]> writes:
>> I propose that we actually set synchronous_commit
>> off by default for the next little while --- at least up to 8.3beta1,
>> maybe until we approach the RC point.  That will ensure that every
>> buildfarm machine is exercising the async-commit behavior, as well as
>> every developer who's testing HEAD.
>> 
>> Of course the risk is that we might forget to turn it back on before
>> release :-(

> I'll set a cron job to remind us. What date should I set it for? :)

I thought of a low-tech solution for that: put a note in
src/tools/RELEASE_CHANGES about it.  We invariably look at that file
while preparing releases.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Testing the async-commit patch

2007-08-13 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> I have some ideas about testing configuration items. Doing all our tests 
> with the default config is not ideal, I think. Essentially we'd put up a 
> server that would have sets of . The 
> client would connect to the server if it could and get the set(s) of 
> lines for the branch in question, and for each set it would try another 
> run of installcheck  (I'm also wondering if we should switch to doing 
> installcheck-parallel). Anyway, this would be a config option on the 
> buildfarm, so we wouldn't overburden hosts with limited run windows 
> (e.g. the Solaris boxes Sun has on the farm) or slow run times (e.g. 
> some of the old and/or tiny hardware we have).

> If this seems worth it I'll put it on my TODO.

Sounds like a good plan, except that an extra server seems unnecessary
mechanism (and perhaps an unnecessary security risk).  We can just put
a file into CVS src/test/regress showing what we'd like tested.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Testing the async-commit patch

2007-08-13 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> But to get to the point: the urgency of testing the patch more
> extensively has just moved up a full order of magnitude, IMHO anyway.
> I muttered something in the other thread about providing a buildfarm
> option to run the regression tests with synchronous_commit off.  That
> would still be a good idea in the long run, but I want to take some more
> drastic measures now.  I propose that we actually set synchronous_commit
> off by default for the next little while --- at least up to 8.3beta1,
> maybe until we approach the RC point.  That will ensure that every
> buildfarm machine is exercising the async-commit behavior, as well as
> every developer who's testing HEAD.
>
> Of course the risk is that we might forget to turn it back on before
> release :-(

I'll set a cron job to remind us. What date should I set it for? :)

Seems like a fine plan to me. It's supposed to be 100% reliable and have
indistinguishable behaviour barring a system crash, and nobody should be
running production data on a beta or pre-beta build, so they should never
see a difference.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Testing the async-commit patch

2007-08-13 Thread Andrew Dunstan



Tom Lane wrote:

But to get to the point: the urgency of testing the patch more
extensively has just moved up a full order of magnitude, IMHO anyway.
I muttered something in the other thread about providing a buildfarm
option to run the regression tests with synchronous_commit off.  That
would still be a good idea in the long run, but I want to take some more
drastic measures now.  I propose that we actually set synchronous_commit
off by default for the next little while --- at least up to 8.3beta1,
maybe until we approach the RC point.  That will ensure that every
buildfarm machine is exercising the async-commit behavior, as well as
every developer who's testing HEAD.

Of course the risk is that we might forget to turn it back on before
release :-(


  


Turn it off, I doubt we'll forget.

I have some ideas about testing configuration items. Doing all our tests 
with the default config is not ideal, I think. Essentially we'd put up a 
server that would have sets of config lines. The client would connect to 
the server if it could and get the set(s) of lines for the branch in 
question, and for each set it would try another run of installcheck 
(I'm also wondering if we should switch to doing installcheck-parallel). 
Anyway, this would be a config option on the 
buildfarm, so we wouldn't overburden hosts with limited run windows 
(e.g. the Solaris boxes Sun has on the farm) or slow run times (e.g. 
some of the old and/or tiny hardware we have).


If this seems worth it I'll put it on my TODO.

cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] 2D partitioning of VLDB - sane or not?

2007-08-13 Thread Jason Nerothin
Josh,

I think what you are suggesting is something like this:

-- begin SQL --
core=# CREATE TABLE temp_x( x_id BIGINT PRIMARY KEY, x_info VARCHAR(16) NOT
NULL DEFAULT 'x_info');
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "temp_x_pkey"
for table "temp_x"
CREATE TABLE
core=# CREATE TABLE temp_y( y_id BIGINT PRIMARY KEY, y_info VARCHAR(16) NOT
NULL DEFAULT 'y_info');
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "temp_y_pkey"
for table "temp_y"
CREATE TABLE
core=# CREATE TABLE temp_xy() INHERITS (temp_x, temp_y);
CREATE TABLE
core=# \d temp_xy
              Table "core.temp_xy"
 Column |         Type          |                  Modifiers
--------+-----------------------+----------------------------------------------
 x_id   | bigint                | not null
 x_info | character varying(16) | not null default 'x_info'::character varying
 y_id   | bigint                | not null
 y_info | character varying(16) | not null default 'y_info'::character varying
Inherits: temp_x,
          temp_y

-- end code --

The problem with this is what I really want to do is something like this:

-- begin code --
core=# CREATE TABLE temp_xx() INHERITS (temp_x, temp_x);
ERROR:  inherited relation "temp_x" duplicated
-- end code --

The issue is that the relations are in fact reflexive and, due to the sheer
size of the data I'm trying to warehouse, I'd rather not keep them around
more than once.

I'm sort of thinking aloud here, but based on what you've told me, I guess
I'm left having to choose which direction I want to search in since the
domains and ranges are theoretically the same. On the other hand, perhaps I
could take the overhead impact and just keep two copies of the parent tables
around. The relation table is on the order of about 300-500x as large as the
parent tables and that multiplier is expected to stay relatively constant
over time...?

Which brings us back to the original issue. If I decide to stick with the
current implementation and not "improve our existing partitioning
mechanisms to scale to 100,000 partitions", I could do something like:

Maintain 2 copies of the parent table (partitioned by 256).
Inherit from both to a relation table.

Does this get me out of the woods with the query analyzer? Doesn't seem like
it would, necessarily, at least.
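For what it's worth, the two-copy workaround might be sketched as below. This
is only an illustrative guess at the setup (the table and column names are
invented, and keeping the duplicate parent in sync with the original is not
shown). Note that INHERITS merges same-named columns from multiple parents
into one, which is why the duplicate parent needs renamed columns:

```sql
-- Hypothetical second parent with renamed columns, so that inheriting
-- from both yields two distinct key columns in the relation table.
CREATE TABLE temp_x_b( xb_id BIGINT PRIMARY KEY,
                       xb_info VARCHAR(16) NOT NULL DEFAULT 'x_info');
CREATE TABLE temp_x_rel() INHERITS (temp_x, temp_x_b);
-- temp_x_rel now carries x_id, x_info, xb_id, xb_info
```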

On 8/11/07, Josh Berkus <[EMAIL PROTECTED]> wrote:
>
> Jason,
>
> > Aside from running into a known bug with "too many triggers" when
> creating
> > gratuitous indices on these tables, I feel as it may be possible to do
> what
> > I want without breaking everything. But then again, am I taking too many
> > liberties with technology that maybe didn't have use cases like this one
> in
> > mind?
>
> Well, you're pushing PostgreSQL partitioning further than it's currently
> able
> to go.  Right now our range exclusion becomes too costly to be useful
> somewhere around 300 to 1000 partitions (depending on CPU and other
> issues)
> because the constraints are checked linearly.
>
> To make your scheme work, you'd need to improve our existing partitioning
> mechanisms to scale to 100,000 partitions.  It would also help you to
> implement multiple inheritance so that you could have a partition which
> belonged to two masters.  I'd be very interested in seeing you do so, of
> course, but this may be more hacking than you had in mind.
>
> --
> Josh Berkus
> PostgreSQL @ Sun
> San Francisco
>



-- 

Jason Nerothin
Programmer/Analyst IV - Database Administration
UCLA-DOE Institute for Genomics & Proteomics
Howard Hughes Medical Institute

611 C.E. Young Drive East   | Tel: (310) 206-3907
105 Boyer Hall, Box 951570  | Fax: (310) 206-3914
Los Angeles, CA 90095. USA | Mail: [EMAIL PROTECTED]

http://www.mbi.ucla.edu/~jason



Re: [HACKERS] Wrote a connect-by feature

2007-08-13 Thread Josh Berkus
Bertram,

Just so you know, there is a site for external projects associated with 
PostgreSQL which are not part of the core code:
www.pgfoundry.org

You are quite welcome to propose your connect-by code as a project on 
pgFoundry.  This is how many new features start out before acceptance anyway.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[HACKERS] Testing the async-commit patch

2007-08-13 Thread Tom Lane
So I was testing my fix for the problem noted here:
http://archives.postgresql.org/pgsql-hackers/2007-08/msg00196.php
and promptly found *another* bug.  To wit, that repair_frag calls
HeapTupleSatisfiesVacuum without bothering to acquire any buffer
content lock.  This results in an Assert failure inside
SetBufferCommitInfoNeedsSave, if HeapTupleSatisfiesVacuum tries to
update any hint bits for the tuple.  I think that is impossible in
current releases, because the tuple's logical status was fully
determined by the prior call in scan_heap.  But it's possible as of
8.3 because the walwriter or other backends could have moved the WAL
flush point, allowing a previously unhintable XMAX to become hintable.

I think the best solution for this is to acquire the buffer content lock
before calling HeapTupleSatisfiesVacuum --- it's really a pretty ugly
shortcut that the code didn't do that already.  We could alternatively
refuse to do shrinking unless both XMIN and XMAX are correctly hinted
at scan_heap time; but there is not anything else in vacuum.c that seems
to require XMAX_COMMITTED to be set, so I'd rather not make that
restriction.

But to get to the point: the urgency of testing the patch more
extensively has just moved up a full order of magnitude, IMHO anyway.
I muttered something in the other thread about providing a buildfarm
option to run the regression tests with synchronous_commit off.  That
would still be a good idea in the long run, but I want to take some more
drastic measures now.  I propose that we actually set synchronous_commit
off by default for the next little while --- at least up to 8.3beta1,
maybe until we approach the RC point.  That will ensure that every
buildfarm machine is exercising the async-commit behavior, as well as
every developer who's testing HEAD.

Of course the risk is that we might forget to turn it back on before
release :-(

Comments?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] strange behaviour of parser - numeric domain doesn't work phantom

2007-08-13 Thread Pavel Stehule
2007/8/13, Tom Lane <[EMAIL PROTECTED]>:
> "Pavel Stehule" <[EMAIL PROTECTED]> writes:
> > postgres=# select -11::nb;
>
> The operator precedence table at
> http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#SQL-PRECEDENCE
> says perfectly clearly that :: binds more tightly than -.
>
> regards, tom lane
>

My mistake. Sorry for noise

regards
Pavel Stehule

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Wrote a connect-by feature

2007-08-13 Thread Bertram Scharpf
On Sunday, 12 Aug 2007, 18:36:47 +0200, Bertram Scharpf wrote:
> Good bye!

I just forgot to refer to--of course--the German knight of
the iron hand.

Bertram

-- 
Bertram Scharpf
Stuttgart, Deutschland/Germany
http://www.bertram-scharpf.de

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] 92DE-7B78-4153 : CONFIRM from pgsql-hackers (subscribe)

2007-08-13 Thread Bertram Scharpf
accept 92DE-7B78-4153

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] 8.3 freeze/release

2007-08-13 Thread Tom Lane
Michael Meskes <[EMAIL PROTECTED]> writes:
> it seems a lot of work is still being done on 8.3. But we are in feature
> freeze, aren't we? I'm wondering because I'm sitting on some major
> changes to ecpg (real prepare amongst them) but haven't committed these
> because of the freeze.

Well, we are in feature freeze, but as far as I'm concerned ecpg is its
own little fiefdom.  If you have enough confidence in these changes to
apply them now, no one is going to question you.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] strange behaviour of parser - numeric domain doesn't work phantom

2007-08-13 Thread Tom Lane
"Pavel Stehule" <[EMAIL PROTECTED]> writes:
> postgres=# select -11::nb;

The operator precedence table at
http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#SQL-PRECEDENCE
says perfectly clearly that :: binds more tightly than -.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Maximum row size

2007-08-13 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> ... it's kind
> of pie-in-the-sky since it depends on every column being toasted (and not
> compressed). It's much more normal to have some columns be quite large and
> others more normal sized or different data types.

Yeah.  I think those numbers should simply be removed from the
discussion entirely, because they are bogus.  There is no real-world
situation where someone is going to put 1Gb into each and every field of
a row.  The limits that should be stated are 1Gb per column and 250-1600
columns per table --- let the readers draw their own conclusions from
those.

As Greg notes, the 400-column figure came from presuming that every
column gets toasted out-of-line; but that's not totally realistic
either.  Furthermore, we have some untoastable types that are wider
than a toast pointer.  The worst case seems to be 32 bytes for box,
line, and lseg (I ignore "name" which is stated not to be intended
for general-purpose use).  If you made a table of only "box" columns
you could get at most a bit over 250 of them on a standard-size page.
So I think that's what we should quote as the lower bound.
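The worst-case arithmetic above is easy to check (page and tuple header
overhead is ignored here, which is why the practical count lands below the
raw quotient):

```sql
-- 8k page divided by a 32-byte box datum; headers ignored.
SELECT 8192 / 32 AS box_columns_per_page;  -- 256, i.e. "a bit over 250"
```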

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] 8.3 freeze/release

2007-08-13 Thread Michael Meskes
Hi,

it seems a lot of work is still being done on 8.3. But we are in feature
freeze, aren't we? I'm wondering because I'm sitting on some major
changes to ecpg (real prepare amongst them) but haven't committed these
because of the freeze. However, it looks as if new patches are still
coming in. But then I might have gotten a false impression.

Anyway, if the release is still far enough away, I would like to commit
my changes too. 

Comments?

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Maximum row size

2007-08-13 Thread Gregory Stark

Devrim GÜNDÜZ <[EMAIL PROTECTED]> writes:

> What is the maximum row size for PostgreSQL? 
>
> http://www.postgresql.org/about/ says 1.6 TB

I think this comes from the maximum of 1600 columns times the maximum of 1GB
per (toasted) datum. 

However 1600 toasted datums won't fit on an 8k page. Each toast pointer is 17
bytes, so only 480 toast pointers will fit on an 8k page. Which may be the
rationale for this?
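Both figures quoted above can be sanity-checked with a couple of quick
selects (page and tuple header overhead is ignored, which is presumably why
the practical count is "only 480" rather than the raw quotient):

```sql
SELECT 8192 / 17 AS toast_pointers_per_page;  -- 481 before header overhead
SELECT 1600 * 1  AS max_row_size_gb;          -- 1600 GB, i.e. the "1.6 TB" figure
```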

> http://www.postgresql.org/docs/faqs.FAQ.html says 400 GB.

Before packed varlenas we used to be able to fit only 408, minus a few for the
page header. Perhaps it was 400 then, or perhaps they were just rounding down.

So with packed varlenas we can raise the second number to 480GB. But it's kind
of pie-in-the-sky since it depends on every column being toasted (and not
compressed). It's much more normal to have some columns be quite large and
others more normal sized or different data types.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] strange behaviour of parser - numeric domain doesn't work phantom

2007-08-13 Thread Pavel Stehule
Hello

I am testing:

postgres=# create domain nb int check(value > 10);
CREATE DOMAIN
postgres=# select 11::nb;
 nb
----
 11
(1 row) -- ok

postgres=# select -11::nb;
 ?column?
----------
      -11
(1 row) -- ughh

postgres=# select (-11)::nb; -- I understand
ERROR:  value for domain nb violates check constraint "nb_check"

But nobody will write parentheses around a lone negative number
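For completeness, the SQL-standard CAST spelling also groups the sign with the
value, so it trips the constraint without extra parentheses around the number
(continuing the same session):

```sql
postgres=# SELECT CAST(-11 AS nb);
ERROR:  value for domain nb violates check constraint "nb_check"
```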

Regards
Pavel Stehule

postgres=# select version();
                                version
------------------------------------------------------------------------
 PostgreSQL 8.3devel on i686-pc-linux-gnu, compiled by GCC gcc (GCC)
4.1.2 20070502 (Red Hat 4.1.2-12)
(1 row)

---(end of broadcast)---
TIP 6: explain analyze is your friend