Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Jan Urbański

Heikki Linnakangas wrote:

Jan Urbański wrote:

26763 3.5451  AllocSetCheck


Make sure you disable assertions before profiling.


Awww, darn. OK, here goes another set of results, without casserts this 
time.


=== CVS HEAD ===

number of clients: 10
number of transactions per client: 10
number of transactions actually processed: 100/100
tps = 6437.286494 (including connections establishing)
tps = 6438.168927 (excluding connections establishing)

samples  %        symbol name
220443   11.6613  AllocSetAlloc
79355 4.1978  base_yyparse
77230 4.0854  SearchCatCache
56011 2.9629  hash_search_with_hash_value
45946 2.4305  MemoryContextAllocZeroAligned
38577 2.0407  hash_any
36414 1.9263  MemoryContextAlloc
33060 1.7489  AllocSetFree
27218 1.4398  ScanKeywordLookup
25793 1.3644  base_yylex
20579 1.0886  hash_uint32
18867 0.9981  hash_seq_search
18293 0.9677  expression_tree_walker
17696 0.9361  copyObject
16979 0.8982  LockAcquire
14292 0.7560  MemoryContextAllocZero
13117 0.6939  SearchSysCache

=== ts_sel ===

number of clients: 10
number of transactions per client: 10
number of transactions actually processed: 100/100
tps = 3216.753677 (including connections establishing)
tps = 3216.996592 (excluding connections establishing)

samples  %        symbol name
942096   10.9130  internal_text_pattern_compare
809195    9.3735  bttext_pattern_cmp
659545    7.6400  pg_detoast_datum_packed
628114    7.2759  pg_qsort
603998    6.9966  AllocSetAlloc
581880    6.7403  pglz_decompress
467708    5.4178  DirectFunctionCall2
385854    4.4696  compare_two_textfreqs
160578    1.8601  AllocSetFree
128642    1.4902  swapfunc
112885    1.3076  MemoryContextAlloc
103388    1.1976  SearchCatCache
100387    1.1629  text_to_cstring
99004     1.1468  hash_search_with_hash_value
98444     1.1403  .plt
92664     1.0734  base_yyparse
88511     1.0253  errstart

Not good... Shall I try sorting pg_statistics arrays on text values 
instead of frequencies?
BTW: I just noticed some text_to_cstring calls; they came from 
elog(DEBUG1)s that I have in my code. But they couldn't have skewed the 
results much, could they?


Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Heikki Linnakangas

Jan Urbański wrote:
Not good... Shall I try sorting pg_statistics arrays on text values 
instead of frequencies?


Yeah, I'd go with that. If you only do it for the new 
STATISTIC_KIND_MCV_ELEMENT statistics, you shouldn't need to change any 
other code.


Hmm. There has been discussion on raising default_statistics_target, and 
one reason why we've been afraid to do so has been that it increases the 
cost of planning (there are some O(n^2) algorithms in there). Pre-sorting 
the STATISTIC_KIND_MCV array as well, and replacing the linear searches 
with binary searches, would alleviate that, which would be nice.
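
For illustration, a binary search over a pre-sorted MCV Datum array could 
look roughly like this (a sketch only, not from any patch; mcv_contains and 
cmpfn are made-up names, with cmpfn assumed to be an FmgrInfo for the type's 
btree comparison function returning int4 <0/0/>0):

static bool
mcv_contains(Datum *mcv, int nmcv, Datum key, FmgrInfo *cmpfn)
{
    int     low = 0,
            high = nmcv - 1;

    while (low <= high)
    {
        int     mid = (low + high) / 2;
        int32   cmp = DatumGetInt32(FunctionCall2(cmpfn, key, mcv[mid]));

        if (cmp == 0)
            return true;        /* key is one of the MCVs */
        if (cmp < 0)
            high = mid - 1;
        else
            low = mid + 1;
    }
    return false;
}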


BTW: I just noticed some text_to_cstring calls; they came from 
elog(DEBUG1)s that I have in my code. But they couldn't have skewed the 
results much, could they?


Well, text_to_cstring was consuming 1.1% of the CPU time on its own, and 
presumably some of the AllocSetAlloc overhead is attributable to that as 
well. And perhaps some of the detoasting as well.


Speaking of which, a lot of time seems to be spent on detoasting. I'd 
like to understand that a bit better. Where is the detoasting coming from?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] autovacuum: use case for indenpedent TOAST table autovac settings

2008-08-14 Thread Simon Riggs

On Wed, 2008-08-13 at 21:30 -0400, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  It seems like we'll want to do it somehow.  Perhaps the cleanest way is
  to incorporate toast-table settings in the reloptions of the parent
  table.  Otherwise dump/reload is gonna be a mess.
 
  My question is whether there is interest in actually having support for
  this, or should we just inherit the settings from the main table.  My
  gut feeling is that this may be needed in some cases, but perhaps I'm
  overengineering the thing.
 
 It seems reasonable to inherit the parent's settings by default, in any
 case.  So you could do that now and then extend the feature later if
 there's real demand.

Yeh, I can't really see a reason why you'd want to treat toast tables
differently with regard to autovacuuming. It's one more setting to get
wrong, so no thanks.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Jan Urbański

Heikki Linnakangas wrote:

Jan Urbański wrote:
Not good... Shall I try sorting pg_statistics arrays on text values 
instead of frequencies?


Yeah, I'd go with that. If you only do it for the new 
STATISTIC_KIND_MCV_ELEMENT statistics, you shouldn't need to change any 
other code.


OK, will do.

BTW: I just noticed some text_to_cstring calls; they came from 
elog(DEBUG1)s that I have in my code. But they couldn't have skewed the 
results much, could they?


Well, text_to_cstring was consuming 1.1% of the CPU time on its own, and 
presumably some of the AllocSetAlloc overhead is attributable to that as 
well. And perhaps some of the detoasting as well.


Speaking of which, a lot of time seems to be spent on detoasting. I'd 
like to understand that a bit better. Where is the detoasting coming from?


Hmm, maybe bttext_pattern_cmp does some detoasting? It calls 
PG_GETARG_TEXT_PP(), which in turn calls pg_detoast_datum_packed(). Oh, 
and also I think that compare_lexeme_textfreq() uses DatumGetTextP() and 
that also does detoasting. The root of all evil could be keeping a Datum 
in the TextFreq array, and not a text *, which is something you 
pointed out earlier and I apparently didn't understand.


So right now the idea is to:
 (1) pre-sort STATISTIC_KIND_MCELEM values
 (2) build an array of pointers to detoasted values in tssel()
 (3) use binary search when looking for MCELEMs during tsquery analysis

Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proposal sql: labeled function params

2008-08-14 Thread Pavel Stehule
Hello

I propose enhancing the current syntax to allow specifying a label for any
function parameter:

fcename(expr [as label], ...)
fcename(colname, ...)

I would like to allow custom functions to behave the same way as the xmlforest function:
postgres=# select xmlforest(a) from foo;
 xmlforest
---
 <a>10</a>
(1 row)

postgres=# select xmlforest(a as b) from foo;
 xmlforest
---
 <b>10</b>
(1 row)

Actually I am not sure what is the best way for PL languages to access
this info. Using some system variables would need a new column in pg_proc,
because collecting these needs some time and in 99% of cases we don't
need it. So I prefer some system function that returns labels for the
outer function call. Like

-- test
create function getlabels() returns varchar[] as $$select '{name,
age}'::varchar[]$$ language sql immutable;

create or replace function json(variadic varchar[])
returns varchar as $$
select '[' || array_to_string(
 array(
select (getlabels())[i]|| ':' || $1[i]
   from generate_subscripts($1,1) g(i))
   ,',') || ']'
$$ language sql immutable strict;

postgres=# select json('Zdenek' as name,'30' as age);
 json
--
 [name:Zdenek,age:30]
(1 row)

postgres=# select json(name, age) from person;
 json
--
 [name:Zdenek,age:30]
(1 row)

There are two possibilities
  a) collect labels in parse time
  b) collect labels in executor time

@a needs info in pg_proc, but it is simpler; @b is a little bit more
difficult, but doesn't need any changes in the system catalog. I am
thinking about b now.

Necessary changes:
=
Labels are searched in the parse tree fcinfo->flinfo->fn_expr. I need to
insert the label into the parse tree, so it needs a special node,
labeled_param. For getting a column reference I need to put the current
exprstate into fcinfo. The getlabels() function should take code from
the ExecEvalVar function.

Any notes, ideas?

Pavel Stehule

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Join Removal/ Vertical Partitioning

2008-08-14 Thread Simon Riggs

On Thu, 2008-06-26 at 13:42 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  We can check for removal of a rel by...

OT comment: I just found a blog about Oracle's optimizer magic, which is
quite interesting. I notice there is a post there about join removal,
posted about 12 hours later than my original post. Seems to validate the
theory anyway. Our posts have a wider audience than may be apparent :-)

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Heikki Linnakangas

Jan Urbański wrote:

So right now the idea is to:
 (1) pre-sort STATISTIC_KIND_MCELEM values
 (2) build an array of pointers to detoasted values in tssel()
 (3) use binary search when looking for MCELEMs during tsquery analysis


Sounds like a plan. In (2), it's even better to detoast the values 
lazily. For a typical one-word tsquery, the binary search will only look 
at a small portion of the elements.


Another thing is, how significant is the time spent in tssel() anyway, 
compared to actually running the query? You ran pgbench on EXPLAIN, 
which is good to see where in tssel() the time is spent, but if the time 
spent in tssel() is say 1% of the total execution time, there's no point 
optimizing it further.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-14 Thread Simon Riggs

On Wed, 2008-08-13 at 23:12 -0400, Tom Lane wrote:

 We're just trying to provide better performance for certain common SQL
 idioms.

Sounds good, but can you explain how this will help? Not questioning it,
just after more information about it.

I'm halfway through the join removal patch, so this work might extend the
join elimination to semi/anti joins also (hopefully), or it might
(hopefully not) prevent the join elimination altogether. I'll let you
know how I get on.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Gregory Stark
Jan Urbański [EMAIL PROTECTED] writes:

 Heikki Linnakangas wrote:
 Speaking of which, a lot of time seems to be spent on detoasting. I'd like to
 understand that a bit better. Where is the detoasting coming from?

 Hmm, maybe bttext_pattern_cmp does some detoasting? It calls
 PG_GETARG_TEXT_PP(), which in turn calls pg_detoast_datum_packed(). Oh, and
 also I think that compare_lexeme_textfreq() uses DatumGetTextP() and that also
 does detoasting. 

DatumGetTextP() will detoast packed data (ie, 1-byte length headers) whereas
DatumGetTextPP will only detoast compressed or externally stored data. I
suspect you're seeing the former.

 The root of all evil could be keeping a Datum in the TextFreq array, and not
 a text *, which is something you pointed out earlier and I apparently
 didn't understand.

Well it doesn't really matter which type. If you store Datums which are
already detoasted then DatumGetTextP and DatumGetTextPP will just be no-ops
anyway. If you store packed data (from DatumGetTextPP) then it's probably
safer to store it as Datums, so that if you need to pass it to any functions
which don't expect packed data they'll untoast it.
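
For illustration, a comparison against a possibly-packed text Datum might 
look like this (a sketch only, names made up; PG_DETOAST_DATUM_PACKED is a 
no-op unless the value is compressed or external, and VARSIZE_ANY_EXHDR / 
VARDATA_ANY accept both packed and unpacked varlenas, so no extra copy is 
made just to compare bytes):

static int
cmp_lexeme_to_text_datum(const char *lexeme, int lexlen, Datum d)
{
    text   *t = (text *) PG_DETOAST_DATUM_PACKED(d);
    int     tlen = VARSIZE_ANY_EXHDR(t);

    if (lexlen != tlen)
        return (lexlen < tlen) ? -1 : 1;
    return memcmp(lexeme, VARDATA_ANY(t), lexlen);
}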

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Join Removal/ Vertical Partitioning

2008-08-14 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes:

 On Thu, 2008-06-26 at 13:42 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  We can check for removal of a rel by...

 OT comment: I just found a blog about Oracle's optimizer magic, which is
 quite interesting. I notice there is a post there about join removal,
 posted about 12 hours later than my original post. Seems to validate the
 theory anyway. Our posts have a wider audience than may be apparent :-)

Well turnabout's fair play... what's the URL?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Jan Urbański

Heikki Linnakangas wrote:

Jan Urbański wrote:

So right now the idea is to:
 (1) pre-sort STATISTIC_KIND_MCELEM values
 (2) build an array of pointers to detoasted values in tssel()
 (3) use binary search when looking for MCELEMs during tsquery analysis


Sounds like a plan. In (2), it's even better to detoast the values 
lazily. For a typical one-word tsquery, the binary search will only look 
at a small portion of the elements.


Hm, how can I do that? Toast is still a bit black magic to me... Do you 
mean I should stick to having Datums in TextFreq? And use DatumGetTextP 
in bsearch() (assuming I'll get rid of qsort())? I wanted to avoid that, 
so I won't detoast the same value multiple times, but it's true: a 
binary search won't touch most elements.


Another thing is, how significant is the time spent in tssel() anyway, 
compared to actually running the query? You ran pgbench on EXPLAIN, 
which is good to see where in tssel() the time is spent, but if the time 
spent in tssel() is say 1% of the total execution time, there's no point 
optimizing it further.


Changed the pgbench script to
select * from manual where tsvector @@ to_tsquery('foo');
and the parameters to
pgbench -n -f tssel-bench.sql -t 1000 postgres

and got

number of clients: 1
number of transactions per client: 1000
number of transactions actually processed: 1000/1000
tps = 12.238282 (including connections establishing)
tps = 12.238606 (excluding connections establishing)

samples  %        symbol name
174731   31.6200  pglz_decompress
88105    15.9438  tsvectorout
17280 3.1271  pg_mblen
13623 2.4653  AllocSetAlloc
13059 2.3632  hash_search_with_hash_value
10845 1.9626  pg_utf_mblen
10335 1.8703  internal_text_pattern_compare
9196  1.6641  index_getnext
9102  1.6471  bttext_pattern_cmp
8075  1.4613  pg_detoast_datum_packed
7437  1.3458  LWLockAcquire
7066  1.2787  hash_any
6811  1.2325  AllocSetFree
6623  1.1985  pg_qsort
6439  1.1652  LWLockRelease
5793  1.0483  DirectFunctionCall2
5322  0.9631  _bt_compare
4664  0.8440  tsCompareString
4636  0.8389  .plt
4539  0.8214  compare_two_textfreqs

But I think I'll go with pre-sorting anyway, it feels cleaner and neater.
--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies

2008-08-14 Thread Magnus Hagander
Magnus Hagander wrote:

[about the ability to use different maps for ident auth, gss and krb
auth for example]

 It wouldn't be very easy/clean to do that w/o breaking the existing
 structure of pg_ident though, which makes me feel like using seperate
 files is probably the way to go.

Actually, I may have to take that back. We already have support for
multiple maps in the ident file, I'm not really sure anymore of the case
where this wouldn't be enough :-)

That said, I still think we want to parse pg_hba in the postmaster,
because it allows us to not load known broken files, and show errors
when you actually change the file etc. ;-)

I did code up a POC patch for it, and it's not particularly hard to do.
Mostly it's just moving the codepath from the backend to the postmaster.
I'll clean it up a bit and post it, just so people can see what it looks
like...

//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] benchmark farm

2008-08-14 Thread Andrew Dunstan



Michael Holzman wrote:

On Wed, Aug 13, 2008 at 7:09 PM, Jaime Casanova wrote:
  

any move in this?



I made some changes to pgbench in February and sent them to Andrew. There
has been no reaction so far.

  


Oops. This completely got by me. I'll try to take a look at it RSN.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] compilig libpq with borland 5.5

2008-08-14 Thread claudio lezcano
Thank you so much for the comments. I managed to make progress by
reconfiguring the Borland include directory; however, another problem has
emerged. The build produces the following message:

Error: Unresolved external '_pgwin32_safestat' referenced from C:\SOURCE
POSTGRES 8.3\SRC\INTERFACES\LIBPQ\RELEASE\BLIBPQ.LIB|fe-connect

Obs.: Static or dynamic libraries generated by MSVC cannot be used to
compile sources with bcc32 (they work for MinGW and others); when trying to
compile, the following problem arises with bcc32:

Error: 'C:\examples\LIBPQ.LIB' contains invalid OMF record, type 0x21
(possibly COFF)
** error 2 ** deleting .\Release\blibpq.dll

Thanks in advance
Claudio Lezcano


Re: [HACKERS] compilig libpq with borland 5.5

2008-08-14 Thread Merlin Moncure
On Thu, Aug 14, 2008 at 9:02 AM, claudio lezcano [EMAIL PROTECTED] wrote:
 Thank you so much for the comments. I managed to make progress by
 reconfiguring the Borland include directory; however, another problem has
 emerged. The build produces the following message:

 Error: Unresolved external '_pgwin32_safestat' referenced from C:\SOURCE
 POSTGRES 8.3\SRC\INTERFACES\LIBPQ\RELEASE\BLIBPQ.LIB|fe-connect

 Obs.: Static or dynamic libraries generated by MSVC cannot be used to
 compile sources with bcc32 (they work for MinGW and others); when trying to
 compile, the following problem arises with bcc32:

You are correct about static libraries.  If you have a 'COFF' (usually
microsoft in this context) static library, the only tool I know of to
get it working with the Borland stack is the Digital Mars COFF-OMF
converter, which works, and is the only way to go if you have a static
library which is not a stub for a dll (fully static) and don't
have/can't compile the source.

Dynamic libraries, however, can be shared between the compilers.  You
can either load all the symbols with LoadLibrary, etc, or generate a
static library.   Borland's implib.exe utility generates a static
library from any dll so you don't have to use LoadLibrary, etc., just
the header for the symbols from the library which you want to use.

Format problems between the compilers are one of the many reasons why
static libraries (except for statically loaded .dll) have fallen out
of favor...you hardly ever see them anymore.
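
For what it's worth, the LoadLibrary route looks roughly like this
(illustrative sketch only; PQconnectdb is declared with a void * return here
so the example doesn't need libpq-fe.h):

#include <windows.h>
#include <stdio.h>

typedef void *(*PQconnectdb_fn) (const char *conninfo);

int
main(void)
{
    HMODULE     lib = LoadLibrary("libpq.dll");
    PQconnectdb_fn connect_fn;

    if (lib == NULL)
    {
        fprintf(stderr, "could not load libpq.dll\n");
        return 1;
    }

    connect_fn = (PQconnectdb_fn) GetProcAddress(lib, "PQconnectdb");
    if (connect_fn == NULL)
    {
        fprintf(stderr, "PQconnectdb not found in libpq.dll\n");
        return 1;
    }

    /* ... call connect_fn() and the other resolved symbols here ... */

    FreeLibrary(lib);
    return 0;
}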

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Join Removal/ Vertical Partitioning

2008-08-14 Thread Robert Haas
I'm guessing it's this... looks pretty interesting even if not.

http://optimizermagic.blogspot.com/2008/06/why-are-some-of-tables-in-my-query.html

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-14 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Wed, 2008-08-13 at 23:12 -0400, Tom Lane wrote:
 We're just trying to provide better performance for certain common SQL
 idioms.

 Sounds good, but can you explain how this will help?

1. Allowing optimization of EXISTS/NOT EXISTS as general-purpose joins.
Up to now, the best plan you could hope for was the equivalent of a
nestloop with inner indexscan, with the EXISTS subquery on the inside.
While that's not necessarily bad, it could be pretty bad for a large
outer table.  Now we'll have the option to consider merge and hash
joins too.

2. Allowing the planner to get better estimates of the result size of
these special join types.  In the past we've had to kluge that, and
the results weren't always good.  Part of what I'm doing (the unfinished
part ;-)) is to make more information about join context available to
selectivity estimation functions, which is something we've known we
needed for awhile.  I can't yet *prove* that I can get better estimates
with the added info, but if not, that just means I need to rethink what
to pass down exactly.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Join Removal/ Vertical Partitioning

2008-08-14 Thread Simon Riggs

On Thu, 2008-08-14 at 09:27 -0400, Robert Haas wrote:
 I'm guessing it's this... looks pretty interesting even if not.
 
 http://optimizermagic.blogspot.com/2008/06/why-are-some-of-tables-in-my-query.html

Yes, thanks for copying it in.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] migrate data 6.5.3 - 8.3.1

2008-08-14 Thread alexander lunyov

Hello everybody.

We have a dusty old server, FreeBSD 3.3-RELEASE, PostgreSQL 6.5.3.
I need to migrate four DBs from old server to new server (FreeBSD 6.2, 
PostgreSQL 8.3.1).


I've tried to do pg_dump on the old server, transfer it to the new one and 
do `psql -f dumpfile dbname`. Well, no surprise, then I see a lot of 
warnings and errors. Here they are (only unique error messages; they're 
actually duplicated):


psql:dump:389: WARNING:  aggregate attribute sfunc2 not recognized
psql:dump:389: WARNING:  aggregate attribute stype2 not recognized
psql:dump:389: WARNING:  aggregate attribute initcond2 not recognized
psql:dump:389: ERROR:  function int84div(bigint) does not exist
psql:dump:390: ERROR:  function int4div(integer) does not exist
psql:dump:391: ERROR:  function int2div(smallint) does not exist
psql:dump:392: ERROR:  function float4div(real) does not exist
psql:dump:393: ERROR:  function float8div(double precision) does not exist
psql:dump:394: ERROR:  function cash_div_flt8(money) does not exist
psql:dump:395: ERROR:  type timespan does not exist
psql:dump:396: ERROR:  function numeric_div(numeric) does not exist
psql:dump:410: ERROR:  function int4larger(abstime, abstime) does not exist
psql:dump:422: ERROR:  function int4smaller(abstime, abstime) does not exist
psql:dump:413: ERROR:  type datetime does not exist
psql:dump:429: ERROR:  aggregate stype must be specified


How can I safely do this migration? The dumps of these four DBs are about 
250 MB in total.


--
alexander lunyov

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] migrate data 6.5.3 - 8.3.1

2008-08-14 Thread David Blewett
On Thu, Aug 14, 2008 at 9:58 AM, alexander lunyov [EMAIL PROTECTED] wrote:
 Hello everybody.

 We have a dusty old server, FreeBSD 3.3-RELEASE, PostgreSQL 6.5.3.
 I need to migrate four DBs from old server to new server (FreeBSD 6.2,
 PostgreSQL 8.3.1).

Just an FYI: I advised Alexander to post here, because I thought some
of the devs might have older pg installs/dump tools and might be able
to give some advice.

David Blewett

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] migrate data 6.5.3 - 8.3.1

2008-08-14 Thread Kenneth Marshall
When upgrading, you use the pg_dump from the new version to
dump the old database. Then it can take care of incidental
changes during the process. I think that the mailing list
archives have articles on upgrading from v6.5. I do not think
that you can go straight from v6.5 to v8.3. You will almost
certainly need to use v7.2-7.4 as an intermediate step; maybe
v8.0 will work, check the list archives. Good luck.

Cheers,
Ken

On Thu, Aug 14, 2008 at 10:34:11AM -0400, David Blewett wrote:
 On Thu, Aug 14, 2008 at 9:58 AM, alexander lunyov [EMAIL PROTECTED] wrote:
  Hello everybody.
 
  We have a dusty old server, FreeBSD 3.3-RELEASE, PostgreSQL 6.5.3.
  I need to migrate four DBs from old server to new server (FreeBSD 6.2,
  PostgreSQL 8.3.1).
 
 Just an FYI: I advised Alexander to post here, because I thought some
 of the devs might have older pg installs/dump tools and might be able
 to give some advice.
 
 David Blewett
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: propose to include 3 new functions into intarray and intagg

2008-08-14 Thread Zdenek Kotala

Dmitry Koterov wrote:

Hello.

Here are these functions with detailed documentation:
http://en.dklab.ru/lib/dklab_postgresql_patch/


Added to next commit fest patch list.

Zdenek

--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-14 Thread Simon Riggs

On Thu, 2008-08-14 at 10:04 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Wed, 2008-08-13 at 23:12 -0400, Tom Lane wrote:
  We're just trying to provide better performance for certain common SQL
  idioms.
 
  Sounds good, but can you explain how this will help?
 
 1. Allowing optimization of EXISTS/NOT EXISTS as general-purpose joins.
 Up to now, the best plan you could hope for was the equivalent of a
 nestloop with inner indexscan, with the EXISTS subquery on the inside.
 While that's not necessarily bad, it could be pretty bad for a large
 outer table.  Now we'll have the option to consider merge and hash
 joins too.
 
 2. Allowing the planner to get better estimates of the result size of
 these special join types.  In the past we've had to kluge that, and
 the results weren't always good.  Part of what I'm doing (the unfinished
 part ;-)) is to make more information about join context available to
 selectivity estimation functions, which is something we've known we
 needed for awhile.  I can't yet *prove* that I can get better estimates
 with the added info, but if not, that just means I need to rethink what
 to pass down exactly.

OK, that sounds good. Are you also working on transforming NOT IN into a
different form? Or is that the same thing as (1)?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-14 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 OK, that sounds good. Are you also working on transforming NOT IN into
 different form? Or is that the same thing as (1)?

I'm not currently thinking about NOT IN.  It could be transformed to
an antijoin if we could prove that no nulls are involved, but that
seems less than trivial as I noted earlier.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IN vs EXISTS equivalence

2008-08-14 Thread Simon Riggs

On Fri, 2008-08-08 at 16:23 -0400, Tom Lane wrote:

 NOT IN is a lot trickier,
 for the same reason that typically trips up novices who try to use it:
 if any row of the subselect produces a NULL comparison result, then it
 is impossible for the NOT IN to result in TRUE, which means that it
 does not function as a standard antijoin.  I thought about optimizing
 it only in the case where we can prove that the subselect outputs and
 the compared-to values are known NOT NULL (which in typical cases we
 could prove by looking for NOT NULL constraints on those table
 columns).  The trouble with this is that that's not a sufficient
 condition: you must also assume that the comparison operator involved
 never yields NULL for non-null inputs.  That might be okay for btree
 comparison functions but it's not a very comfy assumption in general;
 we certainly haven't got any explicit knowledge that any functions are
 guaranteed to act that way.  So this case might be worth doing later
 but I'm not feeling excited about it. We generally tell people to
 avoid NOT IN and I'm happy to keep on saying that.

Just found this comment, after reading what you said on other thread
about NOT IN.

NOT IN is a serious performance issue for most people. We simply can't
say to people "you were told not to".

If we can fix it easily for the majority of cases, we should. We can't
let the "it won't work in certain cases" reason prevent various
optimizations from going in. There are tons of places where we say "XXX
needs later improvement" in code comments. So let's do that here also. It
certainly wouldn't be the first optimization/feature that went into code
in a restricted way that didn't work for all cases: hash joins, ANALYZE,
partial indexes etc.

Anybody that is writing complex SQL with user-defined operators knows
enough to re-write their queries correctly, so there will be almost no
negative effect from making the NOT IN optimisation a special case. And
if there is an effect, the people affected can fix the problem.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-14 Thread Kevin Grittner
 Tom Lane [EMAIL PROTECTED] wrote: 
 I can't yet *prove* that I can get better estimates
 with the added info, but if not, that just means I need to rethink
what
 to pass down exactly.
 
I'll see if I can do some testing here to confirm plan improvements
and check estimate accuracy.  This is only on the trunk, not any
stable branches?  I assume that effort should wait until you have a
candidate for the estimate improvements?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum and TOAST tables

2008-08-14 Thread Alvaro Herrera
Tom Lane wrote:
 I wrote:
  Hmm, we could probably fix that if we made the cluster operation swap
  the physical storage of the two toast tables, rather than swapping the
  tables altogether.  I agree it's not critical but it could be confusing.
 
 On second thought, I think it *could* lead to a visible failure.
 Suppose the OID counter wraps around and the OID that had been used for
 the temporary CLUSTER table gets assigned to a new table.  If that table
 needs a toast table, it'll try to create one using the name that is
 already in use.  We have defenses against picking an OID that's in use,
 but none for toast table names.  So I think it's indeed worth fixing.

My first attempt at a fix, which was simply swapping relfilenode for the
TOAST tables (and its indexes) after the data has been copied, does not
work, apparently because the TOAST pointers have the toast table ID
embedded.  Since we're intending to keep the pg_class entry for the old
TOAST table, the OID in the toast links is no longer valid.
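
For reference, the TOAST pointer format in question looks approximately like
this (from postgres.h; the troublesome field is va_toastrelid, which keeps
naming the old toast relation after a relfilenode-only swap):

typedef struct varatt_external
{
    int32       va_rawsize;     /* original data size (includes header) */
    int32       va_extsize;     /* external saved size (doesn't) */
    Oid         va_valueid;     /* unique ID of value within TOAST table */
    Oid         va_toastrelid;  /* OID of the TOAST table containing it */
} varatt_external;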

I'm not sure what can be done about this ...  Obviously we cannot just
swap the toast table before starting to move the data, because then we
cannot read the original data.

I wonder if we can get away with simply renaming the new toast table and
index after the data has been copied.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] autovacuum and TOAST tables

2008-08-14 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 My first attempt at a fix, which was simply swapping relfilenode for the
 TOAST tables (and its indexes) after the data has been copied, does not
 work, apparently because the TOAST pointers have the toast table ID
 embedded.

Ouch.  Right.

 I wonder if we can get away with simply renaming the new toast table and
 index after the data has been copied.

Yeah, that seems like the best answer.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: patch to create explicit support for semi and anti joins

2008-08-14 Thread Tom Lane
Kevin Grittner [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] wrote: 
 I can't yet *prove* that I can get better estimates
 with the added info, but if not, that just means I need to rethink what
 to pass down exactly.
 
 I'll see if I can do some testing here to confirm plan improvements
 and check estimate accuracy.  This is only on the trunk, not any
 stable branches?  I assume that effort should wait until you have a
 candidate for the estimate improvements?

Yeah.  If you feel like it you can test what I just committed, but it's
definitely not where I expect to end up in another few days.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Jan Urbański

Heikki Linnakangas wrote:

Jan Urbański wrote:

So right now the idea is to:
 (1) pre-sort STATISTIC_KIND_MCELEM values
 (2) build an array of pointers to detoasted values in tssel()
 (3) use binary search when looking for MCELEMs during tsquery analysis


Sounds like a plan. In (2), it's even better to detoast the values 
lazily. For a typical one-word tsquery, the binary search will only look 
at a small portion of the elements.


Here's another version.

The most common lexemes get sorted before being stored in pg_statistic. 
The ordering is on length first and value second. That way we can avoid 
strncmp() calls when the lexemes have different lengths (and lexemes 
know their lengths, so the data is readily available). Also, in the 
binary search routine during selectivity estimation we can sometimes 
avoid detoasting (I think) Datums from the pg_statistic MCELEM array. 
See comments in code.
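
The ordering amounts to something like this (a simplified sketch, not the
exact comparators used in the patch, which work on the TextFreq/LexemeKey
structs):

static int
compare_lexemes(const char *a, int lena, const char *b, int lenb)
{
    /* lengths first: cheap, and available without looking at the bytes */
    if (lena != lenb)
        return (lena < lenb) ? -1 : 1;
    return strncmp(a, b, lena);
}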


Pre-sorting introduced one problem (see XXX in code): it's not easy 
anymore to get the minimal frequency of MCELEM values. I was using it to 
assert that the selectivity of a tsquery node containing a lexeme not in 
MCELEM is no more than min(MCELEM freqs) / 2. That's only significant 
when the minimum frequency is less than DEFAULT_TS_SEL * 2, so I'm kind 
of inclined to ignore it and maybe drop a comment in the code that this 
may be a potential problem.


If nothing is fundamentally broken with this, I'll repeat my profiling 
tests to see if anything has been gained.


Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin
diff --git a/src/backend/tsearch/Makefile b/src/backend/tsearch/Makefile
index e20a4a2..ba728eb 100644
--- a/src/backend/tsearch/Makefile
+++ b/src/backend/tsearch/Makefile
@@ -19,7 +19,7 @@ DICTFILES=synonym_sample.syn thesaurus_sample.ths 
hunspell_sample.affix \
 OBJS = ts_locale.o ts_parse.o wparser.o wparser_def.o dict.o \
dict_simple.o dict_synonym.o dict_thesaurus.o \
dict_ispell.o regis.o spell.o \
-   to_tsany.o ts_typanalyze.o ts_utils.o
+   to_tsany.o ts_typanalyze.o ts_selfuncs.o ts_utils.o
 
 include $(top_srcdir)/src/backend/common.mk
 
diff --git a/src/backend/tsearch/ts_selfuncs.c 
b/src/backend/tsearch/ts_selfuncs.c
new file mode 100644
index 000..0fadc60
--- /dev/null
+++ b/src/backend/tsearch/ts_selfuncs.c
@@ -0,0 +1,320 @@
+/*-
+ *
+ * ts_selfuncs.c
+ *   Selectivity functions for text search types.
+ *
+ * Portions Copyright (c) 1996-2008, PostgreSQL Global Development Group
+ *
+ *
+ * IDENTIFICATION
+ *   $PostgreSQL$
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "miscadmin.h" /* for check_stack_depth() */
+#include "utils/memutils.h"
+#include "utils/builtins.h"
+#include "utils/syscache.h"
+#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "catalog/pg_type.h"
+#include "catalog/pg_statistic.h"
+#include "nodes/nodes.h"
+#include "tsearch/ts_type.h"
+
+/* lookup table type for binary searching through MCELEMs */
+typedef struct
+{
+   Datum   element;
+   float4  frequency;
+} TextFreq;
+
+/* type of keys for bsearch()ing through an array of TextFreqs  */
+typedef struct
+{
+   char*lexeme;
+   int length;
+} LexemeKey;
+
+static int
+compare_lexeme_textfreq(const void *e1, const void *e2);
+
+static Selectivity
+tsquery_opr_selec(QueryItem *item, char *operand, TextFreq *lookup,
+ int length, float4 minfreq);
+static Selectivity
+mcelem_tsquery_selec(TSQuery query, Datum *mcelem, int nmcelem,
+  float4 *numbers, int 
nnumbers);
+static double
+tsquerysel(VariableStatData *vardata, Datum constval);
+
+
+/* TSQuery traversal function */
+static Selectivity
+tsquery_opr_selec(QueryItem *item, char *operand, TextFreq *lookup,
+ int length, float4 minfreq)
+{
+   LexemeKey   key;
+   TextFreq*searchres;
+   Selectivity s1, s2;
+
+   /* since this function recurses, it could be driven to stack overflow */
+   check_stack_depth();
+
+   if (item->type == QI_VAL)
+   {
+   QueryOperand *oper = (QueryOperand *) item;
+
+   /*
+* Prepare the key for bsearch().
+*/
+   key.lexeme = operand + oper->distance;
+   key.length = oper->length;
+
+   searchres = (TextFreq *) bsearch(&key, lookup, length,
+   
 sizeof(TextFreq), compare_lexeme_textfreq);
+
+   if (searchres)
+   {
+   /*
+* The element is in MCELEM. Return precise selectivity 
(or at
+* least as precise, as ANALYZE could find out).
+*/
+   return (Selectivity) 

Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Jan Urbański

Jan Urbański wrote:

Heikki Linnakangas wrote:

Jan Urbański wrote:

So right now the idea is to:
 (1) pre-sort STATISTIC_KIND_MCELEM values
 (2) build an array of pointers to detoasted values in tssel()
 (3) use binary search when looking for MCELEMs during tsquery analysis


Sounds like a plan. In (2), it's even better to detoast the values 
lazily. For a typical one-word tsquery, the binary search will only 
look at a small portion of the elements.


Here's another version.


Context diff this time, always forget to convert them...

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin
*** a/src/backend/tsearch/Makefile
--- b/src/backend/tsearch/Makefile
***
*** 19,25  DICTFILES=synonym_sample.syn thesaurus_sample.ths 
hunspell_sample.affix \
  OBJS = ts_locale.o ts_parse.o wparser.o wparser_def.o dict.o \
dict_simple.o dict_synonym.o dict_thesaurus.o \
dict_ispell.o regis.o spell.o \
!   to_tsany.o ts_typanalyze.o ts_utils.o
  
  include $(top_srcdir)/src/backend/common.mk
  
--- 19,25 
  OBJS = ts_locale.o ts_parse.o wparser.o wparser_def.o dict.o \
dict_simple.o dict_synonym.o dict_thesaurus.o \
dict_ispell.o regis.o spell.o \
!   to_tsany.o ts_typanalyze.o ts_selfuncs.o ts_utils.o
  
  include $(top_srcdir)/src/backend/common.mk
  
*** /dev/null
--- b/src/backend/tsearch/ts_selfuncs.c
***
*** 0 
--- 1,320 
+ /*-
+  *
+  * ts_selfuncs.c
+  *  Selectivity functions for text search types.
+  *
+  * Portions Copyright (c) 1996-2008, PostgreSQL Global Development Group
+  *
+  *
+  * IDENTIFICATION
+  *  $PostgreSQL$
+  *
+  *-
+  */
+ #include "postgres.h"
+ 
+ #include "miscadmin.h" /* for check_stack_depth() */
+ #include "utils/memutils.h"
+ #include "utils/builtins.h"
+ #include "utils/syscache.h"
+ #include "utils/lsyscache.h"
+ #include "utils/selfuncs.h"
+ #include "catalog/pg_type.h"
+ #include "catalog/pg_statistic.h"
+ #include "nodes/nodes.h"
+ #include "tsearch/ts_type.h"
+ 
+ /* lookup table type for binary searching through MCELEMs */
+ typedef struct
+ {
+   Datum   element;
+   float4  frequency;
+ } TextFreq;
+ 
+ /* type of keys for bsearch()ing through an array of TextFreqs  */
+ typedef struct
+ {
+   char*lexeme;
+   int length;
+ } LexemeKey;
+ 
+ static int
+ compare_lexeme_textfreq(const void *e1, const void *e2);
+ 
+ static Selectivity
+ tsquery_opr_selec(QueryItem *item, char *operand, TextFreq *lookup,
+ int length, float4 minfreq);
+ static Selectivity
+ mcelem_tsquery_selec(TSQuery query, Datum *mcelem, int nmcelem,
+  float4 *numbers, int 
nnumbers);
+ static double
+ tsquerysel(VariableStatData *vardata, Datum constval);
+ 
+ 
+ /* TSQuery traversal function */
+ static Selectivity
+ tsquery_opr_selec(QueryItem *item, char *operand, TextFreq *lookup,
+ int length, float4 minfreq)
+ {
+   LexemeKey   key;
+   TextFreq*searchres;
+   Selectivity s1, s2;
+ 
+   /* since this function recurses, it could be driven to stack overflow */
+   check_stack_depth();
+ 
+   if (item->type == QI_VAL)
+   {
+   QueryOperand *oper = (QueryOperand *) item;
+ 
+   /*
+* Prepare the key for bsearch().
+*/
+   key.lexeme = operand + oper->distance;
+   key.length = oper->length;
+ 
+   searchres = (TextFreq *) bsearch(&key, lookup, length,
+   
 sizeof(TextFreq), compare_lexeme_textfreq);
+ 
+   if (searchres)
+   {
+   /*
+* The element is in MCELEM. Return precise selectivity 
(or at
+* least as precise, as ANALYZE could find out).
+*/
+   return (Selectivity) searchres->frequency;
+   }
+   else
+   {
+   /*
+* The element is not in MCELEM. Punt, but assert that 
the
+* selectivity cannot be more than minfreq / 2.
+*/
+   return (Selectivity) Min(DEFAULT_TS_SEL, minfreq / 2);
+   }
+   }
+ 
+   /* Current TSQuery node is an operator */
+   switch (item->operator.oper)
+   {
+   case OP_NOT:
+   return 1.0 - tsquery_opr_selec(item + 1, operand, 
lookup,
+   
   length, minfreq);
+ 
+   case OP_AND:
+   return
+   tsquery_opr_selec(item + 1, operand, lookup, 
length, 

Re: [HACKERS] proposal sql: labeled function params

2008-08-14 Thread Hannu Krosing
On Thu, 2008-08-14 at 11:56 +0200, Pavel Stehule wrote:
 Hello
 
 I propose enhancing the current syntax to allow specifying a label for any
 function parameter:
 
 fcename(expr [as label], ...)
 fcename(colname, ...)

also fcename(localvar, ...) if called from another function ?

How is this supposed to interact with argument names ?

 I would like to allow custom functions to behave the same way as the xmlforest function:
 postgres=# select xmlforest(a) from foo;
  xmlforest
 ---
  <a>10</a>
 (1 row)
 
 postgres=# select xmlforest(a as b) from foo;
  xmlforest
 ---
  <b>10</b>
 (1 row)

Why not just have two arguments to xmlforest(label text,value text) like
this:

select xmlforest('b', a) from foo 

?

 Actually I am not sure what is the best way for PL languages to access
 this info. Using some system variables would need a new column in pg_proc,
 because collecting these needs some time and in 99% of cases we don't
 need it. 

Exactly, maybe it is just a bad idea in general to pass the label info
into functions using some special syntax ? 

what is wrong with passing it in regular arguments ?

I see very little gain from complicating the syntax (and function API).

maybe we will some time have keyword arguments as well and then have to
deal with syntax like

select func(arg4=7 as 'labelfor4')



 So I prefer some system function that returns labels for the
 outer function call. Like
 
 -- test
 create function getlabels() returns varchar[] as $$select '{name,
 age}'::varchar[]$$ language sql immutable;
 
 create or replace function json(variadic varchar[])
 returns varchar as $$
 select '[' || array_to_string(
  array(
 select (getlabels())[i]|| ':' || $1[i]
from generate_subscripts($1,1) g(i))
,',') || ']'
 $$ language sql immutable strict;

just write the function to take arguments as pairs (value, 'label', ...)

select json('Zdenek', 'name','30', 'age');

select json(name, 'name', age, 'age') from person;


 postgres=# select json('Zdenek' as name,'30' as age);
  json
 --
  [name:Zdenek,age:30]
 (1 row)
 
 postgres=# select json(name, age) from person;
  json
 --
  [name:Zdenek,age:30]
 (1 row)

why special-case table fields ?

what if you wanted to rename any table fields ?

 There are two possibilities
   a) collect labels in parse time
   b) collect labels in executor time
 
 @a needs info in pg_proc, but it is simpler; @b is a little bit more
 difficult, but doesn't need any changes in the system catalog. I am
 thinking about b now.
 
 Necessary changes:
 =
 Labels are searched in the parse tree fcinfo->flinfo->fn_expr. I need to
 insert the label into the parse tree, so it needs a special node,
 labeled_param. For getting a column reference I need to put the current
 exprstate into fcinfo. The getlabels() function should take code from
 the ExecEvalVar function.
 
 Any notes, ideas?

To me, this whole thing feels backwards - in the described cases labels 
seem to be just like any other data, and I don't think it justifies a
special syntax.

---
Hannu 







-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] API for Managing pg_hba and postgresql.conf

2008-08-14 Thread Andrew Satori
Looking at the list history, I see this has been discussed in the  
past, but it has been long enough that perhaps it is time to revisit it.


It would appear from my own support queues, that one of the most  
prevalent issues with PostgreSQL installations is not a functional  
one, but an administrative one. For obvious reasons, the secure by  
default installation is the correct choice, but it presents a  
problem.  With the proliferation of PostgreSQL onto platforms where  
security isn't natural, there are hurdles that have to be dealt with.


What I'm seeing is that a default installation protects the data directory  
properly, but in doing so it means that altering the configuration files,  
pg_hba.conf and postgresql.conf, requires database administrators, who  
should not necessarily have that level of rights, to become superuser at  
the file system level to alter the mentioned files.


Rather than change the fundamental file layout or location, I would  
propose that we expose an API or Schema in the database to better  
allow manipulation of these configuration structure from within  
PostgreSQL.  This would allow a DBA to make changes to the  
configuration without the need to be a machine administrator, or even  
to run with escalated privilege at the OS level.


My concern over tackling this is that of security.

What would be the appropriate way to protect this API.

Should it be a collection of functions or s a schema?

Should it be part of the INFORMATION_SCHEMA?

Should it be an entirely different schema, say CONFIGURATION_SCHEMA?

Should the Schema or functions be restricted to a specific database  
(say postgres) rather than part of every database?


Since most changes would require a SIGHUP, should the server process  
itself be altered to allow for a dynamic restart from within the  
environment?


While I have opinions and have tinkered with the idea a bit, I'll be  
the first to admit that this is functionality that needs to be  
discussed and structured in a generally supportable way rather than a  
platform-specific hack, so I'm looking for thoughts and opinions of an  
educated variety.


A huge portion of the motivation here is to allow for easy-to-use  
graphical administration interfaces, making the system more  
approachable, and to make remote administration of these files less  
cumbersome.






--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Alvaro Herrera
Jan Urbański wrote:
 Heikki Linnakangas wrote:

 Sounds like a plan. In (2), it's even better to detoast the values  
 lazily. For a typical one-word tsquery, the binary search will only 
 look at a small portion of the elements.

 Hm, how can I do that? Toast is still a bit black magic to me... Do you  
 mean I should stick to having Datums in TextFreq?

Store both the Datum and the text *.  If the latter is NULL, then grab
the datum, detoast and store the result in the text *.  Next time you
need to look at it, it's already detoasted.

I don't know the code so I have no idea if this is applicable.
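
Something along these lines, perhaps (a minimal sketch of the caching idea;
the names are made up, not from the patch):

typedef struct
{
    Datum       element;        /* value as stored in pg_statistic */
    text       *detoasted;      /* NULL until first accessed */
    float4      frequency;
} LazyTextFreq;

static text *
lazy_get_text(LazyTextFreq *e)
{
    if (e->detoasted == NULL)
        e->detoasted = DatumGetTextP(e->element);   /* detoasts if needed */
    return e->detoasted;
}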

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Jan Urbański

Alvaro Herrera wrote:

Jan Urbański wrote:

Heikki Linnakangas wrote:


Sounds like a plan. In (2), it's even better to detoast the values  
lazily. For a typical one-word tsquery, the binary search will only 
look at a small portion of the elements.
Hm, how can I do that? Toast is still a bit black magic to me... Do you  
mean I should stick to having Datums in TextFreq?


Store both the Datum and the text *.  If the latter is NULL, then grab
the datum, detoast and store the result in the text *.  Next time you
need to look at it, it's already detoasted.


Yeah, I got that idea, but then I thought the chances of touching the 
same element during binary search twice were very small. Especially now 
when the detoasting occurs only when we hit a text Datum that has the 
same length as the sought lexeme.

Still, I can do it if people feel like it.

Cheers,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gsoc, oprrest function for text search take 2

2008-08-14 Thread Alvaro Herrera
Jan Urbański wrote:

 Yeah, I got that idea, but then I thought the chances of touching the  
 same element during binary search twice were very small. Especially now  
 when the detoasting occurs only when we hit a text Datum that has the  
 same length as the sought lexeme.
 Still, I can do it if people feel like it.

Actually, in that light it sounds pretty useless.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SeqScan costs

2008-08-14 Thread Decibel!

On Aug 13, 2008, at 10:45 PM, Andrew Gierth wrote:

You could likely expose a difference using LIMIT 1 in the subselect,
but that doesn't tell us anything we didn't already know (which is
that yes, index scan is much faster than seqscan even for 1-block
tables, except in the rare case when neither the index page nor the
table page are in cache, causing the indexscan to take two page
fetches rather than just one).

Oddly enough, when I try it with LIMIT 1, it _does_ show a significant
speed difference according to the row position, _but_ the index scan
is still twice as fast even when fetching only row 1 (which is indeed
physically first).



So the question is: why?? How can it be cheaper to hit 2 buffers than 1?

Though, unless we can improve the speed of seqscanning an entire page  
vs pulling the exact row we need, it's probably still a moot point.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Plugin system like Firefox

2008-08-14 Thread Decibel!

On Aug 12, 2008, at 2:26 AM, Dave Page wrote:
On Tue, Aug 12, 2008 at 4:13 AM, Bruce Momjian [EMAIL PROTECTED]  
wrote:
So, ideally, if we do a plug-in system, I think we need some way  
to have
these plugins be very easily installed, perhaps by choosing object  
files

pre-compiled by the build farm for each operating system.


Hmm, that idea sounds remarkably familiar thinks back to a meeting in
NJ about 2 years ago



...and one at Pervasive even before that...
--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] modifying views

2008-08-14 Thread Decibel!

On Jul 29, 2008, at 9:12 PM, Robert Haas wrote:

Unfortunately, it looks to me like a fully general implementation of
this feature would be Really Hard, because a CREATE OR REPLACE VIEW
command, beyond attempting to add, drop, or retype columns, could also
attempt to reorder them.  A cursory inspection of the situation
suggests this would require modifying the attnum values in
pg_attribute, which looks like a nightmare.



FWIW, there is desire to be able to re-order columns within real  
tables, too. But before that can happen we need to divorce  
presentation order from on-page order (which is actually desirable  
for other reasons), and that's an awfully big task that no one's  
taken on.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828






Re: [HACKERS] Parsing of pg_hba.conf and authentication inconsistencies

2008-08-14 Thread Andreas 'ads' Scherbaum

Hello,

On Sat, 02 Aug 2008 18:37:25 +0200 Magnus Hagander wrote:

 Tom Lane wrote:
  Magnus Hagander [EMAIL PROTECTED] writes:
 
  We could catch some simple problems at file load time, perhaps,
  but those usually aren't the ones that cause trouble for people.
 
 It would catch things like typos, invalid CIDR address/mask and
 specifying an auth method that doesn't exist. This is the far most
 common errors I've seen - which ones are you referring to?

It may not be by far the most common error, but of course it's a big
problem.

For the DBA: if the config file is in a version control system you
first have to edit the file again, search for the error, submit the file and
then restart the DB - if you got the syntax error during a database
restart you are cursing all the time because the database is offline
right now.

For a newbie: as mentioned before, this guy doesn't even know where to
look for the error, but the database is offline. "Stupid Postgres, I
want something else which works."


Of course a syntax check before or on startup cannot catch all
errors, especially not logic errors, but if we can exclude any
syntax error that would be a big help. For myself, I don't care which
tool does the check, as long as it's possible to check the config at
all.


Kind regards

-- 
Andreas 'ads' Scherbaum
German PostgreSQL User Group

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal sql: labeled function params

2008-08-14 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes:
 How is this supposed to interact with argument names ?

Yeah, the real problem with this proposal is that it conscripts a syntax
that we'll probably want to use in the future for argument-name-based
parameter matching.  The proposed behavior is not nearly as useful as
that would be.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers