[HACKERS] Fwd: patch: format function - fixed oid
-- Forwarded message --
From: Pavel Stehule
Date: 2010/11/18
Subject: Re: patch: format function, next generation
To: Jeff Janes
Cc: pgsql-hackers-ow...@postgresql.org

Hello

somebody took my OID :) An updated patch is in the attachment.

Regards

Pavel Stehule

2010/11/18 Jeff Janes :
>>
>> Hello
>>
>> I reworked the implementation of the format function. It reflects the last
>> discussion:
>>
>> * support for positional placeholders - syntax %99$x,
>> * support for the tags %s, %I, %L,
>> * enhanced documentation,
>> * enhanced regression tests
>>
>> Regards
>>
>> Pavel Stehule
>
> Dear Pavel,
>
> Your patch no longer passes "make check" against git HEAD.
>
> It looks like the backend sync commit has claimed OID 3063:
>
> creating directory /home/jjanes/postgres/git/src/test/regress/./tmp_check/data ... ok
> creating subdirectories ... ok
> selecting default max_connections ... 100
> selecting default shared_buffers ... 32MB
> creating configuration files ... ok
> creating template1 database in /home/jjanes/postgres/git/src/test/regress/./tmp_check/data/base/1 ...
> FATAL:  could not create unique index "pg_proc_oid_index"
> DETAIL:  Key (oid)=(3063) is duplicated.
> child process exited with exit code 1
> initdb: data directory "/home/jjanes/postgres/git/src/test/regress/./tmp_check/data" not
> removed at user's request
>
> Cheers,
>
> Jeff

*** ./doc/src/sgml/func.sgml.orig	2010-11-18 11:34:18.183045054 +0100
--- ./doc/src/sgml/func.sgml	2010-11-18 11:35:59.799401973 +0100
***************
*** 1272,1277 ****
--- 1272,1280 ----
    encode
+   format
+ 
    initcap
***************
*** 1497,1502 ****
--- 1500,1524 ----
+ 
+   format(formatstr text [, str "any" [, ...] ])
+ 
+   text
+ 
+   This function can be used to create a formatted string or message. Three
+   types of tags are allowed: %s as a string, %I as an SQL identifier and %L
+   as an SQL literal.
+   Attention: the result for %I and %L may not be the same as the result of
+   the quote_ident and quote_literal functions, because this function
+   doesn't try to coerce parameters to the text type; it uses the types'
+   output functions directly. A placeholder can be tied to an explicit
+   parameter using an optional n$ specification inside the format string.
+   See also .
+ 
+   format('Hello %s, %1$s', 'World')
+   Hello World, World
+ 
+   initcap(string) text

*** ./doc/src/sgml/plpgsql.sgml.orig	2010-11-18 11:34:35.608335452 +0100
--- ./doc/src/sgml/plpgsql.sgml	2010-11-18 11:35:59.817404340 +0100
***************
*** 1250,1255 ****
--- 1250,1273 ----
    must use quote_literal, quote_nullable, or quote_ident, as appropriate.
+ 
+   Building a dynamic SQL statement string can be simplified by using the
+   format function (see ):
+ 
+   EXECUTE format('UPDATE tbl SET %I = %L WHERE key = %L', colname, newvalue, keyvalue);
+ 
+   The format function can also be used together with the
+   USING clause:
+ 
+   EXECUTE format('UPDATE tbl SET %I = $1 WHERE key = $2', colname)
+      USING newvalue, keyvalue;
+ 
+   This form is a little more efficient, because the parameters
+   newvalue and keyvalue need not be
+   converted to text.
+ 
*** ./src/backend/utils/adt/varlena.c.orig	2010-11-18 11:34:49.891212809 +0100
--- ./src/backend/utils/adt/varlena.c	2010-11-18 11:35:59.823405128 +0100
***************
*** 3702,3704 ****
--- 3702,3939 ----
  	PG_RETURN_TEXT_P(result);
  }
+ 
+ typedef struct {
+ 	char	field_type;
+ 	int32	position;
+ 	bool	is_positional;
+ } placeholder_desc;
+ 
+ #define INVALID_PARAMETER_ERROR(d) ereport(ERROR, \
+ 		(errcode(ERRCODE_INVALID_PARAMETER_VALUE), \
+ 		 errmsg("invalid formatting string for function format"), \
+ 		 errdetail(d)))
+ 
+ static const char *
+ parse_format(placeholder_desc *pd, const char *start_ptr,
+ 			 const char *end_ptr, int nparams)
+ {
+ 	bool	is_valid = false;
+ 
+ 	pd->is_positional = false;
+ 
+ 	while (start_ptr <= end_ptr)
+ 	{
+ 		switch (*start_ptr)
+ 		{
+ 			case 's':
+ 				pd->field_type = 's';
+ 				is_valid = true;
+ 				break;
+ 
+ 			case 'I':
+ 				pd->field_type = 'I';
+ 				is_valid = true;
+ 				break;
+ 
+ 			case 'L':
+ 				pd->field_type = 'L';
+ 				is_valid = true;
+ 				break;
+ 
+ 			case '0': case '1': case '2': case '3': case '4':
+ 			case '5': case '6': case '7': case '8': case '9':
+ 				if (pd->is_positional)
+ 					INVALID_PARAMETER_ERROR("positional parameter is defined already");
+ 
+ 				pd->position = *start_ptr - '0';
+ 				while (start_ptr < end_ptr && isdigit(start_ptr[1]))
+ 				{
+ 					pd->position = pd->position * 10 + *++start_ptr - '0';
+ 					/* don't allow integer overflow */
+ 					if (pd->position >= nparams)
+ 						INVALID_PARAMETER_ERROR("placeholder is ou
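For readers skimming the patch, the documented behavior sketches out as follows. This is a hedged illustration based only on the documentation hunks above; the exact quoting output is the patch's behavior, not verified here, and the UPDATE example uses hypothetical names.

```sql
-- Sketch of the proposed format() per the documentation excerpt above.
-- %s substitutes as a plain string, %I as an SQL identifier, %L as an
-- SQL literal; an n$ prefix picks an explicit argument position.
SELECT format('Hello %s, %1$s', 'World');
-- Hello World, World   (example taken from the patch's docs)

-- Hypothetical identifier/literal use, as in the plpgsql doc hunk:
SELECT format('UPDATE %I SET val = %L WHERE key = %L',
              'my table', 'new value', 42);
-- %I would double-quote "my table"; %L would quote 'new value' and
-- render 42 via its type's output function.
```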
Re: [HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)
On Fri, Nov 19, 2010 at 11:40:05AM +0900, Itagaki Takahiro wrote:
> On Fri, Nov 19, 2010 at 08:33, David Fetter wrote:
> > In order to get WITH ORDINALITY, would it be better to change
> > gram.y to account for both WITH ORDINALITY and without, or just
> > for the WITH ORDINALITY case?
>
> We probably need to change gram.y and make UNNEST to be
> COL_NAME_KEYWORD. UNNEST (without ORDINALITY) will call the
> existing unnest() function, and UNNEST() WITH ORDINALITY will call
> unnest_with_ordinality().

Thanks for sketching that out :)

> BTW, what will we return for arrays with 2 or more dimensions?

At the moment, per the SQL standard, UNNEST without the WITH
ORDINALITY clause flattens all dimensions.

SELECT * FROM UNNEST(ARRAY[[1,2],[3,4]]);
 unnest
--------
      1
      2
      3
      4
(4 rows)

Unless we want to do something super wacky and contrary to the SQL
standard, UNNEST(array) WITH ORDINALITY should do the same.

> There are no confusion in your two arguments version:
>   UNNEST(anyarray, number_of_dimensions_to_unnest)
> but we will also support one argument version. Array indexes will
> be composite numbers in the cases. The possible design would be just
> return sequential serial numbers of the values -- the following two
> queries return the same results:
>
> - SELECT i, v FROM UNNEST($1) WITH ORDINALITY AS t(v, i)
> - SELECT row_number() OVER () AS i, v FROM UNNEST($1) AS t(v)

Yes, that's what the standard says. Possible less-than-total
unrolling schemes include:

- Flatten the specified number of initial dimensions into one list,
  e.g. turn UNNEST(array_3d, 2) into SETOF(array_1d) with one column
  of ORDINALITY
- Flatten similarly, but have an ORDINALITY column for each flattened
  dimension.
- More exotic schemes, such as UNNEST(array_3d, [1,3]), with either
  of the two methods above.

And of course the all-important:

- Other possibilities I haven't thought of :)

Cheers,
David.
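As a concrete sketch of the equivalence quoted above: this uses 8.4's unnest(), and the ordinality column comes from row_number() OVER () with no ORDER BY, which follows unnest()'s output order in practice but is not guaranteed by the standard.

```sql
-- Emulating UNNEST ... WITH ORDINALITY with today's tools:
SELECT row_number() OVER () AS i, v
FROM unnest(ARRAY[[1,2],[3,4]]) AS t(v);
--  i | v
-- ---+---
--  1 | 1
--  2 | 2
--  3 | 3
--  4 | 4
-- (4 rows)
```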
-- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Tom Lane :
> Pavel Stehule writes:
>> 2010/11/18 Alvaro Herrera :
>>> I fail to see how this supports the FOR-IN-array development though. It
>>> will just be another unused construct for most people, no?
>
>> maybe I don't understand well, but patch FOR-IN-ARRAY has a documentation
>
> UNNEST is documented too. Adding still more features doesn't really
> improve matters for people who haven't memorized the documentation;
> it only makes it even harder for them to find out what they should be
> using. (More features != better)

yes, but fewer user-visible features doesn't mean less code, mainly in a
slightly specific environment like plpgsql.

> To my mind, the complaint about subscripting being slow suggests that we
> ought to fix subscripting, not introduce a nonstandard feature that will
> make certain use-cases faster if people rewrite their code to use it.
>
> I think it would probably not be terribly hard to arrange forcible
> detoasting of an array variable's value the first time it gets
> subscripted, for instance. Of course that only fixes some use-cases;
> but it would help, and it helps without requiring people to change their
> code.

This is just one half of the problem, and it isn't simple. The second
half is "array_seek" - any access with subscripts means a sequential read
of the array's data. Please look at this part; I think it is more
important than anything we discussed before. For fast access it is
necessary to call the deconstruct_array function; then you can access
subscripts quickly. Currently we have no control over access to array
items when a subscript is used in an expression (inside a PL), so it is
very difficult to speed up this area - it probably means a subscript
expression should be evaluated individually. deconstruct_array is a
relatively expensive function, so you need information about how the
array is used; without that, and for smaller arrays, the optimization can
be a loss.
There isn't any backend infrastructure for this decision now.

I did some profiling.

First example: FOR IN ARRAY

samples  %        symbol name
336      20.6642  exec_eval_expr
269      16.5437  plpgsql_param_fetch
229      14.0836  exec_stmts
225      13.8376  exec_eval_datum
118       7.2571  exec_assign_value
 91       5.5966  exec_eval_cleanup.clone.10
 88       5.4121  setup_param_list
 72       4.4280  __i686.get_pc_thunk.bx
 65       3.9975  exec_eval_boolean
 47       2.8905  exec_simple_cast_value
 43       2.6445  free_var.clone.2
 28       1.7220  exec_cast_value

samples  %        image name  symbol name
1064     16.1188  postgres    pglz_decompress
 410      6.2112  postgres    AllocSetAlloc
 353      5.3477  postgres    MemoryContextAllocZero
 293      4.4387  postgres    GetSnapshotData
 290      4.3933  postgres    AllocSetFree
 281      4.2569  postgres    ExecEvalParamExtern
 223      3.3783  postgres    ExecMakeFunctionResultNoSets
 220      3.3328  postgres    AllocSetReset
 212      3.2116  postgres    UTF8_MatchText
 210      3.1813  postgres    LWLockAcquire
 195      2.9541  postgres    AllocSetCheck
 195      2.9541  postgres    LWLockRelease
 172      2.6057  postgres    pfree
 163      2.4693  postgres    CopySnapshot
 162      2.4542  postgres    list_member_ptr
 144      2.1815  postgres    RevalidateCachedPlan
 133      2.0148  postgres    PushActiveSnapshot
 121      1.8331  postgres    PopActiveSnapshot
 121      1.8331  postgres    bms_is_member
 118      1.7876  postgres    MemoryContextAlloc
 108      1.6361  postgres    textlike
 105      1.5907  postgres    AcquireExecutorLocks
  79      1.1968  postgres    TransactionIdPrecedes
  76      1.1513  postgres    pgstat_end_function_usage
  75      1.1362  postgres    pgstat_init_function_usage
  72      1.0907  postgres    check_list_invariants

sample01 - FOR i IN array_lower()..array_upper() for t1000

Profiling through timer interrupt

samples  %        symbol name
1039     29.4084  exec_stmts
 723     20.4642  exec_eval_expr
 587     16.6148  exec_eval_datum
 408     11.5483  plpgsql_param_fetch
 176      4.9816  exec_eval_cleanup.clone.10
 167      4.7269  setup_param_list
 159      4.5004  exec_eval_boolean
 128      3.6230  __i686.get_pc_thunk.bx
  66      1.8681  exec_simple_cast_value

samples  %        image name  symbol name
312604   84.1141  postgres    pglz_decompress
  4800    1.2916  postgres    hash_search_with_hash_value
  4799    1.2913  postgres    array_seek.clone.0
  2935    0.7897  postgres    LWLockAcquire
  2399    0.6455  postgres    _bt_compare
  2219    0.5971  p
Re: [HACKERS] Improving prep_buildtree used in VPATH builds
Gurjeet Singh wrote:
> Seems like it would improve performance in general, but more so under
> load conditions when you actually need it. I am not sure if the -depth
> option is supported by all incarnations of 'find'.

Given the way directory writes are cached by the filesystem, I'm not sure
why the load at the time matters so much. If what you mean by that is
that you're low on memory, that would make more sense to me.

Anyway, "-depth" is in POSIX as of 2001. It seems to be in all the major
SysV UNIX variants going back further than that, and of course it's in
GNU find. But it looks like BSD-derived systems from before that POSIX
standard originally called this "-d" instead. So there's some potential
for this to not work on older systems; it works fine on my test FreeBSD
7.2 system. Maybe someone who has access to some ancient BSD-ish system
can try this out? The simplest test case similar to what the patch adds
seems to be whether this runs, returning subdirectories in depth-first
order before their parent:

$ find / -depth -type d -print

If that fails somewhere, it may turn out to require another configure
check just to determine whether you can use this configuration time
optimization. That's certainly possible to add to the patch if it got
committed and turned out to break one of the buildfarm members.

It seems to me like this whole thing may be a bit more work than it's
worth, given that this is a fairly small and difficult-to-reproduce
speedup in only one stage of the build process. I'd think that if
configure takes longer than it has to because the system is heavily
loaded, the amount by which compilation time suffers from that load would
always dwarf this component of total build time. But if this was slow
enough at some point to motivate you to write a patch for it, maybe that
assumption is wrong.
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Supportwww.2ndQuadrant.us -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
Robert Haas writes: > I'm all in favor of having some memory ordering primitives so that we > can try to implement better algorithms, but if we use it here it > amounts to a fairly significant escalation in the minimum requirements > to compile PG (which is bad) rather than just a performance > optimization (which is good). I don't believe there would be any escalation in compilation requirements: we already have the ability to invoke stronger primitives than these. What is needed is research to find out what the primitives are called, on platforms where we aren't relying on direct asm access. My feeling is it's time to bite the bullet and do that work. We shouldn't cripple the latch operations because of laziness at the outset. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] duplicate connection failure messages
Alvaro Herrera wrote:
> Excerpts from Bruce Momjian's message of mié nov 17 13:04:46 -0300 2010:
>
> > OK, I doubt we want to add complexity to improve this, so I see our
> > options as:
> >
> > o  ignore the problem
> > o  display IPv4/IPv6 labels
> > o  display only an IPv6 label
> > o  something else
>
> I think we should use inet_ntop where available to print the address.

FYI, I see we use inet_ntoa() in getaddrinfo.c. That is not thread-safe
and I should replace it with inet_ntop().

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + It's impossible for everything to be true. +
Re: [HACKERS] duplicate connection failure messages
Alvaro Herrera wrote:
> Excerpts from Bruce Momjian's message of mié nov 17 13:04:46 -0300 2010:
>
> > OK, I doubt we want to add complexity to improve this, so I see our
> > options as:
> >
> > o  ignore the problem
> > o  display IPv4/IPv6 labels
> > o  display only an IPv6 label
> > o  something else
>
> I think we should use inet_ntop where available to print the address.

Good idea, because inet_ntop() is thread-safe. Does that work on IPv6?
You indicated that inet_ntoa() does not.

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + It's impossible for everything to be true. +
[HACKERS] Changes to Linux OOM killer in 2.6.36
Last month's new Linux kernel 2.6.36 includes a rewrite of the out of
memory killer:

http://lwn.net/Articles/391222/
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a63d83f427fbce97a6cea0db2e64b0eb8435cd10

The new "badness" method totals the task's RSS and swap as a percentage
of RAM, where the old one scored starting with the total memory used by
the process. I *think* that this is an improvement for PostgreSQL, based
on the sort of data I see with:

ps -o pid,rss,size,vsize,args -C postgres

But I haven't tested with one of the new kernels yet to be sure.
Something to look at next time I get in that bleeding edge kernel kind
of mood.

One thing that's definitely changed is the interface used to turn off
the OOM killer. There's a backward compatibility translation right now
that maps the current "-17" value the PostgreSQL code sends to
/proc/<pid>/oom_adj into the new units scale. However, oom_adj is
deprecated, scheduled for removal in August 2012:

http://www.mjmwired.net/kernel/Documentation/feature-removal-schedule.txt

So eventually, if the OOM disabling code is still necessary in
PostgreSQL, it will need to do this sort of thing instead:

echo -1000 > /proc/<pid>/oom_score_adj

I've seen kernel stuff get deprecated before its scheduled timeline
before for code-related reasons (when the compatibility bits were judged
too obtrusive to keep around anymore), but since this translation bit is
only a few lines of code I wouldn't expect that to happen here. I don't
think it's worth doing anything to the database code until tests on the
newer kernel confirm whether this whole thing is even necessary anymore.
Wanted to pass along the info while I was staring at it, though. Thanks
to Daniel Farina for pointing this out.
-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services and Supportwww.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)
On Fri, Nov 19, 2010 at 08:33, David Fetter wrote:
> In order to get WITH ORDINALITY, would it be better to change gram.y
> to account for both WITH ORDINALITY and without, or just for the WITH
> ORDINALITY case?

We probably need to change gram.y and make UNNEST a COL_NAME_KEYWORD.
UNNEST (without ORDINALITY) will call the existing unnest() function,
and UNNEST() WITH ORDINALITY will call unnest_with_ordinality().

BTW, what will we return for arrays with 2 or more dimensions? There is
no confusion in your two-argument version:
> UNNEST(anyarray, number_of_dimensions_to_unnest)
but we will also support a one-argument version, and array indexes would
be composite numbers in that case. A possible design would be to just
return sequential serial numbers for the values -- the following two
queries would return the same results:

- SELECT i, v FROM UNNEST($1) WITH ORDINALITY AS t(v, i)
- SELECT row_number() OVER () AS i, v FROM UNNEST($1) AS t(v)

--
Itagaki Takahiro
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Thu, Nov 18, 2010 at 5:17 PM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Nov 15, 2010 at 11:12 AM, Tom Lane wrote: >>> Hmm ... I just remembered the reason why we didn't use a spinlock in >>> these functions already. Namely, that it's unsafe for a signal handler >>> to try to acquire a spinlock that the interrupted code might be holding. > >> The signal handler just checks a process-local, volatile variable >> called "waiting" (which should be fine) and then sends a self-pipe >> byte. It deliberately *doesn't* take a spinlock. > > I'm not talking about latch_sigusr1_handler. I'm talking about the > several already-existing signal handlers that call SetLatch. Now maybe > it's possible to prove that none of those can fire in a process that's > mucking with the same latch in-line, but that sounds fragile as heck; > and anyway what of future usage? Given that precedent, somebody is > going to write something unsafe at some point, and it'll fail only often > enough to be seen in the field but not in our testing. Oh, I get it. You're right. We can't possibly assume that that we're not trying to set the latch for our own process, because that's the whole point of having the self-pipe code in the first place. How about changing the API so that the caller must use one function, say, SetOwnedLatch(), to set a latch that they own, and another function, say, SetNonOwnedLatch(), to set a latch that they do not own? The first can simply set is_set (another process might fail to see that the latch has been set, but the worst thing they can do is set it over again, which should be fairly harmless) and send a self-pipe byte. The second can take the spin lock, set is_set, read the owner PID, release the spin lock, and send a signal. WaitLatch() can similarly take the spin lock before reading is_set. This requires the caller to know whether they are setting their own latch or someone else's latch, but I don't believe that's a problem given current usage. 
I'm all in favor of having some memory ordering primitives so that we can try to implement better algorithms, but if we use it here it amounts to a fairly significant escalation in the minimum requirements to compile PG (which is bad) rather than just a performance optimization (which is good). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Tom Lane :
> Pavel Stehule writes:
>> what is slow:
>
>> a) repeated detoasting - access with subscripts - maybe detoasted
>> values can be cached?
>> b) evaluation of SRF expression - maybe call of SRF function can be
>> simple expression,
>> c) faster evaluation of query
>
>> The most important is @a.
>
> Really? Because AFAICS array_unnest only detoasts the source array once,
> and saves the value between calls.
>
> array_unnest doesn't currently have any smarts about fetching slices
> of an array. I'm not sure how useful that would be in practice, since
> (1) in most usages you probably run the function to the end and fetch
> all the values anyway; (2) it's hard to see how to optimize that way
> if the elements are varlena, which they most likely are in most usages
> where this could possibly be a win. But if Cedric's use-case is really
> worth optimizing, I'd sure rather see the smarts for it in the general
> purpose array_unnest function instead of buried in plpgsql's FOR logic.

My use case is the following: I have text[] data containing around 50k
fields. Executing my array_filter (which is probably not as fast as what
Pavel did) is the same thing as executing unnest, except that during the
array walk I apply a callback function and increment an internal counter,
up to the p_limit parameter, each time the callback succeeds. So it stops
walking the array as soon as the p_limit counter is full, or there are no
more elements to walk in the array.

1) At worst it is as slow as unnest (plus callback overhead); at best it
finds the matches quickly and the gain is huge. I don't have the exact
numbers right here, but (from memory) the durations are between 0 ms and
45 ms (20M rows of 50k-field text arrays, not very long text < 100 chars,
15k calls per second, depending on the items to walk in the array,
linear). With just unnest, the minimum is 45 ms per query.

2) DETOAST_SLICE: I don't know the internals here. I have a concrete use
for what Pavel did.
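For context, the use case described above corresponds roughly to this query shape; callback_fn, p_limit, and the text[] cast are illustrative names paraphrased from the description, not the actual array_filter API.

```sql
-- Declarative equivalent of the array_filter(arr, callback, p_limit)
-- behavior described above: stop as soon as p_limit matches are found.
SELECT v
FROM unnest($1::text[]) AS t(v)
WHERE callback_fn(v)
LIMIT p_limit;
-- The hand-written C version wins by aborting the array walk as soon
-- as the limit is reached, rather than paying the full detoast/unnest
-- cost of a ~50k-element array on every call.
```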
And I agree that this is a fast and easy way to handle the real issues
behind it :/

> regards, tom lane

--
Cédric Villemain
2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] final patch - plpgsql: for-in-array
On 11/18/2010 06:06 PM, Andres Freund wrote: Well, a good reason for that might be that unnest() is pretty new... Most code I read has been initially written quite a bit earlier. Seeing 8.4 in production is only starting to get common. I guess we must have more adventurous customers than you :-) (Incidentally, there is a (slow but still very useful) userland version of unnest that works with 8.3.) cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)
Folks, I'd like to see about getting an enhanced UNNEST, first the one according to the SQL standard, namely with an optional WITH ORDINALITY clause, and possibly some extra enhancements. In order to get WITH ORDINALITY, would it be better to change gram.y to account for both WITH ORDINALITY and without, or just for the WITH ORDINALITY case? Also, there's been some enthusiasm for a different kind of enhancement of UNNEST, which would go something like this: UNNEST(anyarray, number_of_dimensions_to_unnest) That'd probably be a separate patch, though. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
On Thursday 18 November 2010 21:11:32 Alvaro Herrera wrote: > Excerpts from Pavel Stehule's message of jue nov 18 17:00:04 -0300 2010: > > 2010/11/18 Andrew Dunstan : > > >> I didn't say so nobody use it. You, me, David. But I really didn't see > > >> this pattern here in real applications. > > > > > > Lots of people are told to use it on IRC. Trust me, it's getting well > > > known. > > > > can be. but people on IRC are not representative. > > Yeah, that's true. I point out usage of unnest to our customers too, > but it's much more common to see people not using it, instead relying on > subscripts. People using Postgres show up unexpectedly from under > rocks, in the weirdest corners; they rarely consult documentation and > even more rarely get into IRC or mailing list to get help. Well, a good reason for that might be that unnest() is pretty new... Most code I read has been initially written quite a bit earlier. Seeing 8.4 in production is only starting to get common. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
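For readers following the thread, the two iteration styles being compared (subscripting, which triggers the slowness complaint, and the 8.4 unnest() pattern people are pointed to on IRC) can be sketched in PL/pgSQL. This is a minimal illustration, not code from the patch; the DO block requires 9.0+.

```sql
DO $$
DECLARE
  a int[] := ARRAY[1, 2, 3, 4];
  s int := 0;
  i int;
  v int;
BEGIN
  -- Style 1: explicit subscripting, the pattern people commonly use
  FOR i IN array_lower(a, 1) .. array_upper(a, 1) LOOP
    s := s + a[i];
  END LOOP;

  -- Style 2: set-returning unnest(), available since 8.4
  FOR v IN SELECT unnest(a) LOOP
    s := s + v;
  END LOOP;

  RAISE NOTICE 'sum counted twice: %', s;  -- 20
END
$$;
```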
Re: [HACKERS] final patch - plpgsql: for-in-array
Pavel Stehule writes:
> 2010/11/18 Alvaro Herrera :
>> I fail to see how this supports the FOR-IN-array development though. It
>> will just be another unused construct for most people, no?

> maybe I don't understand well, but patch FOR-IN-ARRAY has a documentation

UNNEST is documented too. Adding still more features doesn't really
improve matters for people who haven't memorized the documentation;
it only makes it even harder for them to find out what they should be
using. (More features != better)

To my mind, the complaint about subscripting being slow suggests that we
ought to fix subscripting, not introduce a nonstandard feature that will
make certain use-cases faster if people rewrite their code to use it.

I think it would probably not be terribly hard to arrange forcible
detoasting of an array variable's value the first time it gets
subscripted, for instance. Of course that only fixes some use-cases;
but it would help, and it helps without requiring people to change their
code.

			regards, tom lane
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
Robert Haas writes: > On Mon, Nov 15, 2010 at 11:12 AM, Tom Lane wrote: >> Hmm ... I just remembered the reason why we didn't use a spinlock in >> these functions already. Namely, that it's unsafe for a signal handler >> to try to acquire a spinlock that the interrupted code might be holding. > The signal handler just checks a process-local, volatile variable > called "waiting" (which should be fine) and then sends a self-pipe > byte. It deliberately *doesn't* take a spinlock. I'm not talking about latch_sigusr1_handler. I'm talking about the several already-existing signal handlers that call SetLatch. Now maybe it's possible to prove that none of those can fire in a process that's mucking with the same latch in-line, but that sounds fragile as heck; and anyway what of future usage? Given that precedent, somebody is going to write something unsafe at some point, and it'll fail only often enough to be seen in the field but not in our testing. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: possible concurrency bug or mistake in understanding read-committed behavior
"Kevin Grittner" wrote:
> Jignesh Shah wrote:
>
>> The question is should the delete fail if it doesn't exist and
>> cause a rollback or succeed with DELETE 0 ?
>
> I think existing behavior is consistent with both the standard and
> the other behaviors of PostgreSQL at the READ COMMITTED isolation
> level.

I actually woke up in the middle of the night after sending this with
the realization that I was wrong in that statement. Well, half wrong --
it *is* entirely consistent with the other behaviors of PostgreSQL at
the READ COMMITTED isolation level, but it is *not* consistent with the
standard. Since the row existed both before and after the first
transaction, for the other transaction to see zero rows it has to
violate atomicity, and see *part* of the work of the transaction but
not *all* of it.

This issue has been discussed on the list before, regarding other
curious behaviors, and what came of it was that to prevent such behavior
PostgreSQL would need to wrap each statement within a READ COMMITTED
transaction in an implicit subtransaction, and upon detecting that a
transaction on which it was blocked due to a write had committed, it
would need to roll back the subtransaction and acquire a new snapshot.
Nobody seemed excited about either doing the work to make that happen or
the performance implications it might have.

The upshot was more or less that if you care about such anomalies you
should be running at a stricter isolation level -- what you describe is
not a problem at the REPEATABLE READ or SERIALIZABLE isolation levels.

-Kevin
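A minimal two-session transcript of the anomaly described above; the table and values are hypothetical, sketched to match the delete-and-reinsert scenario under discussion.

```sql
-- Assume: CREATE TABLE t (id int);  INSERT INTO t VALUES (1);

-- Session 1 (READ COMMITTED):
BEGIN;
DELETE FROM t WHERE id = 1;
INSERT INTO t VALUES (1);
-- (not yet committed)

-- Session 2 (READ COMMITTED), concurrently:
DELETE FROM t WHERE id = 1;   -- blocks on session 1's uncommitted delete

-- Session 1:
COMMIT;

-- Session 2 now reports: DELETE 0
-- A row with id = 1 existed both before and after session 1's
-- transaction, yet session 2 deleted nothing: it saw session 1's
-- DELETE take effect but not the INSERT -- the atomicity violation
-- described above. At REPEATABLE READ, session 2 would instead get a
-- serialization failure it could retry.
```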
Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)
On Mon, Nov 15, 2010 at 11:12 AM, Tom Lane wrote: > Heikki Linnakangas writes: >> In SetLatch, is it enough to add the SpinLockAcquire() call *after* >> checking that is_set is not already set? Ie. still do the quick exit >> without holding a lock. Or do we need a memory barrier operation before >> the fetch, to ensure that we see if the other process has just cleared >> the flag with ResetLatch() ? Presumable ResetLatch() needs to call >> SpinLockAcquire() anyway to ensure that other processes see the clearing >> of the flag. > > Hmm ... I just remembered the reason why we didn't use a spinlock in > these functions already. Namely, that it's unsafe for a signal handler > to try to acquire a spinlock that the interrupted code might be holding. > So I think a bit more thought is needed here. Maybe we need to bite the > bullet and do memory barriers ... The signal handler just checks a process-local, volatile variable called "waiting" (which should be fine) and then sends a self-pipe byte. It deliberately *doesn't* take a spinlock. So unless I'm missing something (which is perfectly possible) protecting a few more things with a spinlock should be safe enough. Of course, there's still a potential *performance* problem if we end up doing a kernel call while holding a spin lock, but I'm uncertain whether we should worry about that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Alvaro Herrera : > Excerpts from Pavel Stehule's message of jue nov 18 17:00:04 -0300 2010: >> 2010/11/18 Andrew Dunstan : > >> >> I didn't say so nobody use it. You, me, David. But I really didn't see >> >> this pattern here in real applications. >> >> >> > >> > Lots of people are told to use it on IRC. Trust me, it's getting well >> > known. >> >> can be. but people on IRC are not representative. > > Yeah, that's true. I point out usage of unnest to our customers too, > but it's much more common to see people not using it, instead relying on > subscripts. People using Postgres show up unexpectedly from under > rocks, in the weirdest corners; they rarely consult documentation and > even more rarely get into IRC or mailing list to get help. > > I fail to see how this supports the FOR-IN-array development though. It > will just be another unused construct for most people, no? Maybe I don't understand you well, but the FOR-IN-ARRAY patch does include documentation: + Looping Through Array + + The syntax is: +
Re: [HACKERS] final patch - plpgsql: for-in-array
Excerpts from Pavel Stehule's message of jue nov 18 17:00:04 -0300 2010: > 2010/11/18 Andrew Dunstan : > >> I didn't say so nobody use it. You, me, David. But I really didn't see > >> this pattern here in real applications. > >> > > > > Lots of people are told to use it on IRC. Trust me, it's getting well known. > > can be. but people on IRC are not representative. Yeah, that's true. I point out usage of unnest to our customers too, but it's much more common to see people not using it, instead relying on subscripts. People using Postgres show up unexpectedly from under rocks, in the weirdest corners; they rarely consult documentation and even more rarely get into IRC or mailing list to get help. I fail to see how this supports the FOR-IN-array development though. It will just be another unused construct for most people, no? -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Andrew Dunstan : > > > On 11/18/2010 02:39 PM, Pavel Stehule wrote: >> >> 2010/11/18 Andrew Dunstan: >>> >>> On 11/18/2010 02:17 PM, Pavel Stehule wrote: -only a few people use FOR IN SELECT UNNEST for iteration over array. >>> >>> How on earth do you know that? I use it a lot and I was just >>> demonstrating >>> it to a client yesterday, and I'm quite sure he will use it a lot too. I >>> bet >>> I'm far from alone. >>> >> how much people are active readers and writers in pg_hackers like you? :) >> >> I didn't say so nobody use it. You, me, David. But I really didn't see >> this pattern here in real applications. >> > > Lots of people are told to use it on IRC. Trust me, it's getting well known. Maybe. But people on IRC are not representative. I teach about 10 PL/pgSQL courses per year (about 100 people), and almost none of my students have ever visited IRC. 30% of my students have trouble writing a bubble sort or any slightly complex code. I meet these people. There can be a language barrier or some laziness. It is really surprising how few people are interested in coding. These people have their own problems, and usually use the most classic patterns they know from other programming languages. These people are "normal" and typical. Some of my courses are run through a big Czech agency, so they include people from banks and industry. At the moment I am the only one teaching PL/pgSQL courses in Czech, so I think I know what people use. For example, only a few people know and use the generate_series function. Sorry for the offtopic :) Pavel > > cheers > > andrew > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Tom Lane : > Pavel Stehule writes: >> "unnest" returns all fields, but >> these fields should not be used. There isn't possible to say - stop, I >> don't need other fields. It's possible just with special PL statement, >> because it is controlled by PL. So it is reason why I don't believe in >> optimizations on PL level. > > That is complete nonsense. array_unnest doesn't return the whole array > contents at once, so it's just as capable of being optimized as any > single-purpose implementation. If you exit the loop early, you just > don't call it anymore. No, it isn't - actually you cannot limit the returned set when you call a SRF function in expression context, if I remember well. We can change that - but that is additional complexity. Pavel > > regards, tom lane > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
On 11/18/2010 02:39 PM, Pavel Stehule wrote: 2010/11/18 Andrew Dunstan: On 11/18/2010 02:17 PM, Pavel Stehule wrote: -only a few people use FOR IN SELECT UNNEST for iteration over array. How on earth do you know that? I use it a lot and I was just demonstrating it to a client yesterday, and I'm quite sure he will use it a lot too. I bet I'm far from alone. how much people are active readers and writers in pg_hackers like you? :) I didn't say so nobody use it. You, me, David. But I really didn't see this pattern here in real applications. Lots of people are told to use it on IRC. Trust me, it's getting well known. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Andrew Dunstan : > > > On 11/18/2010 02:17 PM, Pavel Stehule wrote: >> >> -only a few people use FOR IN SELECT UNNEST for iteration over array. > > How on earth do you know that? I use it a lot and I was just demonstrating > it to a client yesterday, and I'm quite sure he will use it a lot too. I bet > I'm far from alone. > How many people are active readers and writers of pgsql-hackers like you? :) I didn't say that nobody uses it. You, me, David. But I really haven't seen this pattern in real applications here. Regards Pavel > cheers > > andrew > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
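For reference, the FOR IN SELECT UNNEST pattern being debated looks roughly like this. A minimal sketch (the anonymous DO form assumes PostgreSQL 9.0 or later):

```sql
DO $$
DECLARE
    elem int;
BEGIN
    -- Iterate over array elements by running a query over unnest()
    FOR elem IN SELECT unnest(ARRAY[10, 20, 30]) LOOP
        RAISE NOTICE 'element: %', elem;
    END LOOP;
END;
$$;
```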
Re: [HACKERS] Per-column collation
On 15.11.2010 21:42, Peter Eisentraut wrote: On mån, 2010-11-15 at 11:34 +0100, Pavel Stehule wrote: I am checking a patch. I found a problem with initdb Ah, late night brain farts, it appears. Here is a corrected version. Some random comments: In syntax.sgml: +The COLLATE clause overrides the collation of +an expression. It is appended to the expression at applies to: That last sentence doesn't parse. Would it be possible to eliminate the ExecEvalCollateClause function somehow? It just calls through the argument. How about directly returning the argument ExprState in ExecInitExpr? get_collation_name() returns the plain name without schema, so it's not good enough for use in ruleutils.c. pg_dump is also ignoring collation's schema. Have you done any performance testing? Functions like text_cmp can be a hotspot in sorting, so any extra overhead there might show up in tests. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Tom Lane : > Pavel Stehule writes: >> this note was a different -only a few people use FOR IN SELECT UNNEST >> for iteration over array. So from Robert's question (what is important >> for current code?) perspective the more significant is access to >> individual fields via subscripts. For example: > >> for i in 1..1 loop >> s := s + A[i]; >> end loop > >> is slow, when high limit of array is some bigger number > 1000. > > True, but inventing new FOR syntax isn't going to help people who are > used to doing that. Sure - I'm not trying to. Any change there would mean significant refactoring of plpgsql and a significant increase in the size and complexity of the code. Moreover, there can still be some overhead, because a subscript can be an expression. And in almost all cases people dislike writing subscripts. > > regards, tom lane > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Tom Lane : > Pavel Stehule writes: >> 2010/11/18 Tom Lane : >>> The problem here is that FOR is a syntactic choke point: it's already >>> overloaded with several different sub-syntaxes that are quite difficult >>> to separate. Adding another one makes that worse, with the consequences >>> that we might misinterpret the user's intent, leading either to >>> misleading/unhelpful error messages or unexpected runtime behavior. > >> yes, this argument is correct - but we can rearange a parser rules >> related to FOR statement. It can be solved. > > No, it can't. The more things that can possibly follow FOR, the less > likely that you correctly guess which one the user had in mind when > faced with something that's not quite syntactically correct. Or maybe > it *is* syntactically correct, only not according to the variant that > the user thought he was invoking. We've seen bug reports of this sort > connected with FOR already; in fact I'm pretty sure you've responded to > a few yourself. Adding more variants *will* make it worse. We need > a decent return on investment for anything we add here, and this > proposal just doesn't offer enough benefit. > Yes, I reported the bug about labels being accepted in the "wrong" position. But at minimum this patch must not change the current behavior. It was your idea to use the keyword "ARRAY" there. Maybe we just have different views on complexity: my proposal increases complexity in the parser, yours in the executor, and each of us thinks the other variant is worse. I'm not saying we must have exactly the FOR IN ARRAY syntax - I thought there was agreement on it in the last discussion. We can use a different syntax - but it should be readable. Regards Pavel Stehule > regards, tom lane > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
On 11/18/2010 02:17 PM, Pavel Stehule wrote: -only a few people use FOR IN SELECT UNNEST for iteration over array. How on earth do you know that? I use it a lot and I was just demonstrating it to a client yesterday, and I'm quite sure he will use it a lot too. I bet I'm far from alone. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Indent authentication overloading
On Thu, Nov 18, 2010 at 19:36, Tom Lane wrote: > Magnus Hagander writes: >> On Thu, Nov 18, 2010 at 19:21, Tom Lane wrote: >>> I thought the proposal on the table was to add "peer" (or some other >>> name) to refer to the unix-socket auth method, and use that term >>> preferentially in the docs, while continuing to accept "ident" as an >>> old name for it. Is that really too confusing? > >> Yes, that's the current proposal - and also have the system log that >> "ident is deprecated, use peer" when it's found in the files. > > Personally I could do without that little frammish. We don't issue > wrist-slaps for other obsolete usages; why single out this one? Fair enough. I may be guilty of thinking we should do it more often ;), but I agree that being consistent is more important. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
Pavel Stehule writes: > "unnest" returns all fields, but > these fields should not be used. There isn't possible to say - stop, I > don't need other fields. It's possible just with special PL statement, > because it is controlled by PL. So it is reason why I don't believe in > optimizations on PL level. That is complete nonsense. array_unnest doesn't return the whole array contents at once, so it's just as capable of being optimized as any single-purpose implementation. If you exit the loop early, you just don't call it anymore. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
Pavel Stehule writes: > this note was a different -only a few people use FOR IN SELECT UNNEST > for iteration over array. So from Robert's question (what is important > for current code?) perspective the more significant is access to > individual fields via subscripts. For example: > for i in 1..1 loop > s := s + A[i]; > end loop > is slow, when high limit of array is some bigger number > 1000. True, but inventing new FOR syntax isn't going to help people who are used to doing that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
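Pavel's subscripting pattern, written out in full, might look like the sketch below (array size chosen arbitrarily for illustration). Per the thread, this style can get slow for large arrays because each a[i] access may detoast the array again:

```sql
DO $$
DECLARE
    a int[] := (SELECT array_agg(i) FROM generate_series(1, 10000) AS g(i));
    s bigint := 0;
BEGIN
    -- Classic index-based iteration; i is implicitly declared by FOR
    FOR i IN 1 .. array_length(a, 1) LOOP
        s := s + a[i];
    END LOOP;
    RAISE NOTICE 'sum = %', s;
END;
$$;
```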
Re: [HACKERS] describe objects, as in pg_depend
Alvaro Herrera writes: > I have to say that I'm baffled as to why has_database_privilege et al > were declared in builtins.h but pg_table_is_visible et al in dependency.c. builtins.h is probably the right place, on the whole ... I agree that consistency is somewhat lacking in this area. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] describe objects, as in pg_depend
Alvaro Herrera writes: > I just noticed that this is leaking a bit of memory, because we call > getObjectDescription (which pallocs its result) and then > cstring_to_text which pallocs a copy. This is not a lot and it's not > going to live much anyway, is it worrying about? No. The extra string will go away at the end of the tuple cycle. I don't think it's a particularly good idea for this code to assume that getObjectDescription's result is freeable, anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Tom Lane : > Pavel Stehule writes: >> what is a slow: > >> a) repeated detoasting - access with subscripts - maybe detoasted >> values can be cached? >> b) evaluation of SRF expression - maybe call of SRF function can be >> simple expression, >> c) faster evaluation ro query > >> The most important is @a. > > Really? Because AFAICS array_unnest only detoasts the source array once, > and saves the value between calls. I know. That note was about something different: only a few people use FOR IN SELECT UNNEST for iteration over an array. So from the perspective of Robert's question (what is important for current code?), the more significant case is access to individual fields via subscripts. For example: for i in 1..1 loop s := s + A[i]; end loop is slow when the high limit of the array is some bigger number > 1000. But almost all stored procedures use this pattern. I know that some people use the FOR IN SELECT UNNEST pattern, but (for example) I haven't met such a developer in the Czech Republic. It isn't usual for people to be able to mix SQL and PL well. There are practical reasons - using UNNEST for small arrays is slower. > > array_unnest doesn't currently have any smarts about fetching slices > of an array. I'm not sure how useful that would be in practice, since > (1) in most usages you probably run the function to the end and fetch > all the values anyway; (2) it's hard to see how to optimize that way > if the elements are varlena, which they most likely are in most usages > where this could possibly be a win. But if Cedric's use-case is really > worth optimizing, I'd sure rather see the smarts for it in the general > purpose array_unnest function instead of buried in plpgsql's FOR logic. > Probably - the example with the LIKE filter is really specific. But there can be tasks where you break the iteration early, when you find a value higher or lower than some constant - it's not too artificial - think of an "IS MEMBER OF" test. Regards Pavel > regards, tom lane > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
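The early-break case Pavel mentions, such as an "IS MEMBER OF" test, might look like this in the existing FOR IN SELECT unnest style (a sketch; is_member is a hypothetical helper name, not from the patch):

```sql
CREATE OR REPLACE FUNCTION is_member(val int, arr int[]) RETURNS boolean AS $$
DECLARE
    x int;
BEGIN
    FOR x IN SELECT unnest(arr) LOOP
        IF x = val THEN
            RETURN true;   -- break out early; later elements are never fetched
        END IF;
    END LOOP;
    RETURN false;
END;
$$ LANGUAGE plpgsql;

-- SELECT is_member(3, ARRAY[1, 2, 3, 4]);  -- true
```

This is exactly the loop shape Tom argues can be optimized generically: once RETURN is reached, array_unnest is simply never called again.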
Re: [HACKERS] final patch - plpgsql: for-in-array
Pavel Stehule writes: > 2010/11/18 Tom Lane : >> The problem here is that FOR is a syntactic choke point: it's already >> overloaded with several different sub-syntaxes that are quite difficult >> to separate. Â Adding another one makes that worse, with the consequences >> that we might misinterpret the user's intent, leading either to >> misleading/unhelpful error messages or unexpected runtime behavior. > yes, this argument is correct - but we can rearange a parser rules > related to FOR statement. It can be solved. No, it can't. The more things that can possibly follow FOR, the less likely that you correctly guess which one the user had in mind when faced with something that's not quite syntactically correct. Or maybe it *is* syntactically correct, only not according to the variant that the user thought he was invoking. We've seen bug reports of this sort connected with FOR already; in fact I'm pretty sure you've responded to a few yourself. Adding more variants *will* make it worse. We need a decent return on investment for anything we add here, and this proposal just doesn't offer enough benefit. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] describe objects, as in pg_depend
Excerpts from Tom Lane's message of jue nov 18 14:49:21 -0300 2010: > Please do not do this: > > +/* this doesn't really need to appear in any header file */ > +Datum pg_describe_object(PG_FUNCTION_ARGS); > > Put the extern declaration in a header file, don't be cute. Oh, I forgot to comment on this. I had initially put the declaration in builtins.h, but then I noticed that namespace.c contains a bunch of declarations -- I even copied the comment almost verbatim:

    /* These don't really need to appear in any header file */
    Datum pg_table_is_visible(PG_FUNCTION_ARGS);
    Datum pg_type_is_visible(PG_FUNCTION_ARGS);
    Datum pg_function_is_visible(PG_FUNCTION_ARGS);
    Datum pg_operator_is_visible(PG_FUNCTION_ARGS);
    Datum pg_opclass_is_visible(PG_FUNCTION_ARGS);
    Datum pg_conversion_is_visible(PG_FUNCTION_ARGS);
    Datum pg_ts_parser_is_visible(PG_FUNCTION_ARGS);
    Datum pg_ts_dict_is_visible(PG_FUNCTION_ARGS);
    Datum pg_ts_template_is_visible(PG_FUNCTION_ARGS);
    Datum pg_ts_config_is_visible(PG_FUNCTION_ARGS);
    Datum pg_my_temp_schema(PG_FUNCTION_ARGS);
    Datum pg_is_other_temp_schema(PG_FUNCTION_ARGS);

This seems to have originated in this commit:

    commit 4ab8e69094452286a5894f1b2b237304808f4391
    Author: Tom Lane
    Date: Fri Aug 9 16:45:16 2002 +

        has_table_privilege spawns scions has_database_privilege,
        has_function_privilege, has_language_privilege, has_schema_privilege
        to let SQL queries test all the new privilege types in 7.3. Also,
        add functions pg_table_is_visible, pg_type_is_visible,
        pg_function_is_visible, pg_operator_is_visible, pg_opclass_is_visible
        to test whether objects contained in schemas are visible in the
        current search path. Do some minor cleanup to centralize accesses
        to pg_database, as well.

I have to say that I'm baffled as to why has_database_privilege et al were declared in builtins.h but pg_table_is_visible et al in dependency.c. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Indent authentication overloading
On Thu, Nov 18, 2010 at 6:36 PM, Tom Lane wrote: > It's also warning about the wrong thing. IMO the real subtext to this > discussion is that we're afraid people are using ident-over-TCP > insecurely because they've confused it with ident-over-socket. > Which is a legitimate concern, but issuing warnings about > ident-over-socket configurations will accomplish nothing whatsoever > to wake up the guy at risk, because he's not using one. It will only > make us look like pedantic nannies annoying people whose configurations > are perfectly fine. Perhaps we should rename both then? Then we could warn if someone is using ident to refer to identd authentication but not if they're using it to refer to peer authentication. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] describe objects, as in pg_depend
I just noticed that this is leaking a bit of memory, because we call getObjectDescription (which pallocs its result) and then cstring_to_text which pallocs a copy. This is not a lot and it's not going to live long anyway; is it worth worrying about? I reworked it like this:

    {
        char   *description = NULL;
        text   *tdesc;
        ...
        description = getObjectDescription(&address);
        tdesc = cstring_to_text(description);
        pfree(description);
        PG_RETURN_TEXT_P(tdesc);
    }

I notice that ruleutils.c has a convenience function string_to_text which is designed to avoid this problem:

    static text *
    string_to_text(char *str)
    {
        text   *result;

        result = cstring_to_text(str);
        pfree(str);
        return result;
    }

So I could just make that non-static (though I'd move it to varlena.c while at it) and use that instead. I wonder if it's worth going through some of the other callers of cstring_to_text and changing them to use this wrapper. There are some that are leaking some memory, though it's a tiny amount and I'm not sure it's worth the bother. (But if so, why is ruleutils going the extra mile?) -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
Pavel Stehule writes: > what is a slow: > a) repeated detoasting - access with subscripts - maybe detoasted > values can be cached? > b) evaluation of SRF expression - maybe call of SRF function can be > simple expression, > c) faster evaluation ro query > The most important is @a. Really? Because AFAICS array_unnest only detoasts the source array once, and saves the value between calls. array_unnest doesn't currently have any smarts about fetching slices of an array. I'm not sure how useful that would be in practice, since (1) in most usages you probably run the function to the end and fetch all the values anyway; (2) it's hard to see how to optimize that way if the elements are varlena, which they most likely are in most usages where this could possibly be a win. But if Cedric's use-case is really worth optimizing, I'd sure rather see the smarts for it in the general purpose array_unnest function instead of buried in plpgsql's FOR logic. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Indent authentication overloading
> We should've done that long ago - it's already used for things that > aren't ident. If anything, it should be pg_usermap.conf. That would be nice. How would we handle the backwards compatibility? Accept pg_ident files also for 2 versions with a warning in the logs, and then stop reading them? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Robert Haas : > On Thu, Nov 18, 2010 at 1:03 PM, Pavel Stehule > wrote: >> 2010/11/18 Robert Haas : >>> On Thu, Nov 18, 2010 at 12:36 PM, Tom Lane wrote: I would *much* rather we get the performance benefit by internal optimization, and forego inventing syntax. >>> >>> +1. >> >> any optimization will be about 10-20% slower than direct access. See >> my tests: on large arrays isn't significant if you use a simple >> expression or full query. This is just overhead from building a >> "tuplestore" and access to data via cursor. And you cannot to change a >> SRF functions to returns just array. I would to see any optimization >> on this level, but I think so it's unreal expecting. > > How can you possibly make a general statement like that? What's slow > is not the syntax; it's what the syntax is making happen under the > hood. > OK, it is based on my tests, but it may be subjective. It is probably possible to work with a tuplestore as the result of a SRF function, and probably we can read from it without a cursor. Maybe we can teach SRF functions to store values as scalars rather than as tuples - a tuplestore can do that, but then we would need global state and we would have to modify built-in functions (not just "unnest", if the feature should be general). But we can never ensure that we work with only the necessary data the way a special PL statement can. "unnest" returns all fields, even fields that will never be used. There is no way to say: stop, I don't need the other fields. That is possible only with a special PL statement, because it is controlled by the PL. That is the reason why I don't believe in optimizations at the PL level. Regards Pavel Stehule > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Indent authentication overloading
On Thu, Nov 18, 2010 at 19:41, Josh Berkus wrote: > >>> I thought the proposal on the table was to add "peer" (or some other >>> name) to refer to the unix-socket auth method, and use that term >>> preferentially in the docs, while continuing to accept "ident" as an >>> old name for it. Is that really too confusing? > > What about the pg_ident file? Are we going to rename it? Are we We should've done that long ago - it's already used for things that aren't ident. If anything, it should be pg_usermap.conf. > (better) going to have separate files for pg_peer and pg_ident? Why? It already supports multiple maps... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Indent authentication overloading
>> I thought the proposal on the table was to add "peer" (or some other >> name) to refer to the unix-socket auth method, and use that term >> preferentially in the docs, while continuing to accept "ident" as an >> old name for it. Is that really too confusing? What about the pg_ident file? Are we going to rename it? Are we (better) going to have separate files for pg_peer and pg_ident? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Indent authentication overloading
On 11/18/2010 01:21 PM, Tom Lane wrote: I thought the proposal on the table was to add "peer" (or some other name) to refer to the unix-socket auth method, and use that term preferentially in the docs, while continuing to accept "ident" as an old name for it. Is that really too confusing? Not to me. And I think that's a good proposal. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Indent authentication overloading
Magnus Hagander writes: > On Thu, Nov 18, 2010 at 19:21, Tom Lane wrote: >> I thought the proposal on the table was to add "peer" (or some other >> name) to refer to the unix-socket auth method, and use that term >> preferentially in the docs, while continuing to accept "ident" as an >> old name for it. Is that really too confusing? > Yes, that's the current proposal - and also have the system log that > "ident is deprecated, use peer" when it's found in the files. Personally I could do without that little frammish. We don't issue wrist-slaps for other obsolete usages; why single out this one? It's also warning about the wrong thing. IMO the real subtext to this discussion is that we're afraid people are using ident-over-TCP insecurely because they've confused it with ident-over-socket. Which is a legitimate concern, but issuing warnings about ident-over-socket configurations will accomplish nothing whatsoever to wake up the guy at risk, because he's not using one. It will only make us look like pedantic nannies annoying people whose configurations are perfectly fine. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Indent authentication overloading
On Thu, Nov 18, 2010 at 19:21, Tom Lane wrote: > Josh Berkus writes: >>> We use it. Do you have an alternative that doesn't lower security >>> besides Kerberos? Anti-ident arguments are straw man arguments - "If >>> you setup identd badly or don't trust remote root or your network, >>> ident sucks as an authentication mechanism". > >> Actually, you're trusting that nobody can add their own machine as a >> node on your network. All someone has to do is plug their linux laptop >> into a network cable in your office and they have free access to the >> database. > > You're assuming the OP is using ident for wild-card IP ranges rather > than specific IP addresses. I agree that ident is *hard* to set up > securely, but that doesn't mean it's entirely insecure. If you can get on the network, you can take out that single IP as well, in most networks. (Yes, you can protect against that, but it's not the default by any means). It takes a little bit more work, but it's really not that hard. OTOH, if you can get on the network in *that* way, you should be using SSL or ipsec. But I definitely agree that it can be used in secure ways, depending on the circumstances. If it wasn't clear, my "suggestion" to remove it completely really wasn't serious. >> I don't think anyone is talking about eliminating it, just >> distinguishing ident-over-TCP from unix-socket-same-user, which are >> really two different authentication mechanisms. > >> HOWEVER, I can't see any way of doing this which wouldn't cause a >> significant amount of backwards-compatibility confusion. > > I thought the proposal on the table was to add "peer" (or some other > name) to refer to the unix-socket auth method, and use that term > preferentially in the docs, while continuing to accept "ident" as an > old name for it. Is that really too confusing? Yes, that's the current proposal - and also have the system log that "ident is deprecated, use peer" when it's found in the files. 
-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Indent authentication overloading
Josh Berkus writes:
>> We use it. Do you have an alternative that doesn't lower security
>> besides Kerberos? Anti-ident arguments are straw man arguments - "If
>> you setup identd badly or don't trust remote root or your network,
>> ident sucks as an authentication mechanism".

> Actually, you're trusting that nobody can add their own machine as a
> node on your network. All someone has to do is plug their linux laptop
> into a network cable in your office and they have free access to the
> database.

You're assuming the OP is using ident for wild-card IP ranges rather
than specific IP addresses. I agree that ident is *hard* to set up
securely, but that doesn't mean it's entirely insecure.

> I don't think anyone is talking about eliminating it, just
> distinguishing ident-over-TCP from unix-socket-same-user, which are
> really two different authentication mechanisms.

> HOWEVER, I can't see any way of doing this which wouldn't cause a
> significant amount of backwards-compatibility confusion.

I thought the proposal on the table was to add "peer" (or some other
name) to refer to the unix-socket auth method, and use that term
preferentially in the docs, while continuing to accept "ident" as an
old name for it. Is that really too confusing?

			regards, tom lane
Re: [HACKERS] final patch - plpgsql: for-in-array
On Thu, Nov 18, 2010 at 1:03 PM, Pavel Stehule wrote:
> 2010/11/18 Robert Haas :
>> On Thu, Nov 18, 2010 at 12:36 PM, Tom Lane wrote:
>>> I would *much* rather we get the performance benefit by internal
>>> optimization, and forego inventing syntax.
>>
>> +1.
>
> any optimization will be about 10-20% slower than direct access. See
> my tests: on large arrays isn't significant if you use a simple
> expression or full query. This is just overhead from building a
> "tuplestore" and access to data via cursor. And you cannot to change a
> SRF functions to returns just array. I would to see any optimization
> on this level, but I think so it's unreal expecting.

How can you possibly make a general statement like that?  What's slow
is not the syntax; it's what the syntax is making happen under the
hood.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Indent authentication overloading
On Thu, Nov 18, 2010 at 1:01 PM, Josh Berkus wrote:
>> We use it. Do you have an alternative that doesn't lower security
>> besides Kerberos? Anti-ident arguments are straw man arguments - "If
>> you setup identd badly or don't trust remote root or your network,
>> ident sucks as an authentication mechanism".
>
> Actually, you're trusting that nobody can add their own machine as a
> node on your network. All someone has to do is plug their linux laptop
> into a network cable in your office and they have free access to the
> database.

I think you need to give him a little more credit than that... From the
description he gave, I wouldn't be surprised if, on the networks he's
using ident on, he's got switch ports locked, limited server access,
etc. His whole point was that in his locked-down network, ident is
*better* than giving everybody "yet another password" they have to
manage, have users not mis-manage, and make sure users don't mis-use...

So, yes, ident is only as secure as the *network and machines* it's used
on. Passwords are only as secure as the users managing them, and the
machines/filesystems containing .pgpass ;-)

a.

--
Aidan Van Dyk                      Create like a god,
ai...@highrise.ca                  command like a king,
http://www.highrise.ca/            work like a slave.
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Robert Haas :
> On Thu, Nov 18, 2010 at 12:36 PM, Tom Lane wrote:
>> I would *much* rather we get the performance benefit by internal
>> optimization, and forego inventing syntax.
>
> +1.

Any optimization will be about 10-20% slower than direct access. See my
tests: on large arrays it isn't significant whether you use a simple
expression or a full query. The overhead is just from building a
"tuplestore" and accessing the data via a cursor. And you cannot change
an SRF function to return just an array. I would like to see an
optimization at this level, but I think it's an unrealistic expectation.

Regards

Pavel Stehule
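To make the comparison concrete, here is a sketch of the two loop forms the thread is weighing against each other. The FOR ... IN ARRAY form is the syntax proposed by the patch and does not exist in an unpatched server; the function and variable names are illustrative only.

```sql
-- Existing form: iterate an array through a set-returning function.
-- The loop source is a real query, so it goes through SPI, building a
-- tuplestore and reading it back via a cursor.
CREATE OR REPLACE FUNCTION sum_elements(a int[]) RETURNS bigint AS $$
DECLARE
    s bigint := 0;
    x int;
BEGIN
    FOR x IN SELECT unnest(a) LOOP
        s := s + x;
    END LOOP;
    RETURN s;
END;
$$ LANGUAGE plpgsql;

-- Proposed form (patch syntax, not valid on an unpatched server): the
-- statement itself unpacks the array, element by element, with no query
-- machinery in between.
--
--     FOR x IN ARRAY a LOOP
--         s := s + x;
--     END LOOP;
```

Pavel's claim is that the tuplestore/cursor overhead of the first form is exactly what the second form avoids.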
Re: [HACKERS] Indent authentication overloading
> We use it. Do you have an alternative that doesn't lower security
> besides Kerberos? Anti-ident arguments are straw man arguments - "If
> you setup identd badly or don't trust remote root or your network,
> ident sucks as an authentication mechanism".

Actually, you're trusting that nobody can add their own machine as a
node on your network. All someone has to do is plug their linux laptop
into a network cable in your office and they have free access to the
database.

> Ident is great as you don't have to lower security by dealing with keys
> on the client system (more management headaches == lower security), or
> worry about those keys being reused by accounts that shouldn't be
> reusing them. Please don't deprecate it unless there is an alternative.
> And if you are a pg_pool or pgbouncer maintainer, please consider
> adding support :)

I don't think anyone is talking about eliminating it, just
distinguishing ident-over-TCP from unix-socket-same-user, which are
really two different authentication mechanisms.

HOWEVER, I can't see any way of doing this which wouldn't cause a
significant amount of backwards-compatibility confusion. Given that
users can already distinguish between local and TCP ident in pg_hba.conf
(and the default pg_hba.conf does), is it worth the confusion it will
cause?

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
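Josh's point that local and TCP ident are already distinguishable can be seen in a typical pg_hba.conf fragment (addresses are illustrative; both lines name the same "ident" method but trigger entirely different mechanisms):

```
# TYPE  DATABASE  USER  ADDRESS        METHOD
local   all       all                  ident   # unix socket: kernel-verified peer user
host    all       all   10.0.0.5/32    ident   # TCP: queries the identd daemon on the client host
```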
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Tom Lane :
> Andrew Dunstan writes:
>> Syntactic sugar is not entirely to be despised, anyway.
>
> If it were harmless syntactic sugar I wouldn't be objecting so loudly.
> The problem here is that FOR is a syntactic choke point: it's already
> overloaded with several different sub-syntaxes that are quite difficult
> to separate. Adding another one makes that worse, with the consequences
> that we might misinterpret the user's intent, leading either to
> misleading/unhelpful error messages or unexpected runtime behavior.
> If you consult the archives you can find numerous past instances of
> complaints from people who hit such problems with the existing set of
> FOR sub-syntaxes, so this isn't hypothetical.

Yes, this argument is correct - but we can rearrange the parser rules
related to the FOR statement. It can be solved.

> I'm also quite afraid of having a conflict with other future extensions
> of FOR, which we might want to introduce either on our own hook or for
> compatibility with something Oracle might add to PL/SQL in future.

We talked about it last time - and I respect it - the syntax is
FOR IN >>>ARRAY<<< expression.

> This might not be the worst place in the entire system to be introducing
> inessential syntactic sugar, but it's certainly one of the top ten.
> I would *much* rather we get the performance benefit by internal
> optimization, and forego inventing syntax.

Regards

Pavel
Re: [HACKERS] final patch - plpgsql: for-in-array
On Thu, Nov 18, 2010 at 12:36 PM, Tom Lane wrote:
> I would *much* rather we get the performance benefit by internal
> optimization, and forego inventing syntax.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] describe objects, as in pg_depend
Alvaro Herrera writes:
> In the process of looking it over again, I noticed that in an
> assert-enabled build, it's trivial to crash the server with this
> function: just pass a nonzero subobjid with an object class that doesn't
> take one. This could be fixed easily by turning the Asserts into
> elog(ERROR).

> Another problem with this function is that a lot of checks that
> currently raise errors with elog(ERROR) are now user-facing. On this
> issue one possible answer would be to do nothing; another would be to go
> over all those calls and turn them into full-fledged ereports.

Yeah, it would definitely be necessary to ensure that you couldn't cause
an Assert by passing bogus input. I wouldn't bother making the errors
into ereports though: that's adding a lot of translation burden to no
good purpose.

Please do not do this:

    +/* this doesn't really need to appear in any header file */
    +Datum pg_describe_object(PG_FUNCTION_ARGS);

Put the extern declaration in a header file, don't be cute.

This is useless, because getObjectDescription never returns null (or if
it does, we have a whole lot of unprotected callers to fix):

    +   if (!description)
    +       ereport(ERROR,
    +               (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
    +                errmsg("invalid object specification")));

			regards, tom lane
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Tom Lane :
> Pavel Stehule writes:
>> sorry, but I don't agree. I don't think, so there are some big space
>> for optimizing - and if then it means much more code complexity for
>> current expr executor. Next - FOR IN ARRAY takes fields from array on
>> request - and it is possible, because a unpacking of array is
>> controlled by statement - it's impossible do same when unpacking is
>> inside other functions with same effectivity.
>
> Just because you haven't thought about it doesn't mean it's impossible
> or impractical.
>
> The first implementation I was thinking of would involve looking at the
> SELECT querytree after parsing to see if it's "SELECT UNNEST(something)"
> --- that is, empty jointree and so on, single tlist item that is an
> invocation of the unnest() function. If it is, you pull out the
> argument expression of unnest() and go from there, with exactly the same
> execution behavior as in the specialized-syntax patch. This is
> perfectly safe if you identify the array_unnest function by OID: since
> it's a built-in function you know exactly what it's supposed to do.

This additional check will slow down every expression - and more:
somebody can use the form SELECT * FROM unnest(), and that form will
stay slower.

> But having said that, it's still not apparent to me that array_unnest
> itself is markedly slower than what you could hope to do in plpgsql.
> I think the real issue here is that plpgsql's simple-expression code
> can't be used with set-returning expressions, which means that we have
> to go through the vastly more expensive SPI code path. But that
> restriction is largely based on fear of re-using expression trees, which
> is something we fixed a few weeks ago. I think that it would now be
> interesting to look at whether "FOR x IN SELECT simple-expression" could
> use the simple-expression code even when the expression returns set.
> That approach might bring a useful speedup not just for unnest, but for
> many other use-cases as well.

Any SRF call cannot be faster than direct access to an array - there is
overhead from the tuples. I don't understand you, sorry; I don't think
your objections are objective.

Regards

Pavel
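The special case Tom describes, and the counterexample Pavel raises, look like this inside a PL/pgSQL function body (a sketch; variable names are illustrative):

```sql
-- Form Tom proposes to recognize: a querytree with an empty jointree and
-- a single targetlist item that is a call to the built-in unnest(),
-- identified by OID. The argument expression would then be iterated
-- directly, matching the patch's execution behavior.
FOR x IN SELECT unnest(array_value) LOOP
    -- ... body ...
END LOOP;

-- Pavel's counterexample: here unnest() sits in the FROM clause, so the
-- querytree does not match the pattern and stays on the SPI path.
FOR x IN SELECT * FROM unnest(array_value) LOOP
    -- ... body ...
END LOOP;
```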
Re: [HACKERS] final patch - plpgsql: for-in-array
Andrew Dunstan writes:
> Syntactic sugar is not entirely to be despised, anyway.

If it were harmless syntactic sugar I wouldn't be objecting so loudly.
The problem here is that FOR is a syntactic choke point: it's already
overloaded with several different sub-syntaxes that are quite difficult
to separate. Adding another one makes that worse, with the consequences
that we might misinterpret the user's intent, leading either to
misleading/unhelpful error messages or unexpected runtime behavior.
If you consult the archives you can find numerous past instances of
complaints from people who hit such problems with the existing set of
FOR sub-syntaxes, so this isn't hypothetical.

I'm also quite afraid of having a conflict with other future extensions
of FOR, which we might want to introduce either on our own hook or for
compatibility with something Oracle might add to PL/SQL in future.

This might not be the worst place in the entire system to be introducing
inessential syntactic sugar, but it's certainly one of the top ten.
I would *much* rather we get the performance benefit by internal
optimization, and forego inventing syntax.

			regards, tom lane
Re: [HACKERS] EXPLAIN and nfiltered
On Thu, Nov 18, 2010 at 10:45 AM, Marko Tiikkaja wrote:
> Here's a patch for showing in EXPLAIN ANALYZE the number of rows a plan
> qual filtered from a node's input. The output looks like this:

I have wished for this many, MANY times.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
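The patch's example output did not survive in this archive chunk. For orientation only: the capability under discussion eventually shipped in PostgreSQL 9.2 as the "Rows Removed by Filter" line, which appears under any node whose filter qual discards rows. A sketch (table and column names are illustrative, and the exact text differs from the 2010 patch):

```sql
-- On PostgreSQL 9.2 or later, not the 2010 sources this thread targets:
EXPLAIN ANALYZE SELECT * FROM tbl WHERE val > 100;
--  Seq Scan on tbl  (cost=...)  (actual time=... rows=... loops=1)
--    Filter: (val > 100)
--    Rows Removed by Filter: <number of input rows the qual rejected>
```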
Re: [HACKERS] describe objects, as in pg_depend
Thanks for the comments. Updated patch attached.

In the process of looking it over again, I noticed that in an
assert-enabled build, it's trivial to crash the server with this
function: just pass a nonzero subobjid with an object class that doesn't
take one. This could be fixed easily by turning the Asserts into
elog(ERROR).

Another problem with this function is that a lot of checks that
currently raise errors with elog(ERROR) are now user-facing. On this
issue one possible answer would be to do nothing; another would be to go
over all those calls and turn them into full-fledged ereports.

--
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

[Attachment: 0001-Add-pg_describe_object-function.patch (binary data)]
Re: [HACKERS] final patch - plpgsql: for-in-array
Pavel Stehule writes:
> 2010/11/18 Tom Lane :
>> More to the point, if there is indeed an interesting performance win
>> here, we could get the same win by internally optimizing the existing
>> syntax.

> sorry, but I don't agree. I don't think, so there are some big space
> for optimizing - and if then it means much more code complexity for
> current expr executor. Next - FOR IN ARRAY takes fields from array on
> request - and it is possible, because a unpacking of array is
> controlled by statement - it's impossible do same when unpacking is
> inside other functions with same effectivity.

Just because you haven't thought about it doesn't mean it's impossible
or impractical.

The first implementation I was thinking of would involve looking at the
SELECT querytree after parsing to see if it's "SELECT UNNEST(something)"
--- that is, empty jointree and so on, single tlist item that is an
invocation of the unnest() function. If it is, you pull out the
argument expression of unnest() and go from there, with exactly the same
execution behavior as in the specialized-syntax patch. This is
perfectly safe if you identify the array_unnest function by OID: since
it's a built-in function you know exactly what it's supposed to do.

But having said that, it's still not apparent to me that array_unnest
itself is markedly slower than what you could hope to do in plpgsql.
I think the real issue here is that plpgsql's simple-expression code
can't be used with set-returning expressions, which means that we have
to go through the vastly more expensive SPI code path. But that
restriction is largely based on fear of re-using expression trees, which
is something we fixed a few weeks ago. I think that it would now be
interesting to look at whether "FOR x IN SELECT simple-expression" could
use the simple-expression code even when the expression returns set.
That approach might bring a useful speedup not just for unnest, but for
many other use-cases as well.

			regards, tom lane
Re: [HACKERS] EXPLAIN and nfiltered
On 2010-11-18 6:44 PM +0200, Andres Freund wrote:
> On Thursday 18 November 2010 16:45:23 Marko Tiikkaja wrote:
>> Here's a patch for showing in EXPLAIN ANALYZE the number of rows a plan
>> qual filtered from a node's input. The output looks like this:
>
> If it supports the same for index-scans I *really* like it and even
> proposed a patch earlier (4a16a8af.2080...@anarazel.de)
> I still find myself constantly wishing to have something like that...
> Especially when wondering if it might be worthwile to add another column
> (which is implemented as an additional qual atm) should get included in
> a multiicolumn index.
>
> It was shot down at the time because it might break some
> explain-parsers expectations. Imho that argument is less valid these
> days.

I have to admit, I didn't query the archives before looking at this. I
guess I should have. This patch supports all Scan nodes, and I think
that's the right thing to do.

I agree 100% that breaking the parseability (is that a word?) of the
EXPLAIN output is not really an argument these days, so we might want to
give this some thought.

Regards,
Marko Tiikkaja
Re: [HACKERS] EXPLAIN and nfiltered
On Thursday 18 November 2010 17:48:43 Marko Tiikkaja wrote:
> On 2010-11-18 6:44 PM +0200, Andres Freund wrote:
>> On Thursday 18 November 2010 16:45:23 Marko Tiikkaja wrote:
>>> Here's a patch for showing in EXPLAIN ANALYZE the number of rows a
>>> plan qual filtered from a node's input. The output looks like this:
>>
>> If it supports the same for index-scans I *really* like it and even
>> proposed a patch earlier (4a16a8af.2080...@anarazel.de)
>> I still find myself constantly wishing to have something like that...
>> Especially when wondering if it might be worthwile to add another
>> column (which is implemented as an additional qual atm) should get
>> included in a multiicolumn index.
>>
>> It was shot down at the time because it might break some
>> explain-parsers expectations. Imho that argument is less valid these
>> days.
>
> I have to admit, I didn't query the archives before looking at this. I
> guess I should have. This patch supports all Scan nodes, and I think
> that's the right thing to do.

Uh. No worries. That was my first pg-patch, I doubt it was well-done ;-)

> I agree 100% that breaking the parseability (is that a word?) of the
> EXPLAIN output is not really an argument these days, so we might want
> to give this some thought.

Yes.

Andres
Re: [HACKERS] EXPLAIN and nfiltered
On Thursday 18 November 2010 16:45:23 Marko Tiikkaja wrote:
> Hi,
>
> Here's a patch for showing in EXPLAIN ANALYZE the number of rows a plan
> qual filtered from a node's input. The output looks like this:

If it supports the same for index-scans I *really* like it - I even
proposed such a patch earlier (4a16a8af.2080...@anarazel.de).
I still find myself constantly wishing to have something like that,
especially when wondering whether another column (which is implemented
as an additional qual atm) should get included in a multicolumn index.

It was shot down at the time because it might break some
explain-parsers' expectations. Imho that argument is less valid these
days.

Andres
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Tom Lane :
> Merlin Moncure writes:
>> On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane wrote:
>>> Yes, which begs the question of why bother at all.
>
>> Pavel's performance argument is imnsho valid.
>
> Well, that argument is unsupported by any evidence, so far as I've seen.
>
> More to the point, if there is indeed an interesting performance win
> here, we could get the same win by internally optimizing the existing
> syntax. That would provide the benefit to existing code not just
> new code; and it would avoid foreclosing our future options for
> extending FOR in not-so-redundant ways.

Sorry, but I don't agree. I don't think there is much room for
optimization - and if there were, it would mean much more code
complexity in the current expression executor. Next: FOR IN ARRAY takes
fields from the array on request - and that is possible because the
unpacking of the array is controlled by the statement. It's impossible
to do the same with equal efficiency when the unpacking happens inside
other functions.

Regards

Pavel Stehule
Re: [HACKERS] EXPLAIN and nfiltered
On 2010-11-18 6:26 PM +0200, Tom Lane wrote:
> Marko Tiikkaja writes:
>> Here's a patch for showing in EXPLAIN ANALYZE the number of rows a plan
>> qual filtered from a node's input.
>
> I don't like this a whole lot. It's unclear what "filtered" means, or
> why it's worth expending precious EXPLAIN ANALYZE output space for.

The name can be changed, of course. But I think the idea is good; I find
myself constantly manually finding out how many rows were actually
filtered.

> Also, you've not implemented it for any except scan nodes;

That was intentional.

> and I think it's not going to be entirely well-defined for join nodes,
> since it's somewhat arbitrary which conditions are considered part of
> the join qual versus the filter. (That problem will get worse not
> better with the planned generalization of inner indexscans, since there
> may be join quals in scan nodes.)

Hmm.. Maybe I'm misunderstanding something, but I don't see that as a
huge problem.

Regards,
Marko Tiikkaja
Re: [HACKERS] EXPLAIN and nfiltered
Marko Tiikkaja writes:
> Here's a patch for showing in EXPLAIN ANALYZE the number of rows a plan
> qual filtered from a node's input.

I don't like this a whole lot. It's unclear what "filtered" means, or
why it's worth expending precious EXPLAIN ANALYZE output space for.
Also, you've not implemented it for any except scan nodes; and I think
it's not going to be entirely well-defined for join nodes, since it's
somewhat arbitrary which conditions are considered part of the join qual
versus the filter. (That problem will get worse not better with the
planned generalization of inner indexscans, since there may be join
quals in scan nodes.)

			regards, tom lane
Re: [HACKERS] final patch - plpgsql: for-in-array
Merlin Moncure writes:
> On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane wrote:
>> Yes, which begs the question of why bother at all.

> Pavel's performance argument is imnsho valid.

Well, that argument is unsupported by any evidence, so far as I've seen.

More to the point, if there is indeed an interesting performance win
here, we could get the same win by internally optimizing the existing
syntax. That would provide the benefit to existing code not just
new code; and it would avoid foreclosing our future options for
extending FOR in not-so-redundant ways.

			regards, tom lane
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Pavel Stehule :
> 2010/11/18 Cédric Villemain :
>> 2010/11/18 Robert Haas :
>>> On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure wrote:
>>>> Pavel's performance argument is imnsho valid. arrays at present are
>>>> the best way to pass data around functions and any optimizations here
>>>> are very welcome. Given that, is there any way to mitigate your
>>>> concerns on the syntax side?
>>>
>>> Can we get the performance benefit any other way? I hate to think
>>> that it will still be slow for people using the already-supported
>>> syntax.
>>
>> If you are able to make unnest() outputting 1st row without detoasting
>> last field.
>>
>> I think if we have :
>> #define DatumGetTextPSlice(X,m,n) ((text *) PG_DETOAST_DATUM_SLICE(X,m,n))
>> but for array, most is done
>>
>> Pavel, am I correct ?
>
> yes, it can help, but still if you iterate over complete array, you
> have to do n - detoast ops.

sure.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/         PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] final patch - plpgsql: for-in-array
On 11/18/2010 10:33 AM, Robert Haas wrote:
> On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure wrote:
>> Pavel's performance argument is imnsho valid. arrays at present are
>> the best way to pass data around functions and any optimizations here
>> are very welcome. Given that, is there any way to mitigate your
>> concerns on the syntax side?
>
> Can we get the performance benefit any other way? I hate to think
> that it will still be slow for people using the already-supported
> syntax.

It's not disastrously slower. AFAICT from a very quick glance over the
patch, he's getting the speedup by bypassing the normal mechanism for
evaluating "for x in select ...". So we'd have to special-case that to
trap calls to unnest in the general form. That would be pretty ugly. Or
else make unnest and SPI faster. But that's a much bigger project.

Syntactic sugar is not entirely to be despised, anyway.

cheers

andrew
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Cédric Villemain :
> 2010/11/18 Robert Haas :
>> On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure wrote:
>>> On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane wrote:
>>>> Merlin Moncure writes:
>>>>> On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote:
>>>>>> i will start the review of this one... but before that sorry for
>>>>>> suggesting this a bit later but about using UNNEST as part of the
>>>>>> sintax?
>>>>
>>>>> Does for-in-array do what unnset does?
>>>>
>>>> Yes, which begs the question of why bother at all. AFAICS this patch
>>>> simply allows you to replace
>>>>
>>>>    for x in select unnest(array_value) loop
>>>>
>>>> with
>>>>
>>>>    for x in unnest array_value loop
>>>>
>>>> (plus or minus a parenthesis or so). I do not think we need to add a
>>>> bunch of code and create even more syntactic ambiguity (FOR loops are
>>>> already on the hairy edge of unparsability) to save people from
>>>> writing "select".
>>>
>>> Pavel's performance argument is imnsho valid. arrays at present are
>>> the best way to pass data around functions and any optimizations here
>>> are very welcome. Given that, is there any way to mitigate your
>>> concerns on the syntax side?
>>
>> Can we get the performance benefit any other way? I hate to think
>> that it will still be slow for people using the already-supported
>> syntax.
>
> If you are able to make unnest() outputting 1st row without detoasting
> last field.
>
> I think if we have :
> #define DatumGetTextPSlice(X,m,n) ((text *) PG_DETOAST_DATUM_SLICE(X,m,n))
> but for array, most is done
>
> Pavel, am I correct ?

Yes, it can help, but still, if you iterate over the complete array, you
have to do n detoast operations.

Pavel
Re: [HACKERS] Which data structures for the index?
Thanks once again Mr. Heikki for your help. Got your point! Thanks :)

-Vaibhav (*_*)

On Thu, Nov 18, 2010 at 7:10 PM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> On 18.11.2010 15:19, Vaibhav Kaushal wrote:
>> I have a small problem: Suppose that I have a table in PostgreSQL
>> which has an integer field as its Primary key and I have used the Hash
>> indexing method for creating the index for the field on the table. Now
>> I want to know the following details about the index (assuming the
>> machine running PostgreSQL is a Linux box with DB installed at
>> /var/lib/pgsql/data):
>>
>> * Which file would contain the index table data? (OR how to find them?
>> Will I find them in the CATALOG tables?)
>
> SELECT relfilenode FROM pg_class WHERE relname='index name'. The index
> data is stored in a file with that name. Something like
> /var/lib/pgsql/data/base/11910/
>
>> * Is there some documentation available about the source apart from
>> that on the website and the one which gets compiled with the source?
>> (specially about the conversions and the steps of conversion of the
>> data from the raw disc reads to the structured layout in the memory
>> just before the executor is started)
>
> The source and the README files in the source tree are your best help.
> And the comments in the header files are very helpful too.
>
>> * Which file in the source tree is responsible for the scan of the
>> index? (The main file in case there are two of them for the btree and
>> hash indexes separately)
>
> src/backend/access/nbtree/nbtree.c, btgettuple function
> and
> src/backend/access/hash/hash.c, hashgettuple function
>
> src/backend/access/index/indexam.c is the common entry point for all
> index types.
>
>> * Which data structures are the most important ones in index scanning?
>> (I will search them myself but please someone tell me the structures;
>> there are just too many of them :( )
>
> Depends on what you're interested in. IndexScanDesc is common between
> all index scans. Understanding the page structure might also be
> helpful, see src/include/storage/bufpage.h.
>
>> * Are the pages of the DB file layout of the index table in some way
>> different from what is discussed at
>> http://www.postgresql.org/docs/9.0/interactive/storage-file-layout.html?
>> If yes then what are the differences?
>
> No, that applies to indexes too.
>
>> And I must say that while browsing the source, I was so pleased to
>> read the comments (they really helped a lot). Thanks to the PostgreSQL
>> coding conventions and of course the contributors. I am a bit clear
>> about the working of the executor (thanks to ECLIPSE for support of
>> ctags and a nice UI) but I am still much in a mess.
>>
>> Thanks in advance for the answer ;)
>
> Good luck!
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Robert Haas : > On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure wrote: >> On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane wrote: >>> Merlin Moncure writes: On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote: > i will start the review of this one... but before that sorry for > suggesting this a bit later but about using UNNEST as part of the > sintax? >>> Does for-in-array do what unnset does? >>> >>> Yes, which begs the question of why bother at all. AFAICS this patch >>> simply allows you to replace >>> >>> for x in select unnest(array_value) loop >>> >>> with >>> >>> for x in unnest array_value loop >>> >>> (plus or minus a parenthesis or so). I do not think we need to add a >>> bunch of code and create even more syntactic ambiguity (FOR loops are >>> already on the hairy edge of unparsability) to save people from writing >>> "select". >> >> Pavel's performance argument is imnsho valid. arrays at present are >> the best way to pass data around functions and any optimizations here >> are very welcome. Given that, is there any way to mitigate your >> concerns on the syntax side? > > Can we get the performance benefit any other way? I hate to think > that it will still be slow for people using the already-supported > syntax. > I'm afraid the other ways are more difficult, with deeper impacts on the PL/pgSQL executor. What is slow: a) repeated detoasting on subscript access - maybe detoasted values could be cached? b) evaluation of the SRF expression - maybe a set-returning function call could be evaluated as a simple expression, c) faster query evaluation. The most important is (a). Only a few people use the FOR IN SELECT UNNEST form now. The FOR IN SELECT * FROM UNNEST form is probably not optimizable at the PL/pgSQL level. FOR IN ARRAY doesn't impact expression execution - this patch is simple and isolated.
Pavel > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Robert Haas : > On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure wrote: >> On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane wrote: >>> Merlin Moncure writes: On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote: > i will start the review of this one... but before that sorry for > suggesting this a bit later but about using UNNEST as part of the > sintax? >>> Does for-in-array do what unnset does? >>> >>> Yes, which begs the question of why bother at all. AFAICS this patch >>> simply allows you to replace >>> >>> for x in select unnest(array_value) loop >>> >>> with >>> >>> for x in unnest array_value loop >>> >>> (plus or minus a parenthesis or so). I do not think we need to add a >>> bunch of code and create even more syntactic ambiguity (FOR loops are >>> already on the hairy edge of unparsability) to save people from writing >>> "select". >> >> Pavel's performance argument is imnsho valid. arrays at present are >> the best way to pass data around functions and any optimizations here >> are very welcome. Given that, is there any way to mitigate your >> concerns on the syntax side? > > Can we get the performance benefit any other way? I hate to think > that it will still be slow for people using the already-supported > syntax. It would help if you were able to make unnest() output the first row without detoasting the last field. I think if we had the equivalent of #define DatumGetTextPSlice(X,m,n) ((text *) PG_DETOAST_DATUM_SLICE(X,m,n)) but for arrays, most of the work would be done. Pavel, am I correct?
> > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
[HACKERS] EXPLAIN and nfiltered
Hi, Here's a patch for showing in EXPLAIN ANALYZE the number of rows a plan qual filtered from a node's input. The output looks like this: QUERY PLAN --- Subquery Scan on ss (cost=0.00..50.00 rows=267 width=4) (actual time=0.035..0.088 rows=5 filtered=3 loops=1) Filter: (ss.a < 6) -> Limit (cost=0.00..40.00 rows=800 width=4) (actual time=0.031..0.067 rows=8 filtered=0 loops=1) -> Seq Scan on foo (cost=0.00..40.00 rows=800 width=4) (actual time=0.027..0.040 rows=8 filtered=2 loops=1) Filter: (a < 9) Total runtime: 0.146 ms (6 rows) It might be better if the output was on the Filter: line but this was just the result of a quick idea and I wanted to see how much work the actual implementation would be. Any suggestions and comments on the output format, the patch and the idea are welcome. Regards, Marko Tiikkaja *** a/src/backend/commands/explain.c --- b/src/backend/commands/explain.c *** *** 975,992 ExplainNode(PlanState *planstate, List *ancestors, double startup_sec = 1000.0 * planstate->instrument->startup / nloops; double total_sec = 1000.0 * planstate->instrument->total / nloops; double rows = planstate->instrument->ntuples / nloops; if (es->format == EXPLAIN_FORMAT_TEXT) { appendStringInfo(es->str, !" (actual time=%.3f..%.3f rows=%.0f loops=%.0f)", !startup_sec, total_sec, rows, nloops); } else { ExplainPropertyFloat("Actual Startup Time", startup_sec, 3, es); ExplainPropertyFloat("Actual Total Time", total_sec, 3, es); ExplainPropertyFloat("Actual Rows", rows, 0, es); ExplainPropertyFloat("Actual Loops", nloops, 0, es); } } --- 975,994 double startup_sec = 1000.0 * planstate->instrument->startup / nloops; double total_sec = 1000.0 * planstate->instrument->total / nloops; double rows = planstate->instrument->ntuples / nloops; + double filtered = planstate->instrument->nfiltered / nloops; if (es->format == EXPLAIN_FORMAT_TEXT) { appendStringInfo(es->str, !" 
(actual time=%.3f..%.3f rows=%.0f filtered=%.0f loops=%.0f)", !startup_sec, total_sec, rows, filtered, nloops); } else { ExplainPropertyFloat("Actual Startup Time", startup_sec, 3, es); ExplainPropertyFloat("Actual Total Time", total_sec, 3, es); ExplainPropertyFloat("Actual Rows", rows, 0, es); + ExplainPropertyFloat("Rows Filtered", filtered, 0, es); ExplainPropertyFloat("Actual Loops", nloops, 0, es); } } *** *** 999,1004 ExplainNode(PlanState *planstate, List *ancestors, --- 1001,1007 ExplainPropertyFloat("Actual Startup Time", 0.0, 3, es); ExplainPropertyFloat("Actual Total Time", 0.0, 3, es); ExplainPropertyFloat("Actual Rows", 0.0, 0, es); + ExplainPropertyFloat("Rows Filtered", 0.0, 0, es); ExplainPropertyFloat("Actual Loops", 0.0, 0, es); } } *** a/src/backend/executor/execScan.c --- b/src/backend/executor/execScan.c *** *** 19,24 --- 19,25 #include "postgres.h" #include "executor/executor.h" + #include "executor/instrument.h" #include "miscadmin.h" #include "utils/memutils.h" *** *** 221,226 ExecScan(ScanState *node, --- 222,229 * Tuple fails qual, so free per-tuple memory and try again. */ ResetExprContext(econtext); + if (node->ps.instrument) + node->ps.instrument->nfiltered += 1; } } *** a/src/include/executor/instrument.h --- b/src/include/executor/instrument.h *** *** 49,54 typedef struct Instrumentation --- 49,55 double startup;/* Total startup time (in seconds) */ double total; /* Total total time (in seconds) */ double ntuples;/* Total tuples produced */ + double nfil
Re: [HACKERS] final patch - plpgsql: for-in-array
On Thu, Nov 18, 2010 at 10:24 AM, Merlin Moncure wrote: > On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane wrote: >> Merlin Moncure writes: >>> On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova >>> wrote: i will start the review of this one... but before that sorry for suggesting this a bit later but about using UNNEST as part of the sintax? >> >>> Does for-in-array do what unnset does? >> >> Yes, which begs the question of why bother at all. AFAICS this patch >> simply allows you to replace >> >> for x in select unnest(array_value) loop >> >> with >> >> for x in unnest array_value loop >> >> (plus or minus a parenthesis or so). I do not think we need to add a >> bunch of code and create even more syntactic ambiguity (FOR loops are >> already on the hairy edge of unparsability) to save people from writing >> "select". > > Pavel's performance argument is imnsho valid. arrays at present are > the best way to pass data around functions and any optimizations here > are very welcome. Given that, is there any way to mitigate your > concerns on the syntax side? Can we get the performance benefit any other way? I hate to think that it will still be slow for people using the already-supported syntax. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Pavel Stehule : > 2010/11/18 Cédric Villemain : >> 2010/11/18 Pavel Stehule : >>> 2010/11/18 Tom Lane : Merlin Moncure writes: > On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova > wrote: >> i will start the review of this one... but before that sorry for >> suggesting this a bit later but about using UNNEST as part of the >> sintax? > Does for-in-array do what unnset does? Yes, which begs the question of why bother at all. AFAICS this patch simply allows you to replace for x in select unnest(array_value) loop with for x in unnest array_value loop (plus or minus a parenthesis or so). I do not think we need to add a bunch of code and create even more syntactic ambiguity (FOR loops are already on the hairy edge of unparsability) to save people from writing "select". >>> >>> this patch is semantically equal to SELECT unnest(..), but it is >>> evaluated as simple expression and does directly array unpacking and >>> iteration, - so it means this fragment is significantly >>faster<<. >> >> Did you implement a method to be able to walk the array and detoast >> only the current needed data ? > > not only - iteration over array can help with readability but a > general work with SRF (set returning functions is more harder and > slower) - so special loop statement can to safe a some toast op / when > you use a large array and access via index, or can to safe a some work > with memory, because there isn't necessary convert array to set of > tuples. Please, recheck these tests. 
> > test: > > CREATE OR REPLACE FUNCTION rndstr() RETURNS text AS $$select > array_to_string(array(select substring('ABCDEFGHIJKLMNOPQ' FROM > (random()*16)::int FOR 1) from generate_series(1,10)),'')$$ LANGUAGE > sql; > > create or replace function rndarray(int) returns text[] as $$select > array(select rndstr() from generate_series(1,$1)) $$ language sql; > > create table t10(x text[]); > insert into t10 select rndarray(10) from generate_series(1,1); > create table t100(x text[]); > insert into t100 select rndarray(100) from generate_series(1,1); > create table t1000(x text[]); > insert into t1000 select rndarray(1000) from generate_series(1,1); > > CREATE OR REPLACE FUNCTION public.filter(text[], text, integer) > RETURNS text[] > LANGUAGE plpgsql > AS $function$ > DECLARE > s text[] := '{}'; > l int := 0; > v text; > BEGIN > FOR v IN ARRAY $1 > LOOP > EXIT WHEN l = $3; > IF v LIKE $2 THEN > s := s || v; > l := l + 1; > END IF; > END LOOP; > RETURN s; > END;$function$; > > postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t10; > avg > > 1.1596079803990200 > (1 row) > > Time: 393.649 ms > > postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t100; > avg > > 3.497689245536 > (1 row) > > Time: 2804.502 ms > > postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t1000; > avg > - > 10. 
> (1 row) > > Time: 9729.994 ms > > CREATE OR REPLACE FUNCTION public.filter01(text[], text, integer) > RETURNS text[] > LANGUAGE plpgsql > AS $function$ > DECLARE > s text[] := '{}'; > l int := 0; > v text; > BEGIN > FOR v IN SELECT UNNEST($1) > LOOP > EXIT WHEN l = $3; > IF v LIKE $2 THEN > s := s || v; > l := l + 1; > END IF; > END LOOP; > RETURN s; > END;$function$; > > postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t10; > avg > > 1.1596079803990200 > (1 row) > > Time: 795.383 ms > > postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t100; > avg > > 3.497689245536 > (1 row) > > Time: 3848.258 ms > > postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t1000; > avg > - > 10. > (1 row) > > Time: 12366.093 ms > > The iteration via specialized FOR IN ARRAY is about 25-30% faster than > FOR IN SELECT UNNEST > > postgres=# CREATE OR REPLACE FUNCTION public.filter02(text[], text, integer) > RETURNS text[] > LANGUAGE plpgsql > AS $function$ > DECLARE > s text[] := '{}'; > l int := 0; i int; > v text; > BEGIN > FOR i IN array_lower($1,1)..array_upper($1,1) > LOOP > EXIT WHEN l = $3; > IF $1[i] LIKE $2 THEN > s := s || $1[i]; > l := l + 1; > END IF; > END LOOP; > RETURN s; > END;$function$ > ; > > postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t10; > avg > > 1.1596079803990200 > (1 row) > > Time: 414.960 ms > > postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t100; > avg > > 3.497689245536 > (1 row) > > Time: 3460.970 ms > > there FOR IN ARRAY is faster about 30% then access per index > > for T1000 I had to cancel over 1 minute I can't test until this wee
Re: [HACKERS] unlogged tables
On Thu, Nov 18, 2010 at 3:07 AM, Dimitri Fontaine wrote: > >> CREATE VOLATILE TABLE blow_me_away (k text, v text) SOME OTHER WORDS >> THAT EXPLAIN THE DETAILS GO HERE; > > [ TRUNCATE ON RESTART ] > > Your patch implements this option, right? Yeah. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] final patch - plpgsql: for-in-array
On Thu, Nov 18, 2010 at 12:47 AM, Tom Lane wrote: > Merlin Moncure writes: >> On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova >> wrote: >>> i will start the review of this one... but before that sorry for >>> suggesting this a bit later but about using UNNEST as part of the >>> sintax? > >> Does for-in-array do what unnset does? > > Yes, which begs the question of why bother at all. AFAICS this patch > simply allows you to replace > > for x in select unnest(array_value) loop > > with > > for x in unnest array_value loop > > (plus or minus a parenthesis or so). I do not think we need to add a > bunch of code and create even more syntactic ambiguity (FOR loops are > already on the hairy edge of unparsability) to save people from writing > "select". Pavel's performance argument is imnsho valid. arrays at present are the best way to pass data around functions and any optimizations here are very welcome. Given that, is there any way to mitigate your concerns on the syntax side? merlin
Re: [HACKERS] duplicate connection failure messages
Excerpts from Bruce Momjian's message of mié nov 17 13:04:46 -0300 2010: > OK, I doubt we want to add complexity to improve this, so I see our > options as: > > o ignore the problem > o display IPv4/IPv6 labels > o display only an IPv6 label > o something else I think we should use inet_ntop where available to print the address. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Cédric Villemain : > 2010/11/18 Pavel Stehule : >> 2010/11/18 Tom Lane : >>> Merlin Moncure writes: On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova wrote: > i will start the review of this one... but before that sorry for > suggesting this a bit later but about using UNNEST as part of the > sintax? >>> Does for-in-array do what unnset does? >>> >>> Yes, which begs the question of why bother at all. AFAICS this patch >>> simply allows you to replace >>> >>> for x in select unnest(array_value) loop >>> >>> with >>> >>> for x in unnest array_value loop >>> >>> (plus or minus a parenthesis or so). I do not think we need to add a >>> bunch of code and create even more syntactic ambiguity (FOR loops are >>> already on the hairy edge of unparsability) to save people from writing >>> "select". >> >> this patch is semantically equal to SELECT unnest(..), but it is >> evaluated as simple expression and does directly array unpacking and >> iteration, - so it means this fragment is significantly >>faster<<. > > Did you implement a method to be able to walk the array and detoast > only the current needed data ? Not only that - iteration over an array can help with readability, but general work with SRFs (set-returning functions) is harder and slower - so a special loop statement can save some toast operations when you use a large array and access it via index, and can save some work with memory, because it isn't necessary to convert the array to a set of tuples. Please recheck these tests.
test: CREATE OR REPLACE FUNCTION rndstr() RETURNS text AS $$select array_to_string(array(select substring('ABCDEFGHIJKLMNOPQ' FROM (random()*16)::int FOR 1) from generate_series(1,10)),'')$$ LANGUAGE sql; create or replace function rndarray(int) returns text[] as $$select array(select rndstr() from generate_series(1,$1)) $$ language sql; create table t10(x text[]); insert into t10 select rndarray(10) from generate_series(1,1); create table t100(x text[]); insert into t100 select rndarray(100) from generate_series(1,1); create table t1000(x text[]); insert into t1000 select rndarray(1000) from generate_series(1,1); CREATE OR REPLACE FUNCTION public.filter(text[], text, integer) RETURNS text[] LANGUAGE plpgsql AS $function$ DECLARE s text[] := '{}'; l int := 0; v text; BEGIN FOR v IN ARRAY $1 LOOP EXIT WHEN l = $3; IF v LIKE $2 THEN s := s || v; l := l + 1; END IF; END LOOP; RETURN s; END;$function$; postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t10; avg 1.1596079803990200 (1 row) Time: 393.649 ms postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t100; avg 3.497689245536 (1 row) Time: 2804.502 ms postgres=# select avg(array_upper(filter(x,'%AA%', 10),1)) from t1000; avg - 10. (1 row) Time: 9729.994 ms CREATE OR REPLACE FUNCTION public.filter01(text[], text, integer) RETURNS text[] LANGUAGE plpgsql AS $function$ DECLARE s text[] := '{}'; l int := 0; v text; BEGIN FOR v IN SELECT UNNEST($1) LOOP EXIT WHEN l = $3; IF v LIKE $2 THEN s := s || v; l := l + 1; END IF; END LOOP; RETURN s; END;$function$; postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t10; avg 1.1596079803990200 (1 row) Time: 795.383 ms postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t100; avg 3.497689245536 (1 row) Time: 3848.258 ms postgres=# select avg(array_upper(filter01(x,'%AA%', 10),1)) from t1000; avg - 10. 
(1 row) Time: 12366.093 ms The iteration via specialized FOR IN ARRAY is about 25-30% faster than FOR IN SELECT UNNEST postgres=# CREATE OR REPLACE FUNCTION public.filter02(text[], text, integer) RETURNS text[] LANGUAGE plpgsql AS $function$ DECLARE s text[] := '{}'; l int := 0; i int; v text; BEGIN FOR i IN array_lower($1,1)..array_upper($1,1) LOOP EXIT WHEN l = $3; IF $1[i] LIKE $2 THEN s := s || $1[i]; l := l + 1; END IF; END LOOP; RETURN s; END;$function$ ; postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t10; avg 1.1596079803990200 (1 row) Time: 414.960 ms postgres=# select avg(array_upper(filter02(x,'%AA%', 10),1)) from t100; avg 3.497689245536 (1 row) Time: 3460.970 ms there FOR IN ARRAY is faster about 30% then access per index for T1000 I had to cancel over 1 minute > > (I wonder because I have something like that in that garage : select > array_filter(foo,'like','%bar%',10); where 10 is the limit and can be > avoided, foo is the array, like is callback function, '%bar%' the > parameter for the callback function for filtering results.) > > It will make my toy in the garage a fast race car (and pr
Re: [HACKERS] Which data structures for the index?
On 18.11.2010 15:19, Vaibhav Kaushal wrote: I have a small problem: Suppose that I have a table in PostgreSQL which has a integer field as its Primary key and I have used the Hash indexing method for creating the index for the field on the table. Now I want to know the following details about the index (assuming the machine running PostgreSQL is a Linux box with DB installed at /var/lib/pgsql/data): * Which file would contain the index table data? (OR how to find them? Will I find them in the CATALOG tables?) SELECT relfilenode FROM pg_class WHERE relname='index name'. The index data is stored in a file with that name. Something like /var/lib/pgsql/data/base/11910/ * Is there some documentation available about the source apart from that on the website and the the one which gets compiled with the source? (specially about the conversions and the steps of conversion of the data from they raw disc reads to the structured layout in the memory just before the executor is started) The source and the README files in the source tree are your best help. And the comments in the header files are very helpful too. * Which file in the source tree is responsible for the scan of the index? (The main file in case there are two of them for the btree and hash indexes separately) src/backend/access/nbtree/nbtree.c, btgettuple function and src/backend/access/hash/hash.c, hashgettuple function src/backend/access/index/indexam.c is the common entry point for all index types. * Which data structures are the most important ones in index scanning? (I will search them myself but please someone tell me the structures; there are just too many of them :( ) Depends on what you're interested in. IndexScanDesc is common between all index scans, Understanding the page structure might also be helpful, see src/include/storage/bufpage.h. 
* Are the pages of the DB file layout of the index table in someway different than what is discussed at http://www.postgresql.org/docs/9.0/interactive/storage-file-layout.html ? If yes then what are the differences? No, that applies to indexes too. And I must say that while browsing the source, I was so pleased to read the comments (they really helped a lot). Thanks to the PostgreSQL coding conventions and of course the contributors. I am a bit clear about the working of the executor (thanks to ECLIPSE for support of ctags and a nice UI) but I am still much in a mess. Thanks in advance for the answer ;) Good luck! -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
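Heikki's lookup above can be condensed into a short psql session. This is an illustrative sketch — the table, the index name, and the resulting OIDs are made up; on 9.0 and later, pg_relation_filepath() also returns the file's path relative to the data directory in one step:

```sql
-- Hypothetical table with a hash index, matching the question:
CREATE TABLE sample (k integer PRIMARY KEY);
CREATE INDEX sample_k_hash ON sample USING hash (k);

-- relfilenode is the base name of the file that stores the index:
SELECT relfilenode FROM pg_class WHERE relname = 'sample_k_hash';

-- On 9.0+, pg_relation_filepath() returns the path relative to the
-- data directory, e.g. base/11910/16390 (actual numbers will differ):
SELECT pg_relation_filepath('sample_k_hash');
```

Note that relfilenode can change after operations that rewrite the relation, such as REINDEX or TRUNCATE, so the lookup should be repeated rather than cached.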
[HACKERS] Which data structures for the index?
Hi all, I have a small problem: Suppose that I have a table in PostgreSQL which has a integer field as its Primary key and I have used the Hash indexing method for creating the index for the field on the table. Now I want to know the following details about the index (assuming the machine running PostgreSQL is a Linux box with DB installed at /var/lib/pgsql/data): * Which file would contain the index table data? (OR how to find them? Will I find them in the CATALOG tables?) * Is there some documentation available about the source apart from that on the website and the the one which gets compiled with the source? (specially about the conversions and the steps of conversion of the data from they raw disc reads to the structured layout in the memory just before the executor is started) * Which file in the source tree is responsible for the scan of the index? (The main file in case there are two of them for the btree and hash indexes separately) * Which data structures are the most important ones in index scanning? (I will search them myself but please someone tell me the structures; there are just too many of them :( ) * Are the pages of the DB file layout of the index table in someway different than what is discussed at http://www.postgresql.org/docs/9.0/interactive/storage-file-layout.html ? If yes then what are the differences? And I must say that while browsing the source, I was so pleased to read the comments (they really helped a lot). Thanks to the PostgreSQL coding conventions and of course the contributors. I am a bit clear about the working of the executor (thanks to ECLIPSE for support of ctags and a nice UI) but I am still much in a mess. Thanks in advance for the answer ;) -Vaibhav (*_*)
Re: [HACKERS] GiST insert algorithm rewrite
On 17.11.2010 19:36, Teodor Sigaev wrote: Hmm, will have to do some benchmarking on that. I'm using the Consistent function when walking down to check if the downlink needs to be updated, and assumed that it would be insignificant compared to the cost of calling Penalty on all the keys on the page. Why consistent?! It's impossible - you don't know right strategy number, index with storage type/over type could do not accept the same type as query. Index over tsvector is an example. Sorry, I was confused. I'm calling the gistgetadjusted() function, which uses the Union function. Ie. I'm doing the same we did before when propagating the changes up the tree. I'm just doing it on the way down instead. I ran some quick performance tests on my laptop, and couldn't see any measurable difference with the patch. So I think we're good on performance. I used the attached scripts, with \timing. Have you had a chance to look at the patch yet? I'm hesitant to commit before you take a look at it, though I still have to proofread it myself as well. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com DROP TABLE IF EXISTS tt; CREATE TABLE tt (a int4); CREATE INDEX i_tt ON tt USING gist (a); checkpoint; INSERT INTO tt SELECT i FROM generate_series(1, 20) i; DROP TABLE IF EXISTS tt; CREATE TABLE tt (a tsvector); CREATE INDEX i_tt ON tt USING gist (a); checkpoint; INSERT INTO tt SELECT (chr(97+(i%27)) || chr(97+(i%29)) || repeat(chr(97+(i%23)), (i%11) + 1))::tsvector AS a FROM generate_series(1, 20) i; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq changes for synchronous replication
On Tue, Nov 16, 2010 at 10:49 AM, Robert Haas wrote: >> Just in a quick scan, I don't have any objection to v2 except that the >> protocol documentation is lacking. > > OK, I'll mark it Waiting on Author pending that issue. The patch is touching protocol.sgml as follows. Isn't this enough? - *** a/doc/src/sgml/protocol.sgml --- b/doc/src/sgml/protocol.sgml *** *** 1344,1350 The commands accepted in walsender mode are: WAL position XXX/XXX. The server can reply with an error, e.g. if the requested section of WAL has already been recycled. On success, server responds with a ! CopyOutResponse message, and then starts to stream WAL to the frontend. WAL will continue to be streamed until the connection is broken; no further commands will be accepted. --- 1344,1350 WAL position XXX/XXX. The server can reply with an error, e.g. if the requested section of WAL has already been recycled. On success, server responds with a ! CopyXLogResponse message, and then starts to stream WAL to the frontend. WAL will continue to be streamed until the connection is broken; no further commands will be accepted. *** *** 2696,2701 CopyOutResponse (B) --- 2696,2737 + CopyXLogResponse (B) + + + + + + + + Byte1('W') + + + + Identifies the message as a Start Copy XLog response. + This message is used only for Streaming Replication. + + + + + + Int32 + + + + Length of message contents in bytes, including self. + + + + + + + + + + + + DataRow (B) - Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] final patch - plpgsql: for-in-array
2010/11/18 Pavel Stehule : > 2010/11/18 Tom Lane : >> Merlin Moncure writes: >>> On Wed, Nov 17, 2010 at 7:08 PM, Jaime Casanova >>> wrote: i will start the review of this one... but before that sorry for suggesting this a bit later but about using UNNEST as part of the sintax? >> >>> Does for-in-array do what unnset does? >> >> Yes, which begs the question of why bother at all. AFAICS this patch >> simply allows you to replace >> >> for x in select unnest(array_value) loop >> >> with >> >> for x in unnest array_value loop >> >> (plus or minus a parenthesis or so). I do not think we need to add a >> bunch of code and create even more syntactic ambiguity (FOR loops are >> already on the hairy edge of unparsability) to save people from writing >> "select". > > this patch is semantically equal to SELECT unnest(..), but it is > evaluated as simple expression and does directly array unpacking and > iteration, - so it means this fragment is significantly >>faster<<. Did you implement a method to be able to walk the array and detoast only the current needed data ? (I wonder because I have something like that in that garage : select array_filter(foo,'like','%bar%',10); where 10 is the limit and can be avoided, foo is the array, like is callback function, '%bar%' the parameter for the callback function for filtering results.) It will make my toy in the garage a fast race car (and probably doable in (plpg)SQL instead of C) ... -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] unlogged tables
> CREATE VOLATILE TABLE blow_me_away (k text, v text) SOME OTHER WORDS > THAT EXPLAIN THE DETAILS GO HERE; [ TRUNCATE ON RESTART ] Your patch implements this option, right? Regards, > -- dim
Re: [HACKERS] Unused parameter in vacuum.c
On 18.11.2010 08:12, Shigeru HANADA wrote: When I was reading vacuum.c, I found that the parameter 'stmttype' of get_rel_oids() is not used at all. The parameter had been used as command tag in the notice message about "invalid object type", but now such messages are reported by other functions. Please find attached remove-unused-parameter patch. Thanks, applied. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com