Re: [HACKERS] Hot standby and removing VACUUM FULL

2009-11-23 Thread Hannu Krosing
On Tue, 2009-11-24 at 09:24 +0200, Heikki Linnakangas wrote:
> Greg Smith wrote:
> > Heikki Linnakangas wrote:
> >> So I guess what I'm asking is: Does anyone see any show-stoppers in
> >> removing VACUUM FULL
> > Here's the disclaimers attached to the new VACUUM REPLACE implementation
> > from Itagaki:
> > 
> > "We still need traditional VACUUM FULL behavior for system catalog
> > because we cannot change relfilenode for them. Also, VACUUM FULL REPLACE
> > is not always better than traditional VACUUM FULL; the new version
> > requires additional disk space and might be slower if we have a few dead
> > tuples."
> > 
> > That first part seems like it would limit the ability to completely
> > discard the current behavior.
> 
> For system catalogs, you could still use a utility like the one I
> experimented with at
> http://archives.postgresql.org/message-id/4ab15065.1000...@enterprisedb.com.
> Essentially, do a bunch of dummy UPDATEs on the rows that you want to
> move. It can cause serialization errors in concurrent updaters, like any
> UPDATE, but I think it would be good enough for the narrow remaining use
> case of shrinking system catalogs.

call it VACUUM SHRINK ;)

and you will need to do a REINDEX after using it to get the full
benefit, the same as with ordinary VACUUM or VACUUM FULL.

> -- 
>   Heikki Linnakangas
>   EnterpriseDB   http://www.enterprisedb.com
> 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] KNNGiST for knn-search

2009-11-23 Thread Heikki Linnakangas
Teodor Sigaev wrote:
> we'd like to present a contrib module for CVS HEAD, which contains an
> implementation of knn (k-nearest-neighbour) search in PostgreSQL;
> see README.knngist for details.

Cool!

> Old way:
> SELECT coordinates, (coordinates <-> '5.0,5.0'::point) AS dist FROM spots
> order by dist asc LIMIT 10;
> 
> Time: 1024.242 ms
> 
> knn-search:
> SELECT coordinates, (coordinates <-> '5.0,5.0'::point) AS dist FROM spots
> WHERE coordinates  >< '5.0,5.0'::point LIMIT 10;
> 
> Time: 3.158 ms

I think you'll need to work on that. A WHERE qual shouldn't imply a sort
order. You'll have to teach the planner how to use the index to speed up
a query in the first form.
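
For illustration, a planner-integrated form might look like the following — a hypothetical sketch of the direction Heikki suggests, not syntax the posted patch implements:

```sql
-- Hypothetical: the planner recognizes that the ORDER BY expression
-- matches the KNNGiST index's distance operator and emits an ordered
-- index scan, so no extra knn qual is needed.
SELECT coordinates, coordinates <-> '5.0,5.0'::point AS dist
FROM spots
ORDER BY coordinates <-> '5.0,5.0'::point
LIMIT 10;
```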

> We didn't patch existing implementation of GiST for several reasons:
> 
> 1. KNNGiST is about 5% slower than GiST on non-knn search queries, like
>   contains or contained by, because of some overhead in the new
>   tree-traversal algorithm

Is it possible to use the regular GiST traversal algorithm on a
KNNGiST-tree, when performing regular GiST searches that don't require a
particular order?

> 2. KNNGiST can't be used in a bitmap index scan, which destroys the
>   order of results. We don't know a way to forbid bitmap index scans
>   only for knn queries. The current version of KNNGiST doesn't
>   distinguish knn-search from usual search, and postgres doesn't know
>   about the ordered output from KNNGiST.

Yeah, you really need to modify the planner to understand the ordering
and plan accordingly.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Hot standby and removing VACUUM FULL

2009-11-23 Thread Heikki Linnakangas
Greg Smith wrote:
> Heikki Linnakangas wrote:
>> So I guess what I'm asking is: Does anyone see any show-stoppers in
>> removing VACUUM FULL
> Here's the disclaimers attached to the new VACUUM REPLACE implementation
> from Itagaki:
> 
> "We still need traditional VACUUM FULL behavior for system catalog
> because we cannot change relfilenode for them. Also, VACUUM FULL REPLACE
> is not always better than traditional VACUUM FULL; the new version
> requires additional disk space and might be slower if we have a few dead
> tuples."
> 
> That first part seems like it would limit the ability to completely
> discard the current behavior.

For system catalogs, you could still use a utility like the one I
experimented with at
http://archives.postgresql.org/message-id/4ab15065.1000...@enterprisedb.com.
Essentially, do a bunch of dummy UPDATEs on the rows that you want to
move. It can cause serialization errors in concurrent updaters, like any
UPDATE, but I think it would be good enough for the narrow remaining use
case of shrinking system catalogs.
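
A minimal sketch of the dummy-UPDATE idea (hypothetical illustration only; the linked utility is more careful — in particular, HOT updates can keep the new row version on the same page, defeating a naive UPDATE):

```sql
-- Rewrite rows currently stored on high-numbered pages so that their
-- new versions can land on earlier pages, then let plain VACUUM
-- truncate the emptied tail of the file.
UPDATE pg_catalog.pg_database SET datname = datname
WHERE (ctid::text::point)[0] >= 10;   -- rows on pages >= 10
VACUUM pg_database;
```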

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Hot standby and removing VACUUM FULL

2009-11-23 Thread Heikki Linnakangas
Itagaki Takahiro wrote:
> VACUUM FULL is still reserved for system catalogs in my patch
> because we cannot modify relfilenodes for the catalog tables.
> Do you have solutions for it?

Tom had an idea on that:

http://archives.postgresql.org/message-id/19750.1252094...@sss.pgh.pa.us

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] SE-PgSQL patch review

2009-11-23 Thread KaiGai Kohei
>> * Can we use error messages that looks like existing privilege errors?
>>ERRCODE_INSUFFICIENT_PRIVILEGE:
>>  => permission denied to rename database
> 
> The point here is that we should give users a hint that makes the
> reason for an access violation clear. So, I think the message should
> contain a mark showing that it comes from SE-PgSQL.
> 
> Currently, it raises an error on access violation in sepgsql_compute_perms().
> It always prints out "SELinux: security policy violation".

Just for your information:

SELinux allows end users to confirm which accesses were denied using
audit logs. At the operating-system level, they are written to
/var/log/audit/audit.log.

See the result of:
  % grep ^type=AVC /var/log/audit/audit.log
:
  type=AVC msg=audit(1258412043.107:81): avc:  denied  { search } for  pid=1750 
comm="httpd" name="root" dev=sda5 ino=1062881 
scontext=system_u:system_r:httpd_t:s0 
tcontext=system_u:object_r:admin_home_t:s0 tclass=dir
:

SE-PgSQL likewise writes detailed information about access violations
to the log files, which can then be used to revise the security policy.

For example:
  postgres=# UPDATE t1 SET a = 1;
  ERROR:  SELinux: security policy violation

It is logged as follows:
  LOG:  SELinux: denied { update } 
scontext=unconfined_u:unconfined_r:sepgsql_test_t:Unclassified 
tcontext=system_u:object_r:sepgsql_ro_table_t:Classified tclass=db_table name=t1
  STATEMENT:  UPDATE t1 SET a = 1;
  ERROR:  SELinux: security policy violation
  STATEMENT:  UPDATE t1 SET a = 1;

We can also feed these logs to analyzer programs, e.g. to pop up a hint
for trouble-shooting (setroubleshoot), or to generate a policy module
automatically from access-violation logs (audit2allow).

Thanks,
-- 
OSS Platform Development Division, NEC
KaiGai Kohei 



Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Nikhil Sontakke
Hi,

>> What would probably be helpful here is to take the mess of raw data
>> above and turn it into a simpler partitioning roadmap.
>
> Thanks for summarising.
>

Yeah, excellent summary Greg. As you rightly pointed out, partitioning
needs a broad roadmap so that the community can contribute in unison.
That way we can avoid, in the future, decent efforts like Manu's
bearing no fruit because of the prevailing confusion today.

> I briefly tried to do that on the thread for Itagaki-san's patch. That's
> a first stab at things, at least.

+1. Itagaki-san's patch seems like a firm step forward.

Regards,
Nikhils
-- 
http://www.enterprisedb.com



Re: [HACKERS] SE-PgSQL patch review

2009-11-23 Thread KaiGai Kohei
Itagaki Takahiro wrote:
> I'm reviewing SE-PgSQL patch and have some comments.
> https://commitfest.postgresql.org/action/patch_view?id=212
> 
> Forgive me, but I'm a novice at SELinux. Some of the comments
> and questions might be nonsense.

Itagaki-san, thanks for your volunteering so much!

>  Patch overview 
> The patch itself is large (>13K), but more than half of it is
> well-written regression tests, comments and README.
> 
> It adds object-level and column-level security checks after normal
> permission checks. Row-level checks are not included in the patch.

Yes, I separated out most of the features to reduce the code size.
Developer documentation and source-code comments were added instead.

>  Syntax and identifier names 
> * Can we use "security context" not only for security checking but also
>   for general "context" handling? If so, we can call it just "context".
>   It might be useful if the context is designed for more general purposes.
>   For example, we could develop context-based logging or context-aware
>   setting parameters. I think the "Application Name" patch is one of the
>   implementations of context-based logging.
>   https://commitfest.postgresql.org/action/patch_view?id=195

"Security context" is a proper noun in SELinux.
I don't think it is a good idea to call it just a "context".

If we had a set of client properties called a context, the security
context would be just one property within it, alongside things like the
user identifier, application name and so on.

If log_line_prefix supported the security context of the client,
that would seem like a good idea to me. (However, we can implement it
in the future.)

BTW, SELinux also dictates what should be logged on access violation
events; that is also a duty of the userspace object manager. See
sepgsql_audit_log() in selinux.c.


> * CREATE TABLE tbl (...) SECURITY_CONTEXT = '...'
>   IMHO, '_' and '=' don't suit SQL.
>   If there are no conflicts in gram.y, how about removing them?
>   CREATE TABLE tbl (...) SECURITY CONTEXT '...'

I tried it, and it seems fine to me.

> * CREATE TABLE tbl (col integer AS SECURITY_CONTEXT = '...')
>   Is the syntax "AS SECURITY_CONTEXT" natural in English?

We need to put a reserved token, such as "AS", before SECURITY_CONTEXT
to avoid syntax conflicts with the "DEFAULT b_expr" option.

> * Since SE-PgSQL will be integrated into the core, we can use pg_*
>   names for the feature. For example, we can rename sepgsql_getcon() to
>   pg_get_security_context(). Also, we should not use 'con' as an 
>   abbreviated form of 'context' because we often use it for 'connection'.
>   The same can be said for GUC parameter names.

What is your opinion of '*_secon()' instead of '*_security_context'?
If OK, these functions would be renamed as follows:

- sepgsql_template1_getcon -> pg_get_template1_secon
- sepgsql_default_getcon   -> pg_get_default_secon
- sepgsql_getcon   -> pg_get_client_secon
- sepgsql_database_getcon  -> pg_get_database_secon
- sepgsql_schema_getcon-> pg_get_schema_secon
- sepgsql_relation_getcon  -> pg_get_relation_secon
- sepgsql_attribute_getcon -> pg_get_attribute_secon

>   (ex: "sepostgresql" to be "security_policy")

I prefer "selinux_support" over "security_policy", because "security
policy" is the noun for the set of access control rules in the kernel.

>  Error messages and error codes 
> * It uses dedicated 'SExxx' error codes, but I think they should belong to
>   the same family as ERRCODE_INSUFFICIENT_PRIVILEGE (42501).
> /* Class SE - SE-PostgreSQL Error (SE-PgSQL-specific error class) */
> #define ERRCODE_SELINUX_INTERNAL_ERROR    MAKE_SQLSTATE('S','E','0','0','1')
> #define ERRCODE_INVALID_SECURITY_CONTEXT  MAKE_SQLSTATE('S','E','0','0','2')
> #define ERRCODE_SELINUX_AUDIT_LOG         MAKE_SQLSTATE('S','E','0','0','3')

I already use predefined error codes where they exist.
For example, sepgsql_compute_perms() is the routine that makes the
access control decision in SELinux. It uses ERRCODE_INSUFFICIENT_PRIVILEGE
when it needs to raise an error.

  extern bool
  sepgsql_compute_perms(char *scontext, char *tcontext,
uint16 tclass, uint32 required,
const char *audit_name, bool abort)
  {
  :
  /*
   * It asks SELinux whether the given access should be allowed. It sets
   * the given avd structure accordingly, then returns.
   */
  compute_perms_internal(scontext, tcontext, tclass, &avd);
  :
  /*
   * If there is no policy violation, or SE-PgSQL is running in permissive
   * mode, or the client process runs in a permissive domain, it returns
   * 'true' normally.
   */
  if (!denied ||
  !sepgsql_get_enforce() ||
  (avd.flags & SELINUX_AVD_FLAGS_PERMISSIVE) != 0)
  return true;
  :
  /*
   * Otherwise, it raises an error or returns 'false', depending on the
   * caller's indication 

Re: [HACKERS] named generic constraints [feature request]

2009-11-23 Thread Pavel Stehule
2009/11/24 Caleb Cushing :
>> CREATE OR REPLACE FUNCTION emptystr(text)
>> RETURNS bool AS $$
>>  SELECT $1 <> ''; -- it is SQL not C
>> $$ LANGUAGE sql;
>>
>> CREATE TABLE users(
>>  username TEXT CHECK (NOT emptystr(username)),
>
> Although I'm not going to continue discussing the request, this code
> has the opposite of the desired effect. It should be
>
> SELECT $1 = ''; -- you have a double negative

no -

"--" is a line comment in SQL - the same as "//" in C++

Regards
Pavel Stehule

>
> --
> Caleb Cushing
>
> http://xenoterracide.blogspot.com
>



Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Itagaki Takahiro

Tom Lane  wrote:

> What we need first is an explicit representation of partitioning, and
> then to build routing code on top of that.  I haven't looked at
> Itagaki-san's syntax patch at all, but I think it's at least starting
> in a sensible place.

I have the following development plan for partitioning.
I'll continue to use inherits-based partitioning... at least in 8.5.

8.5 Alpha 3:
Syntax and catalog changes (on-disk structure).
I think pg_dump is the biggest stopper in this phase.

8.5 Alpha 4:
Internal representation (in-memory structure), which will replace
insert triggers first, and also replace CHECK constraints if
possible (but non-INSERT optimizations will probably slide to 8.6).

The internal representation of RANGE partitions will be an array of
{ upper-value, partition-relid } pairs for each parent table.
The target partition for an insert will be determined using a binary
search. That will be faster than sequentially checking each CHECK
constraint, especially with a large number of child tables. The array
will be kept in CacheMemoryContext or a query context to reduce access
to the system catalog. RelationData or TupleDesc will get an
additional field for it.


> * It only applies to COPY.  You'd certainly want routing for INSERT as
>   well.  And it shouldn't be necessary to specify an option.

Sure. We need the routing in both INSERT and COPY. Even if Emmanuel-san's
patch is committed in Alpha 3, the code would be discarded in Alpha 4.

> * Building this type of infrastructure on top of independent, not
>   guaranteed consistent table constraints is just throwing more work
>   into a dead end.

I think the current approach is not necessarily wrong for CHECK-based
partitioning, but I'd like to have more specialized or generalized
functionality for the replacement of triggers.

If we will take specialized approach, triggers will be replaced with
built-in feature. We can only use RANGE and LIST partitions.

On the other hand, it might be interesting to take a more generalized
approach; for example, splitting BEFORE INSERT triggers into 3 phases:
  1. Can cancel the insert and modify the new tuple.
  2. Can cancel the insert, but cannot modify the tuple.
  3. Can neither cancel nor modify.
We would call triggers in phase order. Partition-routing INSERT triggers
would be implemented in the 2nd phase, so we need not worry about later
triggers modifying partition keys. (The 3rd phase would be used for
replication triggers.)

However, I think the generalized approach is overkill.
A specialized approach would be enough.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] named generic constraints [feature request]

2009-11-23 Thread Josh Berkus
Caleb,

I can understand why you want this.  However, it would be tricky to
implement because of data typing, and is fairly easily worked around
using either domains or functions.  So I don't think anyone is going to
want to add it to the TODO list, sorry.

Of course, Postgres is fully hackable if you *really* want it.

--Josh Berkus




Re: [HACKERS] WIP: log query in auto-explain

2009-11-23 Thread Itagaki Takahiro

Andrew Dunstan  wrote:

> I mean it should be in the Explain output:
> <explain xmlns="http://www.postgresql.org/2009/explain">
>   <Query>
>     SELECT '' AS zero, BOOLTBL1.*
> A number of users (including me) badly want to be able to extract the 
> explain output from the log files with the query text included.

I see it's useful for the non-text output formats.
+1 for the extension.

Also, please include the documentation fix, because the docs contain
a sample output using the STATEMENT section.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] WIP: log query in auto-explain

2009-11-23 Thread Andrew Dunstan



Itagaki Takahiro wrote:
> Andrew Dunstan wrote:
>> Basically it includes the text of the query being explained in the
>> explain output.
>
> I expected the query text to be printed in the "STATEMENT" section.
> Do you mean the query should be merged into the "LOG" section?
> Are there any situations where the "STATEMENT" section does not work?

I mean it should be in the Explain output:

   <explain xmlns="http://www.postgresql.org/2009/explain">
     <Query>
       SELECT '' AS zero, BOOLTBL1.*
         FROM BOOLTBL1
         WHERE booleq(bool 'false', f1);
       <Plan>
         <Node-Type>Seq Scan</Node-Type>
         <Relation-Name>booltbl1</Relation-Name>
         <Alias>booltbl1</Alias>
         <Startup-Cost>0.00</Startup-Cost>
         <Total-Cost>42.88</Total-Cost>
         <Plan-Rows>877</Plan-Rows>
         <Plan-Width>1</Plan-Width>
         <Filter>booleq(false, f1)</Filter>
       </Plan>
     </Query>
   </explain>

This is especially important for structured output like XML and/or JSON. 
A number of users (including me) badly want to be able to extract the 
explain output from the log files with the query text included.


cheers

andrew


  




[HACKERS] SE-PgSQL patch review

2009-11-23 Thread Itagaki Takahiro
I'm reviewing SE-PgSQL patch and have some comments.
https://commitfest.postgresql.org/action/patch_view?id=212

Forgive me, but I'm a novice at SELinux. Some of the comments
and questions might be nonsense.

 Patch overview 
The patch itself is large (>13K), but more than half of it is
well-written regression tests, comments and README.

It adds object-level and column-level security checks after normal
permission checks. Row-level checks are not included in the patch.

 Syntax and identifier names 
* Can we use "security context" not only for security checking but also
  for general "context" handling? If so, we can call it just "context".
  It might be useful if the context is designed for more general purposes.
  For example, we could develop context-based logging or context-aware
  setting parameters. I think the "Application Name" patch is one of the
  implementations of context-based logging.
  https://commitfest.postgresql.org/action/patch_view?id=195

* CREATE TABLE tbl (...) SECURITY_CONTEXT = '...'
  IMHO, '_' and '=' don't suit SQL.
  If there are no conflicts in gram.y, how about removing them?
  CREATE TABLE tbl (...) SECURITY CONTEXT '...'

* CREATE TABLE tbl (col integer AS SECURITY_CONTEXT = '...')
  Is the syntax "AS SECURITY_CONTEXT" natural in English?

* Since SE-PgSQL will be integrated into the core, we can use pg_*
  names for the feature. For example, we can rename sepgsql_getcon() to
  pg_get_security_context(). Also, we should not use 'con' as an 
  abbreviated form of 'context' because we often use it for 'connection'.
  The same can be said for GUC parameter names.
  (ex: "sepostgresql" to be "security_policy")

 Error messages and error codes 
* It uses dedicated 'SExxx' error codes, but I think they should belong to
  the same family as ERRCODE_INSUFFICIENT_PRIVILEGE (42501).
/* Class SE - SE-PostgreSQL Error (SE-PgSQL-specific error class) */
#define ERRCODE_SELINUX_INTERNAL_ERROR    MAKE_SQLSTATE('S','E','0','0','1')
#define ERRCODE_INVALID_SECURITY_CONTEXT  MAKE_SQLSTATE('S','E','0','0','2')
#define ERRCODE_SELINUX_AUDIT_LOG         MAKE_SQLSTATE('S','E','0','0','3')

* Can we use error messages that look like existing privilege errors?
   ERRCODE_INSUFFICIENT_PRIVILEGE:
 => permission denied to rename database
   Error messages from SE-PgSQL
 => SE-PgSQL prevents to modify "pg_class" by hand
  look very different. I'd suggest something like
 => security context denied to rename database

  I suggest we avoid using the term "SE-PgSQL" in user-visible
  messages and documentation, because the feature will be integrated
  deeply into the core. Of course, we can use "SE-PgSQL" in *promotion*.

 Internal structures 
* Are the security labels stable enough?
  What will we need to do when the SELinux configuration is modified?
  We store security labels as text for each object and column.

* Each security check is called after pg_*_aclcheck(). Is it possible to
  merge the SE-PgSQL checks into the ACL functions in aclchk.c?

* What is the difference between sepgsql_database_getcon(oid) and
  pg_database.datsecon? Why do we need those getcon functions?


That's all the comments for now.  I'll test the patch on actual machines next.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: [HACKERS] [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION

2009-11-23 Thread Andrew Dunstan



Greg Smith wrote:
> I haven't heard anything from Andrew about ragged CSV import either.
> I think that ultimately those features are useful, but just exceed
> what the existing code could be hacked to handle cleanly.


The patch is attached for your edification/amusement. I have backpatched 
it to 8.4 for the client that needed it, and it's working just fine. I 
didn't pursue it when it was clear that it was not going to be accepted. 
COPY returning text[] would allow us to achieve the same thing, a bit 
more verbosely, but it would be a lot more work to develop.


cheers

andrew
Index: src/backend/commands/copy.c
===
RCS file: /cvsroot/pgsql/src/backend/commands/copy.c,v
retrieving revision 1.316
diff -c -r1.316 copy.c
*** src/backend/commands/copy.c	29 Jul 2009 20:56:18 -	1.316
--- src/backend/commands/copy.c	13 Sep 2009 02:57:16 -
***
*** 116,121 
--- 116,122 
  	char	   *escape;			/* CSV escape char (must be 1 byte) */
  	bool	   *force_quote_flags;		/* per-column CSV FQ flags */
  	bool	   *force_notnull_flags;	/* per-column CSV FNN flags */
+ 	bool		ragged;			/* allow ragged CSV input? */
  
  	/* these are just for error messages, see copy_in_error_callback */
  	const char *cur_relname;	/* table name for error messages */
***
*** 822,827 
--- 823,836 
  		 errmsg("conflicting or redundant options")));
  			force_notnull = (List *) defel->arg;
  		}
+ 		else if (strcmp(defel->defname, "ragged") == 0)
+ 		{
+ 			if (cstate->ragged)
+ ereport(ERROR,
+ 		(errcode(ERRCODE_SYNTAX_ERROR),
+ 		 errmsg("conflicting or redundant options")));
+ 			cstate->ragged = intVal(defel->arg);
+ 		}
  		else
  			elog(ERROR, "option \"%s\" not recognized",
   defel->defname);
***
*** 948,953 
--- 957,972 
  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  			  errmsg("COPY force not null only available using COPY FROM")));
  
+ 	/* Check ragged */
+ 	if (!cstate->csv_mode && cstate->ragged)
+ 		ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+  errmsg("COPY ragged available only in CSV mode")));
+ 	if (cstate->ragged && !is_from)
+ 		ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 			  errmsg("COPY  ragged only available using COPY FROM")));
+ 
  	/* Don't allow the delimiter to appear in the null string. */
  	if (strchr(cstate->null_print, cstate->delim[0]) != NULL)
  		ereport(ERROR,
***
*** 2951,2964 
  		int			input_len;
  
  		/* Make sure space remains in fieldvals[] */
! 		if (fieldno >= maxfields)
  			ereport(ERROR,
  	(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
  	 errmsg("extra data after last expected column")));
  
  		/* Remember start of field on both input and output sides */
  		start_ptr = cur_ptr;
! 		fieldvals[fieldno] = output_ptr;
  
  		/*
  		 * Scan data for field,
--- 2970,2984 
  		int			input_len;
  
  		/* Make sure space remains in fieldvals[] */
! 		if (fieldno >= maxfields && ! cstate->ragged)
  			ereport(ERROR,
  	(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
  	 errmsg("extra data after last expected column")));
  
  		/* Remember start of field on both input and output sides */
  		start_ptr = cur_ptr;
! 		if (fieldno < maxfields)
! 			fieldvals[fieldno] = output_ptr;
  
  		/*
  		 * Scan data for field,
***
*** 3045,3051 
  		/* Check whether raw input matched null marker */
  		input_len = end_ptr - start_ptr;
  		if (!saw_quote && input_len == cstate->null_print_len &&
! 			strncmp(start_ptr, cstate->null_print, input_len) == 0)
  			fieldvals[fieldno] = NULL;
  
  		fieldno++;
--- 3065,3072 
  		/* Check whether raw input matched null marker */
  		input_len = end_ptr - start_ptr;
  		if (!saw_quote && input_len == cstate->null_print_len &&
! 			strncmp(start_ptr, cstate->null_print, input_len) == 0 &&
! 			fieldno < maxfields)
  			fieldvals[fieldno] = NULL;
  
  		fieldno++;
***
*** 3059,3065 
  	Assert(*output_ptr == '\0');
  	cstate->attribute_buf.len = (output_ptr - cstate->attribute_buf.data);
  
! 	return fieldno;
  }
  
  
--- 3080,3092 
  	Assert(*output_ptr == '\0');
  	cstate->attribute_buf.len = (output_ptr - cstate->attribute_buf.data);
  
! 	/* for ragged input, set field null for underflowed fields */
! 	if (cstate->ragged)
! 		while (fieldno  < maxfields)
! 			fieldvals[fieldno++] = NULL;
! 
! 
! 	return cstate->ragged ? maxfields  : fieldno;
  }
  
  
Index: src/backend/parser/gram.y
===
RCS file: /cvsroot/pgsql/src/backend/parser/gram.y,v
retrieving revision 2.677
diff -c -r2.677 gram.y
*** src/backend/parser/gram.y	18 Aug 2009 23:40:20 -	2.677
--- src/backend/parser/gram.y	13 Sep 2009 02:57:17 -
***
*** 504,510 
  
  	QUOTE
  
! 	RANGE READ REAL REASSIGN RECHECK RECURSIVE REFERENCES REINDEX
  	RELATIVE_P RELEASE R

Re: [HACKERS] named generic constraints [feature request]

2009-11-23 Thread Caleb Cushing
> CREATE OR REPLACE FUNCTION emptystr(text)
> RETURNS bool AS $$
>  SELECT $1 <> ''; -- it is SQL not C
> $$ LANGUAGE sql;
>
> CREATE TABLE users(
>  username TEXT CHECK (NOT emptystr(username)),

Although I'm not going to continue discussing the request, this code
has the opposite of the desired effect. It should be

SELECT $1 = ''; -- you have a double negative
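
Putting that correction together, the intended definitions would read (a sketch):

```sql
-- emptystr() is now true exactly for the empty string, so the CHECK
-- rejects empty usernames as desired.
CREATE OR REPLACE FUNCTION emptystr(text)
RETURNS bool AS $$
  SELECT $1 = '';
$$ LANGUAGE sql;

CREATE TABLE users(
  username TEXT CHECK (NOT emptystr(username))
);
```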

-- 
Caleb Cushing

http://xenoterracide.blogspot.com



Re: [HACKERS] [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION

2009-11-23 Thread Greg Smith

Tom Lane wrote:

> Those discussions don't have a lot of credibility if they didn't take
> place on the public mailing lists.
You know how people complain about how new contributors are treated 
here?  Throwing out comments like this, that come off as belittling to 
other people's work, doesn't help.  All I was suggesting was that Dan 
wasn't developing this in complete isolation from the hackers community 
as Robert had feared, as will be obvious when we get to:



> pgsql-hackers had some preliminary discussions a couple months back
> on refactoring COPY to allow things like this --- see the thread
> starting here:
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00486.php
> While I don't think we arrived at any final decisions, I would like
> to know how this design fits in with what was discussed then.
The patch provided here is a first step toward what you suggested in the 
related "Copy enhancements thread" a few days later, at 
http://archives.postgresql.org/pgsql-hackers/2009-09/msg00616.php  It's 
one way to implement a better decoupled "data transformation" layer on 
top of COPY.  When Dan showed me an earlier implementation of the basic 
idea embodied in this patch (developed independently and earlier 
actually), I gave it a thumbs-up as seeming to match the general 
direction the community discussion suggested, and encouraged him to work 
on getting the code to where it could be released.  He started with 
output rather than input, mainly because the dblink feature had a 
compelling use-case that justified spending time on development for the 
company.  Rather than keep going into input transformation, this 
development checkpoint made a good place to pause and solicit public 
feedback, particularly since the input side has additional hurdles to 
clear before it can work.


As far as other past discussion here that might be relevant, this patch 
includes a direct change to gram.y to support the new syntax.  You've 
already suggested before that it might be time to update COPY the same 
way EXPLAIN and now VACUUM have been overhauled to provide a more 
flexible options interface:  
http://archives.postgresql.org/pgsql-hackers/2009-09/msg00616.php  This 
patch might be more fuel for that idea.


Emmanuel has given up on the more error-tolerant COPY patch that thread was 
associated with, and I haven't heard anything from Andrew about ragged 
CSV import either.  I think that ultimately those features are useful, 
but just exceed what the existing code could be hacked to handle 
cleanly.  If it's possible to lower the complexity bar to implementing 
them by abstracting the transformation into a set of functions, and have 
more flexible syntax for the built-in ones the database ships with, that 
may be useful groundwork for returning to implementing those features 
without bloating COPY's internals (and therefore performance) 
intolerably.  Dan even suggested in his first note that it might be 
possible to write the current STDOUT|file behavior in terms of the new 
function interface, which has the potential to make the COPY 
implementation cleaner rather than more cluttered (as long as the 
performance doesn't suffer).


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] EXPLAIN BUFFERS

2009-11-23 Thread Itagaki Takahiro

Jeff Janes  wrote:

> Just a quick note:  this patch does not apply cleanly to HEAD due to
> the subsequent removal from explain.c of the nearby lines:

Thanks for reporting.
The attached patch is rebased to current CVS.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



explain_buffers_20091124.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot standby and removing VACUUM FULL

2009-11-23 Thread Itagaki Takahiro

Heikki Linnakangas  wrote:

> So I guess what I'm asking is: Does anyone see any show-stoppers in
> removing VACUUM FULL, and does anyone want to step up to the plate and
> promise to do it before release?

I'm working on the "New VACUUM FULL" patch, which shrinks tables
using CLUSTER-like rewrites.
https://commitfest.postgresql.org/action/patch_view?id=202

What can I do about the Hot Standby issue?
VACUUM FULL is still reserved for system catalogs in my patch
because we cannot modify the relfilenodes of the catalog tables.
Do you have a solution for it?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TRIGGER with WHEN clause

2009-11-23 Thread Itagaki Takahiro

Tom Lane  wrote:

> > [ TRIGGER WHEN patch ]
> Applied with assorted revisions.  AFAICT the system column issue is only
> a problem for NEW references in BEFORE triggers --- those columns do
> read correctly in OLD, and all the time in AFTER triggers.  I revised
> the parsing logic to enforce that.  The patch also missed establishing
> dependencies for stuff referenced in the WHEN expression, and there were
> a few other minor problems.

Thanks for your fix.

>> * Keep the expression in nodeToString string form within the TriggerDesc
>> structure; then it's just one more pfree in FreeTriggerDesc.  The main
>> disadvantage of this is that we'd have to repeat stringToNode every time
>> the trigger is used.

I read that your fix for the repeated-compilation issue is to cache the
compiled expressions in ResultRelInfo->ri_TrigWhenExprs, so we can reuse
the result of stringToNode when we modify multiple rows in a query.
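For readers following along, the committed feature looks roughly like this (a sketch; the table and function names are invented for illustration):

```sql
-- Sketch of a trigger with a WHEN condition; names are hypothetical.
-- The WHEN expression is stored via nodeToString, and its compiled
-- form is cached in ResultRelInfo->ri_TrigWhenExprs, so a multi-row
-- UPDATE pays the stringToNode cost only once.
CREATE TRIGGER audit_salary_change
    BEFORE UPDATE ON employees
    FOR EACH ROW
    WHEN (OLD.salary IS DISTINCT FROM NEW.salary)
    EXECUTE PROCEDURE log_salary_change();
```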

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Updating column on row update

2009-11-23 Thread Craig Ringer
On 23/11/2009 11:35 PM, Tom Lane wrote:
> Andrew Gierth  writes:
>> "Tom" == Tom Lane  writes:
>>  Tom> Well, that's pretty much exactly the question --- are there?  It
>>  Tom> would certainly make it easier for someone to exploit any other
>>  Tom> security weakness they might find.
> 
>> Loops in plain SQL are no problem: see generate_series. The last time
>> we discussed this I demonstrated reasonably straightforward SQL
>> examples of how to do things like password-cracking (and that was long
>> before we had CTEs, so it would be even easier now); my challenge to
>> anyone to produce examples of malicious plpgsql code that couldn't be
>> reproduced in plain SQL went unanswered.
> 
> The fact remains though that the looping performance of anything you can
> cons up in straight SQL will be an order of magnitude worse than in
> plpgsql; and it's a notation the average script kiddie will find pretty
> unfamiliar.  So I think this still does represent some barrier.  Whether
> it's enough of a barrier to justify keeping plpgsql out of the default
> install is certainly debatable.

"the average script kiddie" doesn't write their own exploits. They tend
to use proofs of concept created by very, very smart people involved in
active security research (be it malicious or not) that've been wrapped
up in easy-to-use "click to exploit" tools. They'll no more learn SQL to
exploit Pg than they learn x86 asm to write their payload injectors, or
learn details about the Linux kernel to use a local root exploit.

Just making it a bit harder doesn't stop determined attackers, such as
security researchers, criminals seeking confidential information (credit
card databases etc) or commercially-motivated black hats seeking to
build botnets. Once the determined attackers find a way, the script
kiddies and the botnet builders tend to follow.

Any attack relying on the presence of PL/PgSQL will have to be an attack
by an already-authenticated user* with an established connection. Mass
botnet- or worm-style exploits are out. That said, PL/PgSQL does
undeniably increase the available attack surface for a rogue authorized
user, simply by being more code that has to be free of security issues.
An authenticated user might seek to escalate to DB superuser privileges,
gain access to other DBs, or execute code as the "postgres" user to
trigger a local root exploit on the hosting machine. So there is some
concern, since not all authenticated users are necessarily fully trusted.

It's for that reason that I proposed it being made available by default
only to superusers and to the owner of the current database. If either
of those are executing malicious code, you're already well and truly
screwed. Thinking about it some more, though, there's nothing that'll
let the DB owner (unlike the superuser) execute supplied code as the
"postgres" user or break into other DBs, so an exploit against PL/PgSQL
might still give them a way to escalate privileges.

I don't see making PL/PgSQL available by default only to a superuser as
particularly useful. Anything else does increase the attack surface
available for potential exploit. So the question becomes "is that
increase a sufficient worry to block the installation of PL/PgSQL by
default" ?

Personally, I think not. Default installing PL/PgSQL doesn't increase
the risk of a worm, which would be my main worry about enabling a
feature by default. PL/PgSQL is so widely used that any security issue
with it is already about as critical as it can be. Those few with
significant databases who do NOT use it can drop it if they are
concerned, but I sincerely doubt there are many significant production
DBs out there without it installed and in use, as it's effectively a
requirement for the use of triggers.

So - I say go ahead and install it by default, available to all users.
It's meant to be a trusted language, it's basically required for
triggers, it's nearly universal anyway, and it's a pain not having it
installed.

If it's not to be installed by default, then a cleaner way to ensure
it gets installed would be strongly desirable.



(I'm also honestly not sure what relevance performance has here. If it
takes an attacker 10 minutes to exploit a server rather than 1 minute,
is it any less cracked? Performance is only an issue if it renders an
attack impossible due to memory/storage requirements, or non-linear
computation time growth. Anyway, I frequently seek to avoid PL/PgSQL,
using pure SQL loops etc instead, because it's *faster* that way.)
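As a concrete illustration of that last point (an invented example, not one from the earlier thread): a "loop" in plain SQL needs nothing but generate_series, while the PL/PgSQL version only works if the language is installed:

```sql
-- Plain SQL, no procedural language required:
SELECT sum(i) FROM generate_series(1, 1000000) AS s(i);

-- A roughly equivalent PL/PgSQL function, usable only where the
-- language is installed (the function name is made up):
CREATE FUNCTION sum_to(n bigint) RETURNS numeric AS $$
DECLARE
    total numeric := 0;
BEGIN
    FOR i IN 1..n LOOP
        total := total + i;
    END LOOP;
    RETURN total;
END;
$$ LANGUAGE plpgsql;
```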

* It could, I guess, be a hijacked connection, but if you have
connection hijacking going on you've already lost and have bigger things
to worry about. Otherwise it's going to be a stolen username/password,
or an authorized user gone rogue.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: log query in auto-explain

2009-11-23 Thread Itagaki Takahiro

Andrew Dunstan  wrote:

> Basically it includes the text of the query being explained in the 
> explain output.

I expected the query text to be printed in the "STATEMENT" section.
Do you mean the query should be merged into the "LOG" section?
Are there any situations where the "STATEMENT" section does not work?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION

2009-11-23 Thread Daniel Farina
On Mon, Nov 23, 2009 at 4:20 PM, Tom Lane  wrote:
> pgsql-hackers had some preliminary discussions a couple months back
> on refactoring COPY to allow things like this --- see the thread
> starting here:
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00486.php
> While I don't think we arrived at any final decisions, I would like
> to know how this design fits in with what was discussed then.

This seems to be about importing/ingress, whereas this patch is about
exporting/egress... it is an interesting question how much parsing to
do on the ingress side before handing a row to a function, though,
should we try to make these kinds of operations a bit more
symmetrical.

I did consider refactoring COPY, but since it's never clear when we
start a feature whether it is going to manifest itself as a good
upstream candidate (we default to trying to make future merges with
Postgres tractable), I did not take on such a large and artistic task.

fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION

2009-11-23 Thread Tom Lane
Greg Smith  writes:
> Robert Haas wrote:
>> I'm fuzzy on what problem this is attempting to solve...  as mentioned
>> in the above guidelines, it's usually good to start with some design
>> discussions before writing/submitting code.

> This has been through some heavy design discussions with a few PG 
> hackers you know and some you don't, they just couldn't release the 
> result until now.

Those discussions don't have a lot of credibility if they didn't take
place on the public mailing lists.

pgsql-hackers had some preliminary discussions a couple months back
on refactoring COPY to allow things like this --- see the thread
starting here:
http://archives.postgresql.org/pgsql-hackers/2009-09/msg00486.php
While I don't think we arrived at any final decisions, I would like
to know how this design fits in with what was discussed then.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION

2009-11-23 Thread Daniel Farina
On Mon, Nov 23, 2009 at 4:03 PM, Daniel Farina  wrote:
> Yes.  Take a look at the tests introduced to core PostgreSQL (see patch
> 2), where instead of returning a text[] I return just a single text of
> the verbatim output of the copy.  You could imagine making that an SRF
> instead.  It would have to understand COPY row delimiters in whatever
> mode you were operating in, though.

Actually, sorry, I lie, but not in a bad way.  Since COPY operates a
row at a time (rather than as a stream of bytes with arbitrary
boundaries), you can rely on being passed each record one at a time.
You don't have to understand the delimiter.  So you could even make a
bytea[][] that contains the binary output, the first dimension being
the row number, the second being the bytes themselves.  The header
would pose an interesting problem, though.

fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION

2009-11-23 Thread Daniel Farina
On Mon, Nov 23, 2009 at 3:46 PM, Andrew Dunstan  wrote:
> I recently found myself trying to push data through dblink() and ended up
> writing code to make a call to the target to call a function which called
> back to the source to select the data and insert it. The speedup was
> massive, so I'll be interested to dig into the details here.

The way the indirection is accomplished here should be very cheap.
Overhead should be comparable to the fwrite() call that is used for
copying to a file...this is why I mentioned that it would be
interesting to make this good enough to be the underlying mechanism of
TO STDOUT/TO 'file' to reduce the overall number of mechanisms used to
perform COPY TO.

> I'm also interested in COPY returning rows as text[], which was discussed
> recently. Does this help move us towards that?

Yes.  Take a look at the tests introduced to core PostgreSQL (see patch
2), where instead of returning a text[] I return just a single text of
the verbatim output of the copy.  You could imagine making that an SRF
instead.  It would have to understand COPY row delimiters in whatever
mode you were operating in, though.

fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION

2009-11-23 Thread Andrew Dunstan



Greg Smith wrote:
> Robert Haas wrote:
>> I'm fuzzy on what problem this is attempting to solve...  as mentioned
>> in the above guidelines, it's usually good to start with some design
>> discussions before writing/submitting code.
> This has been through some heavy design discussions with a few PG 
> hackers you know and some you don't, they just couldn't release the 
> result until now.  As for what it's good for, if you look at what you 
> can do now with dblink, you can easily move rows between nodes using 
> dblink_build_sql_insert.  This is perfectly fine for small bits of 
> work, but the performance isn't nearly good enough to do serious 
> replication with it.  The upper level patch here allows using COPY as 
> the mechanism to move things between them, which is much faster for 
> some use cases (which includes most of the really useful ones).  It 
> dramatically increases the scale of what you can move around using 
> dblink as the replication transport.


I recently found myself trying to push data through dblink() and ended 
up writing code to make a call to the target to call a function which 
called back to the source to select the data and insert it. The speedup 
was massive, so I'll be interested to dig into the details here.


> The lower level patch is needed to build that layer, which is an 
> immediate proof of its utility.  In addition, adding a user-defined 
> function as a COPY target opens up all sorts of possibilities for 
> things like efficient ETL implementation.  And if this approach is 
> accepted as a reasonable one, as Dan suggested a next step might even 
> be to similarly allow passing COPY FROM through a UDF, which has the 
> potential to provide a new efficient implementation path for some of 
> the custom input filter requests that pop up here periodically.


I'm also interested in COPY returning rows as text[], which was 
discussed recently. Does this help move us towards that?


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION

2009-11-23 Thread Daniel Farina
On Mon, Nov 23, 2009 at 2:16 PM, Robert Haas  wrote:
> On Mon, Nov 23, 2009 at 4:34 PM, Daniel Farina  wrote:
>> Signed-off-by: Daniel Farina 
>
> Thanks for the patch.  You may want to take a look at this:
>
> http://wiki.postgresql.org/wiki/Submitting_a_Patch
>
> I'm fuzzy on what problem this is attempting to solve...

It seems somewhat strange that the only things COPY can do with its
output stream of bytes are the two modes that are baked into core
Postgres.  This allows carefully written UDFs to do whatever they will
with the stream of bytes, such as sending it into a waiting libpq
connection.

> as mentioned in the above guidelines, it's usually good to start with some 
> design
> discussions before writing/submitting code.

The patch is derived from functionality in Truviso's Postgres-derived
database product, where it is non-optional.  This is extracted from
that.

> Also, we prefer that patches be submitted as context diffs

I actually remembered this right after I sent it...sorry about that.

>  And that they not be split up over multiple emails.

With the possible exception of squashing together the test cases into
their implementing patches, I would say this is at least two patches.
One is to a contrib, the other to core PostgreSQL.  It so happens the
core addition makes the contrib changes much more obviously useful.

fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION

2009-11-23 Thread Greg Smith

Robert Haas wrote:

> I'm fuzzy on what problem this is attempting to solve...  as mentioned
> in the above guidelines, it's usually good to start with some design
> discussions before writing/submitting code.
This has been through some heavy design discussions with a few PG 
hackers you know and some you don't, they just couldn't release the 
result until now.  As for what it's good for, if you look at what you 
can do now with dblink, you can easily move rows between nodes using 
dblink_build_sql_insert.  This is perfectly fine for small bits of work, 
but the performance isn't nearly good enough to do serious replication 
with it.  The upper level patch here allows using COPY as the mechanism 
to move things between them, which is much faster for some use cases 
(which includes most of the really useful ones).  It dramatically 
increases the scale of what you can move around using dblink as the 
replication transport.


The lower level patch is needed to build that layer, which is an 
immediate proof of its utility.  In addition, adding a user-defined 
function as a COPY target opens up all sorts of possibilities for things 
like efficient ETL implementation.  And if this approach is accepted as 
a reasonable one, as Dan suggested a next step might even be to 
similarly allow passing COPY FROM through a UDF, which has the potential 
to provide a new efficient implementation path for some of the custom 
input filter requests that pop up here periodically.


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION

2009-11-23 Thread Robert Haas
On Mon, Nov 23, 2009 at 4:34 PM, Daniel Farina  wrote:
> Signed-off-by: Daniel Farina 

Thanks for the patch.  You may want to take a look at this:

http://wiki.postgresql.org/wiki/Submitting_a_Patch

I'm fuzzy on what problem this is attempting to solve...  as mentioned
in the above guidelines, it's usually good to start with some design
discussions before writing/submitting code.  Also, we prefer that
patches be submitted as context diffs and that they not be split up
over multiple emails.

Thanks,

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION

2009-11-23 Thread Daniel Farina
Signed-off-by: Daniel Farina 
---
 contrib/dblink/expected/dblink.out |  272 
 contrib/dblink/sql/dblink.sql  |  112 +++
 2 files changed, 384 insertions(+), 0 deletions(-)

diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out
index d39aa45..788b2a3 100644
--- a/contrib/dblink/expected/dblink.out
+++ b/contrib/dblink/expected/dblink.out
@@ -872,6 +872,278 @@ SELECT * from dblink_get_notify();
 -++---
 (0 rows)
 
+-- test COPY ... TO FUNCTION support
+CREATE SCHEMA dblink_copy_to_function;
+SET search_path = dblink_copy_to_function, public;
+CREATE TABLE xyzzy(f1 int, f2 text, f3 text[], primary key (f1,f2));
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "xyzzy_pkey" for table "xyzzy"
+INSERT INTO xyzzy VALUES (0,'a','{"a0","b0","c0"}');
+INSERT INTO xyzzy VALUES (1,'b','{"a1","b1","c1"}');
+INSERT INTO xyzzy VALUES (2,'c','{"a2","b2","c2"}');
+INSERT INTO xyzzy VALUES (3,'d','{"a3","b3","c3"}');
+INSERT INTO xyzzy VALUES (4,'e','{"a4","b4","c4"}');
+INSERT INTO xyzzy VALUES (5,'f','{"a5","b5","c5"}');
+INSERT INTO xyzzy VALUES (6,'g','{"a6","b6","c6"}');
+INSERT INTO xyzzy VALUES (7,'h','{"a7","b7","c7"}');
+INSERT INTO xyzzy VALUES (8,'i','{"a8","b8","c8"}');
+INSERT INTO xyzzy VALUES (9,'j','{"a9","b9","c9"}');
+CREATE TABLE bar(f1 int, f2 text, f3 text[], primary key (f1,f2));
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "bar_pkey" for table "bar"
+INSERT INTO bar VALUES (100,'w','{"a100","b100","c100"}');
+INSERT INTO bar VALUES (101,'x','{"a101","b101","c101"}');
+CREATE TABLE baz(f1 int, f2 text, f3 text[], primary key (f1,f2));
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "baz_pkey" for table "baz"
+INSERT INTO baz VALUES (102,'y','{"a102","b102","c102"}');
+INSERT INTO baz VALUES (103,'z','{"a103","b103","c103"}');
+CREATE TABLE plugh(f1 int, f2 text, f3 text[], primary key (f1,f2));
+NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "plugh_pkey" for table "plugh"
+INSERT INTO plugh VALUES (104,'u','{"a102","b102","c102"}');
+INSERT INTO plugh VALUES (105,'v','{"a103","b103","c103"}');
+SELECT dblink_connect('copytofunction','dbname=contrib_regression');
+ dblink_connect 
+
+ OK
+(1 row)
+
+SELECT dblink_exec('copytofunction',
+   'SET search_path = dblink_copy_to_function, public;');
+ dblink_exec 
+-
+ SET
+(1 row)
+
+-- ensure that original base data is present
+SELECT *
+FROM dblink('copytofunction', 'SELECT * FROM xyzzy') AS t(a int, b text, c text[]);
+ a | b | c  
+---+---+
+ 0 | a | {a0,b0,c0}
+ 1 | b | {a1,b1,c1}
+ 2 | c | {a2,b2,c2}
+ 3 | d | {a3,b3,c3}
+ 4 | e | {a4,b4,c4}
+ 5 | f | {a5,b5,c5}
+ 6 | g | {a6,b6,c6}
+ 7 | h | {a7,b7,c7}
+ 8 | i | {a8,b8,c8}
+ 9 | j | {a9,b9,c9}
+(10 rows)
+
+-- try doing a few consecutive copies with one open connection
+SELECT dblink_copy_open('copytofunction', 'xyzzy', false);
+ dblink_copy_open 
+--
+ OK
+(1 row)
+
+COPY bar TO FUNCTION dblink_copy_write;
+COPY baz TO FUNCTION dblink_copy_write;
+SELECT dblink_copy_end();
+ dblink_copy_end 
+-
+ OK
+(1 row)
+
+-- confirm that data has arrived
+SELECT *
+FROM dblink('copytofunction', 'SELECT * FROM xyzzy') AS t(a int, b text, c text[]);
+  a  | b |c 
+-+---+--
+   0 | a | {a0,b0,c0}
+   1 | b | {a1,b1,c1}
+   2 | c | {a2,b2,c2}
+   3 | d | {a3,b3,c3}
+   4 | e | {a4,b4,c4}
+   5 | f | {a5,b5,c5}
+   6 | g | {a6,b6,c6}
+   7 | h | {a7,b7,c7}
+   8 | i | {a8,b8,c8}
+   9 | j | {a9,b9,c9}
+ 100 | w | {a100,b100,c100}
+ 101 | x | {a101,b101,c101}
+ 102 | y | {a102,b102,c102}
+ 103 | z | {a103,b103,c103}
+(14 rows)
+
+-- try doing a binary COPY
+SELECT dblink_copy_open('copytofunction', 'xyzzy', true);
+ dblink_copy_open 
+--
+ OK
+(1 row)
+
+COPY plugh TO FUNCTION dblink_copy_write BINARY;
+SELECT dblink_copy_end();
+ dblink_copy_end 
+-
+ OK
+(1 row)
+
+-- confirm that data has arrived
+SELECT *
+FROM dblink('copytofunction', 'SELECT * FROM xyzzy') AS t(a int, b text, c text[]);
+  a  | b |c 
+-+---+--
+   0 | a | {a0,b0,c0}
+   1 | b | {a1,b1,c1}
+   2 | c | {a2,b2,c2}
+   3 | d | {a3,b3,c3}
+   4 | e | {a4,b4,c4}
+   5 | f | {a5,b5,c5}
+   6 | g | {a6,b6,c6}
+   7 | h | {a7,b7,c7}
+   8 | i | {a8,b8,c8}
+   9 | j | {a9,b9,c9}
+ 100 | w | {a100,b100,c100}
+ 101 | x | {a101,b101,c101}
+ 102 | y | {a102,b102,c102}
+ 103 | z | {a103,b103,c103}
+ 104 | u | {a102,b102,c102}
+ 105 | v | {a103,b103,c103}
+(16 rows)
+
+-- try using reset to abort out of a copy state
+SELECT dblink_copy_open('copytofunction', 'xyzzy', true);
+ dblink_copy_open 
+--
+ OK
+(1 row)
+
+COPY plugh TO FUNCTION dblink_copy_write BINARY;
+SELECT dblink_connection_reset('copytofunction');
+ dblink_connection_reset 
+-
+ 
+(1 row)
+
+--

Re: [HACKERS] magic block in doc functions

2009-11-23 Thread Peter Eisentraut
On ons, 2009-11-18 at 01:39 -0200, Euler Taveira de Oliveira wrote:
> I noticed that some example functions don't contain the magic block and that
> leads to error while loading those examples in 8.2 or later.
> 
> Attached is a patch that adds it. I also add some missing header file. Don't
> have a strong opinion about backpatching it or not.

Committed to 8.5



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH 1/4] Add "COPY ... TO FUNCTION ..." support

2009-11-23 Thread Daniel Farina
This relatively small change enables all sorts of new and shiny evil by
allowing specification of a function to COPY that accepts each
produced row's content in turn.  The function must accept a single
INTERNAL-type argument, which is actually of the type StringInfo.

Patch highlights:

  - Grammar production changes to enable "TO FUNCTION "

  - A new enumeration value in CopyDest to indicate this new mode
called COPY_FN.

  - CopyStateData's "filename" field has been renamed "destination"
and is now a Node type.  Before it was either a string or NULL;
now it may be a RangeVar, a string, or NULL.  Some code now has to
go through some minor strVal() unboxing for the regular TO '/file'
behavior.

  - Due to the relatively restricted way this function can be called
it was possible to reduce per-row overhead by computing the
FunctionCallInfo once and then reusing it, as opposed to simply
using one of the convenience functions in the fmgr.

  - Add and expose the makeNameListFromRangeVar procedure to
src/catalog/namespace.c, the inverse of makeRangeVarFromNameList.

Signed-off-by: Daniel Farina 
---
 src/backend/catalog/namespace.c |   21 +
 src/backend/commands/copy.c |  190 +-
 src/backend/executor/spi.c  |2 +-
 src/backend/nodes/copyfuncs.c   |2 +-
 src/backend/nodes/equalfuncs.c  |2 +-
 src/backend/parser/gram.y   |   30 --
 src/include/catalog/namespace.h |1 +
 src/include/nodes/parsenodes.h  |3 +-
 8 files changed, 212 insertions(+), 39 deletions(-)

diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index 99c9140..8911e29 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -2467,6 +2467,27 @@ QualifiedNameGetCreationNamespace(List *names, char 
**objname_p)
 }
 
 /*
+ * makeNameListFromRangeVar
+ * Utility routine to convert a RangeVar into a qualified-name list.
+ */
+List *
+makeNameListFromRangeVar(RangeVar *rangevar)
+{
+   List *names = NIL;
+
+   Assert(rangevar->relname != NULL);
+   names = lcons(makeString(rangevar->relname), names);
+
+   if (rangevar->schemaname != NULL)
+   names = lcons(makeString(rangevar->schemaname), names);
+
+   if (rangevar->catalogname != NULL)
+   names = lcons(makeString(rangevar->catalogname), names);
+
+   return names;
+}
+
+/*
  * makeRangeVarFromNameList
  * Utility routine to convert a qualified-name list into RangeVar 
form.
  */
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5e95fd7..985505a 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -33,6 +33,7 @@
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "optimizer/planner.h"
+#include "parser/parse_func.h"
 #include "parser/parse_relation.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
@@ -55,7 +56,8 @@ typedef enum CopyDest
 {
COPY_FILE,  /* to/from file */
COPY_OLD_FE,/* to/from frontend (2.0 
protocol) */
-   COPY_NEW_FE /* to/from frontend 
(3.0 protocol) */
+   COPY_NEW_FE,/* to/from frontend (3.0 
protocol) */
+   COPY_FN /* to function */
 } CopyDest;
 
 /*
@@ -104,7 +106,8 @@ typedef struct CopyStateData
Relationrel;/* relation to copy to or from 
*/
QueryDesc  *queryDesc;  /* executable query to copy from */
List   *attnumlist; /* integer list of attnums to copy */
-   char   *filename;   /* filename, or NULL for STDIN/STDOUT */
+   Node   *destination;/* filename, or NULL for STDIN/STDOUT, 
or a
+* function */
boolbinary; /* binary format? */
booloids;   /* include OIDs? */
boolcsv_mode;   /* Comma Separated Value 
format? */
@@ -131,6 +134,13 @@ typedef struct CopyStateData
MemoryContext rowcontext;   /* per-row evaluation context */
 
/*
+* For writing rows out to a function. Used if copy_dest == COPY_FN
+*
+* Avoids repeated use of DirectFunctionCall for efficiency.
+*/
+   FunctionCallInfoData    output_fcinfo;
+
+   /*
 * These variables are used to reduce overhead in textual COPY FROM.
 *
 * attribute_buf holds the separated, de-escaped text for each field of
@@ -425,9 +435,11 @@ CopySendEndOfRow(CopyState cstate)
 {
StringInfo  fe_msgbuf = cstate->fe_msgbuf;
 
+   /* Take care adding row delimiters */
switch (cstate->copy_dest)
{
case COPY_FILE:
+   case COPY_FN:

[HACKERS] [PATCH 2/4] Add tests for "COPY ... TO FUNCTION ..."

2009-11-23 Thread Daniel Farina
Signed-off-by: Daniel Farina 
---
 src/test/regress/input/copy.source  |   38 +++
 src/test/regress/output/copy.source |   69 +++
 src/test/regress/regress.c  |   56 
 3 files changed, 163 insertions(+), 0 deletions(-)

diff --git a/src/test/regress/input/copy.source b/src/test/regress/input/copy.source
index 376329d..e5dcd62 100644
--- a/src/test/regress/input/copy.source
+++ b/src/test/regress/input/copy.source
@@ -107,3 +107,41 @@ this is just a line full of junk that would error out if parsed
 
 copy copytest3 to stdout csv header;
 
+
+-- test copy to function
+
+CREATE FUNCTION copyto_setup_state ()
+RETURNS void
+AS '@libdir@/regress@DLSUFFIX@'
+LANGUAGE C;
+
+CREATE FUNCTION copyto_function (internal)
+RETURNS void
+AS '@libdir@/regress@DLSUFFIX@'
+LANGUAGE C;
+
+CREATE FUNCTION copyto_yield_len ()
+RETURNS int4
+AS '@libdir@/regress@DLSUFFIX@'
+LANGUAGE C;
+
+CREATE FUNCTION copyto_yield_text ()
+RETURNS text
+AS '@libdir@/regress@DLSUFFIX@'
+LANGUAGE C;
+
+CREATE FUNCTION copyto_free ()
+RETURNS void
+AS '@libdir@/regress@DLSUFFIX@'
+LANGUAGE C;
+
+select copyto_setup_state();
+copy copytest to function copyto_function;
+select copyto_yield_len();
+select copyto_yield_text();
+select copyto_free();
+
+select copyto_setup_state();
+copy binary copytest to function copyto_function;
+select copyto_yield_len();
+select copyto_free();
diff --git a/src/test/regress/output/copy.source b/src/test/regress/output/copy.source
index 5a88d6e..74ea935 100644
--- a/src/test/regress/output/copy.source
+++ b/src/test/regress/output/copy.source
@@ -71,3 +71,72 @@ copy copytest3 to stdout csv header;
 c1,"col with , comma","col with "" quote"
 1,a,1
 2,b,2
+-- test copy to function
+CREATE FUNCTION copyto_setup_state ()
+RETURNS void
+AS '@libdir@/regress@DLSUFFIX@'
+LANGUAGE C;
+CREATE FUNCTION copyto_function (internal)
+RETURNS void
+AS '@libdir@/regress@DLSUFFIX@'
+LANGUAGE C;
+CREATE FUNCTION copyto_yield_len ()
+RETURNS int4
+AS '@libdir@/regress@DLSUFFIX@'
+LANGUAGE C;
+CREATE FUNCTION copyto_yield_text ()
+RETURNS text
+AS '@libdir@/regress@DLSUFFIX@'
+LANGUAGE C;
+CREATE FUNCTION copyto_free ()
+RETURNS void
+AS '@libdir@/regress@DLSUFFIX@'
+LANGUAGE C;
+select copyto_setup_state();
+ copyto_setup_state 
+
+ 
+(1 row)
+
+copy copytest to function copyto_function;
+select copyto_yield_len();
+ copyto_yield_len 
+--
+   76
+(1 row)
+
+select copyto_yield_text();
+ copyto_yield_text 
+---
+ DOS abc\r\ndef  1+
+ Unixabc\ndef2+
+ Mac abc\rdef3+
+ esc\\apea\\r\\\r\\\n\\nb4+
+ 
+(1 row)
+
+select copyto_free();
+ copyto_free 
+-
+ 
+(1 row)
+
+select copyto_setup_state();
+ copyto_setup_state 
+
+ 
+(1 row)
+
+copy binary copytest to function copyto_function;
+select copyto_yield_len();
+ copyto_yield_len 
+--
+  142
+(1 row)
+
+select copyto_free();
+ copyto_free 
+-
+ 
+(1 row)
+
diff --git a/src/test/regress/regress.c b/src/test/regress/regress.c
index 0e94e68..a96a085 100644
--- a/src/test/regress/regress.c
+++ b/src/test/regress/regress.c
@@ -16,6 +16,7 @@
 #include "executor/spi.h"
 #include "utils/builtins.h"
 #include "utils/geo_decls.h"
+#include "utils/memutils.h"
 
 
 #define P_MAXDIG 12
@@ -34,6 +35,11 @@ extern char *reverse_name(char *string);
 extern int oldstyle_length(int n, text *t);
 extern Datum int44in(PG_FUNCTION_ARGS);
 extern Datum int44out(PG_FUNCTION_ARGS);
+extern Datum copyto_free(PG_FUNCTION_ARGS);
+extern Datum copyto_function(PG_FUNCTION_ARGS);
+extern Datum copyto_setup_state(PG_FUNCTION_ARGS);
+extern Datum copyto_yield_len(PG_FUNCTION_ARGS);
+extern Datum copyto_yield_text(PG_FUNCTION_ARGS);
 
 #ifdef PG_MODULE_MAGIC
 PG_MODULE_MAGIC;
@@ -737,3 +743,53 @@ int44out(PG_FUNCTION_ARGS)
*--walk = '\0';
PG_RETURN_CSTRING(result);
 }
+
+/*
+ * copyto testing
+ */
+static StringInfo global_buf;
+
+PG_FUNCTION_INFO_V1(copyto_setup_state);
+
+Datum
+copyto_setup_state(PG_FUNCTION_ARGS)
+{
+   MemoryContext oldcxt = MemoryContextSwitchTo(TopMemoryContext);
+   global_buf = makeStringInfo();
+   MemoryContextSwitchTo(oldcxt);
+   PG_RETURN_VOID();
+}
+
+PG_FUNCTION_INFO_V1(copyto_function);
+
+Datum
+copyto_function(PG_FUNCTION_ARGS)
+{
+   StringInfo copybuf = (void *) PG_GETARG_POINTER(0);
+   appendBinaryStringInfo(global_buf, copybuf->data, copybuf->len);
+   PG_RETURN_VOID();
+}
+
+PG_FUNCTION_INFO_V1(copyto_yield_len);
+
+Datum
+copyto_yield_len(PG_FUNCTION_ARGS)
+{
+   PG_RETURN_INT32(global_buf->len);
+}

[HACKERS] [PATCH 3/4] Add dblink functions for use with COPY ... TO FUNCTION ...

2009-11-23 Thread Daniel Farina
This patch enables dblink to be used for the purpose of efficient
bulk-loading via COPY and libpq in combination with the COPY TO
FUNCTION patch.

The following functions were added to accomplish this:

dblink_connection_reset: useful when handling errors and one just
wants to restore a connection to a known state, rolling back as many
transactions as necessary.

dblink_copy_end: completes the COPY

dblink_copy_open: puts a connection into the COPY state.  Accepts
connection name, relation name, and binary mode flag.

dblink_copy_write: writes a row to the last connection put in the COPY
state by dblink_copy_open.

Generally speaking, code that uses this will look like the following
(presuming a named connection has already been made):

try:
SELECT dblink_copy_open('myconn', 'relation_name', true);
COPY bar TO FUNCTION dblink_copy_write;

-- since the dblink connection is still in the COPY state, one
-- can even copy some more data in multiple steps...
COPY bar_2 TO FUNCTION dblink_copy_write;

SELECT dblink_copy_end();
finally:
SELECT dblink_connection_reset('myconn');
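The flow above leans on dblink_connection_reset to recover from errors. Its control flow (idle means done, at most one ROLLBACK attempt, unknown status is fatal) can be modeled outside libpq; the class and constants below are illustrative stand-ins, not the real PQtransactionStatus API.

```python
# Toy model of the dblink_connection_reset loop (assumption: a fake
# connection object, not libpq; the constants mimic PQTRANS_* values).
IDLE, ACTIVE, INTRANS, INERROR, UNKNOWN = range(5)

class FakeConn:
    """Stand-in for a PGconn: tracks transaction status and rollbacks."""
    def __init__(self, status):
        self.status = status
        self.rollbacks = 0

    def transaction_status(self):
        return self.status

    def rollback(self):
        # Model a successful "ROLLBACK;" round trip.
        self.rollbacks += 1
        self.status = IDLE

def connection_reset(conn):
    tried_once = False
    while not tried_once:
        st = conn.transaction_status()
        if st == IDLE:
            return                    # already in a known good state
        elif st in (ACTIVE, INTRANS, INERROR):
            conn.rollback()           # one ROLLBACK is all we attempt
            tried_once = True
        else:
            raise RuntimeError("connection in unknown transaction state")
```

As in the C version, a connection that is already idle is left untouched, and an unknown status is treated as unrecoverable.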

Signed-off-by: Daniel Farina 
---
 contrib/dblink/dblink.c |  190 +++
 contrib/dblink/dblink.h |5 +
 contrib/dblink/dblink.sql.in|   20 
 contrib/dblink/uninstall_dblink.sql |8 ++
 4 files changed, 223 insertions(+), 0 deletions(-)

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 72fdf56..d32aeec 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -1722,6 +1722,196 @@ dblink_get_notify(PG_FUNCTION_ARGS)
  * internal functions
  */
 
+/*
+ * Attempts to take the connection into a known state by rolling back
+ * transactions.  If unable to restore the connection to a known idle state,
+ * raises an error.
+ */
+PG_FUNCTION_INFO_V1(dblink_connection_reset);
+Datum
+dblink_connection_reset(PG_FUNCTION_ARGS)
+{
+   PGresult*res   = NULL;
+   PGconn  *conn  = NULL;
+   char*conname   = NULL;
+   remoteConn  *rconn = NULL;
+
+   bool triedonce = false;
+
+   DBLINK_INIT;
+
+   /* must be text */
+   Assert(PG_NARGS() == 1);
+   DBLINK_GET_NAMED_CONN;
+
+   if (!conn)
+   DBLINK_CONN_NOT_AVAIL;
+
+   while (!triedonce)
+   {
+   switch (PQtransactionStatus(conn))
+   {
+   case PQTRANS_IDLE:
+   /* Everything is okay */
+   goto finish;
+   case PQTRANS_ACTIVE:
+   case PQTRANS_INTRANS:
+   case PQTRANS_INERROR:
+   res = PQexec(conn, "ROLLBACK;");
+
+   if (PQresultStatus(res) != PGRES_COMMAND_OK)
+   ereport(ERROR,
+   (errcode(ERRCODE_CONNECTION_FAILURE),
+errmsg("%s: could not issue ROLLBACK command",
+   PG_FUNCNAME_MACRO)));
+
+   PQclear(res);
+   triedonce = true;
+   break;
+   case PQTRANS_UNKNOWN:
+   elog(ERROR, "%s: connection in unknown transaction state",
+PG_FUNCNAME_MACRO);
+   }
+   }
+
+finish:
+   PG_RETURN_VOID();
+}
+
+/*
+ * dblink COPY support, procedures and variables
+ */
+static PGconn *dblink_copy_current = NULL;
+
+/*
+ * dblink_copy_open
+ *
+ * Start a COPY into a given relation on the named remote connection.
+ */
+PG_FUNCTION_INFO_V1(dblink_copy_open);
+Datum
+dblink_copy_open(PG_FUNCTION_ARGS)
+{
+   PGresult   *res = NULL;
+   PGconn *conn = NULL;
+   char   *conname = NULL;
+   remoteConn *rconn = NULL;
+
+   const char  *copy_stmt   = "COPY %s FROM STDIN%s;";
+   const char  *with_binary = " WITH BINARY";
+   const char  *quoted_remoted_relname;
+   bool isbinary;
+   int  snprintf_retcode;
+
+   /*
+* Should be large enough to contain any formatted output.  Formed by
+* counting the characters in the static formatting sections plus the
+* bounded length of identifiers.  Some modest padding was added for
+* paranoia's sake, although all uses of this buffer are checked for
+* over-length formats anyway.
+*/
+   char buf[64 + NAMEDATALEN];
+
+   DBLINK_INIT;
+
+   /* must be text,text,bool */
+   Assert(PG_NARGS() == 3);
+   DBLINK_GET_NAMED_CONN;
+
+   if (!conn)
+   DBLINK_CONN_NOT_AVAIL;
+
+   /* Read the procedure argumen

[HACKERS] [PATCH 0/4] COPY to a UDF: "COPY ... TO FUNCTION ..."

2009-11-23 Thread Daniel Farina
I have extended COPY to support using a UDF as a target instead of the
normal 'file' or STDOUT targets.  This dovetails nicely with a couple
of extensions I have also written for dblink for the purposes of
enabling direct cross-node bulk loading and replication.  Please
peruse the patches (the non-test containing patches also possess
robust human-readable summaries and explanations) that are In-Reply-To
this email for more details.

You can also get these patches from a git repo.  These patches are
applied against the history tracked by git.postgresql.org:

  git fetch http://fdr.lolrus.org/postgresql.git \
copy-to-function:copy-to-function
  git checkout copy-to-function

While the functionality exposed in these patches has appeared robust
and efficient to us at Truviso, the code had ease-of-upstream merging
as a central design point, and as such I have shied away from adding
more invasive functionality that would make the interface less
byzantine/more usable.  This was intended to be the most surgical cut
before it seemed likely that this might be interesting to the
PostgreSQL project.

At least one additional datapoint of someone else wanting such a
functionality is seen in this thread:

  http://archives.postgresql.org/pgsql-hackers/2009-08/msg00428.php

Open Issues:

 * setup/copy/teardown and error handling: as-is it is unhealthily
   tempting to use a global variable (as seen in the dblink patches)
   to track state between setup/copy/cleanup steps.  I'm not sure what
   the right aesthetic is to make this a little more controlled than
   calling specific functions in exactly the right order.

 * 'transition state': similar to an aggregate, it may make sense for
   the target of TO FUNCTION to have a context in which it can stash
   state, or at least have access to a few constant parameters as it
   accepts records.  If such functionality existed one might be able
   to conveniently rewrite the current COPY ... TO (STDOUT|'file')
   behavior to be syntactic sugar for TO FUNCTION behavior, which is
   somewhat aesthetically pleasing to me.

 * It might be interesting to increase the symmetry of this operation
   allowing COPY to bulk load into UDFs.  With that in mind, the
   design of the interfaces may change...or not.
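The "transition state" idea in the second bullet can be sketched in miniature: model the COPY target as an aggregate-like triple (initial state, per-row transition function, final function) rather than a bare function plus a global variable. All names here are hypothetical, nothing like this exists in the patch.

```python
# Sketch of an aggregate-style COPY target (assumption: illustrative
# names only, not part of the proposed interface).
class CopySink:
    def __init__(self, init, trans, final):
        self.state = init()       # per-COPY state, no globals needed
        self.trans = trans        # called once per row
        self.final = final        # called when the COPY ends

    def write(self, row):
        self.state = self.trans(self.state, row)

    def finish(self):
        return self.final(self.state)

# Example: accumulate rows into one string, like the regression test's
# copyto_* functions do with their file-level StringInfo.
sink = CopySink(list, lambda st, row: st + [row], "\n".join)
```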

This work is released under the BSD license as utilized by the
PostgreSQL project.  The copyright owner is Truviso, Inc in 2009.

Daniel Farina (4):
  Add "COPY ... TO FUNCTION ..." support
  Add tests for "COPY ... TO FUNCTION ..."
  Add dblink functions for use with COPY ... TO FUNCTION ...
  Add tests to dblink covering use of COPY TO FUNCTION

 contrib/dblink/dblink.c |  190 
 contrib/dblink/dblink.h |5 +
 contrib/dblink/dblink.sql.in|   20 +++
 contrib/dblink/expected/dblink.out  |  272 +++
 contrib/dblink/sql/dblink.sql   |  112 ++
 contrib/dblink/uninstall_dblink.sql |8 +
 src/backend/catalog/namespace.c |   21 +++
 src/backend/commands/copy.c |  190 +
 src/backend/executor/spi.c  |2 +-
 src/backend/nodes/copyfuncs.c   |2 +-
 src/backend/nodes/equalfuncs.c  |2 +-
 src/backend/parser/gram.y   |   30 +++--
 src/include/catalog/namespace.h |1 +
 src/include/nodes/parsenodes.h  |3 +-
 src/test/regress/input/copy.source  |   38 +
 src/test/regress/output/copy.source |   69 +
 src/test/regress/regress.c  |   56 +++
 17 files changed, 982 insertions(+), 39 deletions(-)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] named generic constraints [feature request]

2009-11-23 Thread Peter Eisentraut
On mån, 2009-11-23 at 12:50 -0500, Caleb Cushing wrote:
> and domains
> only seem right if it's something, like a zip code, that has a very
> specific set of rules, that is in reality it's own type.

A domain is not really its own type, it's a domain over its base type.
Hence the name.

> where
> specifying something like 'empty' feels as generic (and arbitrary?) as
> null.

The problem with your empty constraint is that it's data type specific,
and therefore the operator is also different depending on context.  So
either you create a "named generic constraint" for every data type you
are interested in (in that case, see domains), or the thing could at
best work as a text substitution mechanism, which is something that SQL
typically doesn't do.

> empty is not the only example (I'm sure), just the best I can
> think of.

I doubt that there are any really good examples that cannot be solved
with the current facilities.




[HACKERS] arrays as input parameters in plperl

2009-11-23 Thread Alexey Klyukin
Hi,

I'd like to improve the way PL/Perl handles arrays as function input 
parameters. In PL/Perl, arrays are passed as plain text strings, and getting 
the value of an array element requires additional Perl code to parse that 
string. I'd like to make this easier by converting each PostgreSQL array 
passed as an input parameter into a reference to the corresponding Perl 
array. The attached patch illustrates this; it is currently limited to 
one-dimensional arrays. The list of upcoming improvements is:

- convert an n-dimensional array to the Perl equivalent (a reference to a 
  list of references)
- proper support for arrays of composite types
- compatibility with existing PL/Perl functions by introducing a new custom 
  variable, i.e. plperl.pass_array_refs, that triggers the new behavior. I 
  think it should be disabled by default.
- documentation and additional tests
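For a sense of the parsing burden the patch removes, here is a rough sketch of what "additional perl code to parse that string" amounts to for a one-dimensional array literal, written in Python for brevity. It handles only quoting and backslash escapes, not the full array_in grammar (no dimensions, no custom delimiters).

```python
# Minimal parser for 1-D PostgreSQL array literals such as '{1,2,3}'
# or '{"a b",NULL}'.  Unquoted NULL becomes None; quoted "NULL" stays
# a string.  A simplified illustration, not the real array_in rules.
def parse_pg_array(s):
    if not (s.startswith("{") and s.endswith("}")):
        raise ValueError("not an array literal")
    body = s[1:-1]
    if body == "":
        return []
    items, buf, quoted, was_quoted, i = [], [], False, False, 0
    while i < len(body):
        c = body[i]
        if quoted:
            if c == "\\":
                buf.append(body[i + 1])   # take escaped char verbatim
                i += 2
                continue
            elif c == '"':
                quoted = False
            else:
                buf.append(c)
        elif c == '"':
            quoted = was_quoted = True
        elif c == ",":
            items.append(("".join(buf), was_quoted))
            buf, was_quoted = [], False
        else:
            buf.append(c)
        i += 1
    items.append(("".join(buf), was_quoted))
    return [None if (t == "NULL" and not q) else t for t, q in items]
```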

The patch adds a new attribute to the plperl_proc_desc struct that records 
whether the Nth argument is an array. The function plperl_ref_from_pg_array 
does the actual job of converting an array input parameter to a Perl array 
reference. I considered writing a Perl routine instead of a C function, but 
I thought it would be less readable, more complex, and slower due to double 
conversion of the input. The disadvantage of a C function is code 
duplication with array_out, on which my function is based, although that 
can be avoided by moving the relevant part of array_out into a separate 
function. 

The patch is attached.

Anybody interested in this feature ? Ideas, improvements, suggestions ?

Regards,
--
Alexey Klyukin  http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc


plperl_array.diff
Description: Binary data



Re: [HACKERS] named generic constraints [feature request]

2009-11-23 Thread Pavel Stehule
2009/11/23 Caleb Cushing :
> On Mon, Nov 23, 2009 at 4:17 AM, Pavel Stehule  
> wrote:
>> Hello
>>
>> do you know domains? It is very similar to your proposal.
>>
>
> obviously since I cited it.
>
>> constraint cannot be  part of  expression.
>>
> why not? NOT NULL is a constraint, UNIQUE is a constraint.

Yes - but you defined the constraint "empty", not "not empty". For
example, there is no constraint "NOT UNIQUE". I think this isn't
workable. Constraints are hard-coded - they use keywords. Your new
syntax is also redundant - it adds nothing over using a CHECK clause
and functions.

Regards
Pavel Stehule

>
>> CREATE OR REPLACE FUNCTION emptystr(text)
>> RETURNS bool AS $$
>>  SELECT $1 <> ''; -- it is SQL not C
>> $$ LANGUAGE sql;
>>
>> CREATE TABLE users(
>>  username TEXT CHECK (NOT emptystr(username)),
>>  ...
>
> this is probably the 'best' current solution.  however, I'd like to be
> able to not have to name the column for every constraint. and domains
> only seem right if it's something, like a zip code, that has a very
> specific set of rules, that is in reality it's own type. where
> specifying something like 'empty' feels as generic (and arbitrary?) as
> null. empty is not the only example (I'm sure), just the best I can
> think of.
>
>> p.s. Is it related to ANSI SQL?
>
> not to my knowledge (can't say that it isn't though, I've never read
> the standard).
> --
> Caleb Cushing
>
> http://xenoterracide.blogspot.com
>



Re: [HACKERS] EXPLAIN BUFFERS

2009-11-23 Thread Jeff Janes
On Thu, Oct 15, 2009 at 3:29 AM, Itagaki Takahiro
 wrote:
>
> Robert Haas  wrote:
>
>> In this case, I think that the auto_explain changes out to be part of
>> the same patch as the core EXPLAIN changes
>
> Here is a rewritten patch to add EXPLAIN (ANALYZE, BUFFERS) and
> support for it by contrib/auto_explain. I removed pg_stat_statements
> support from the patch for now.

Just a quick note:  this patch does not apply cleanly to HEAD due to
the subsequent removal from explain.c of the near-by lines:

/* Convert parameter type data to the form parser wants */
getParamListTypes(params, &param_types, &num_params);

I think it is merely a text conflict and not a functional one.

Cheers,

Jeff



Re: [HACKERS] [GENERAL] Updating column on row update

2009-11-23 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> Loops in plain SQL are no problem: see generate_series. The last
 >> time we discussed this I demonstrated reasonably straightforward
 >> SQL examples of how to do things like password-cracking (and that
 >> was long before we had CTEs, so it would be even easier now); my
 >> challenge to anyone to produce examples of malicious plpgsql code
 >> that couldn't be reproduced in plain SQL went unanswered.

 Tom> The fact remains though that the looping performance of anything
 Tom> you can cons up in straight SQL will be an order of magnitude
 Tom> worse than in plpgsql;

Well, let's see. How about generating all possible strings of 6 characters
from A-Z? We'll just count the results for now:

select count(chr(65+(i/676))||chr(65+(i/26)%26)||chr(65+i%26)
 ||chr(65+(j/676))||chr(65+(j/26)%26)||chr(65+j%26))
  from generate_series(0,17575) i, generate_series(0,17575) j;
   count   
---
 308915776
(1 row)

Time: 462570.563 ms

create function foo() returns bigint language plpgsql
 as $f$
  declare
c bigint := 0;
s text;
  begin
for i in 0..17575 loop
  for j in 0..17575 loop
s := chr(65+(i/676))||chr(65+(i/26)%26)||chr(65+i%26)
 ||chr(65+(j/676))||chr(65+(j/26)%26)||chr(65+j%26);
c := c + 1;
  end loop;
end loop;
return c;
  end;
$f$;

select foo();
foo
---
 308915776
(1 row)

Time: 624809.671 ms

plpgsql comes out 35% _slower_, not "an order of magnitude worse".

-- 
Andrew.
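The arithmetic behind the example is easy to sanity-check without a database:

```python
# Sanity checks on the counts and timings quoted above.
half = 26 ** 3                        # per 3-letter half over A-Z; loops run 0..17575
total = 26 ** 6                       # rows produced by the cross join
slowdown = 624809.671 / 462570.563    # plpgsql time over SQL time
```

26**6 matches the count(*) result shown, and the ratio of the two timings is about 1.35, i.e. the quoted 35%.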



Re: [HACKERS] named generic constraints [feature request]

2009-11-23 Thread Caleb Cushing
On Mon, Nov 23, 2009 at 4:17 AM, Pavel Stehule  wrote:
> Hello
>
> do you know domains? It is very similar to your proposal.
>

obviously since I cited it.

> constraint cannot be  part of  expression.
>
why not? NOT NULL is a constraint, UNIQUE is a constraint.

> CREATE OR REPLACE FUNCTION emptystr(text)
> RETURNS bool AS $$
>  SELECT $1 <> ''; -- it is SQL not C
> $$ LANGUAGE sql;
>
> CREATE TABLE users(
>  username TEXT CHECK (NOT emptystr(username)),
>  ...

this is probably the 'best' current solution.  however, I'd like to be
able to not have to name the column for every constraint. and domains
only seem right if it's something, like a zip code, that has a very
specific set of rules, that is in reality it's own type. where
specifying something like 'empty' feels as generic (and arbitrary?) as
null. empty is not the only example (I'm sure), just the best I can
think of.

> p.s. Is it related to ANSI SQL?

not to my knowledge (can't say that it isn't though, I've never read
the standard).
-- 
Caleb Cushing

http://xenoterracide.blogspot.com



[HACKERS] KNNGiST for knn-search

2009-11-23 Thread Teodor Sigaev

Hi there,

http://www.sigaev.ru/misc/knngist-0.11.tar.gz

we'd like to present a contrib module for CVS HEAD, which contains an
implementation of knn (k-nearest-neighbour) search in PostgreSQL; see
README.knngist for details. KNNGiST is an extension of the existing GiST
which inherits most of its code, so it has recovery (WAL logging) and
concurrency support. Basically, it introduces a new distance-based
priority-queue tree-traversal algorithm (instead of the depth-first one in
plain GiST), so an index scan returns rows sorted by closeness to the
query. Note that the index returns all rows, so one should use a LIMIT
clause to specify k (which is usually small) to get real benefits. We get
300x better performance on real-life data (about 1 mln points):

Module requires rbtree and point_ops patches applied. 
(http://archives.postgresql.org/message-id/4b0a8dfa.7050...@sigaev.ru and 
http://archives.postgresql.org/message-id/4b0a8f0f.3020...@sigaev.ru)


Old way:
SELECT coordinates, (coordinates <-> '5.0,5.0'::point) AS dist FROM spots
order by dist asc LIMIT 10;

Time: 1024.242 ms

knn-search:
SELECT coordinates, (coordinates <-> '5.0,5.0'::point) AS dist FROM spots
WHERE coordinates  >< '5.0,5.0'::point LIMIT 10;

Time: 3.158 ms
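The distance-ordered traversal behind the fast plan can be sketched outside GiST: keep a priority queue of subtrees keyed by a lower bound on their distance to the query, and pop entries in distance order. The nested bounding-box tree below is an illustrative stand-in for GiST pages, not the actual index layout.

```python
# Best-first k-NN over a toy bounding-box tree (assumption: a
# simplified stand-in for KNNGiST's priority-queue traversal).
import heapq
import itertools
import math

def min_dist(box, q):
    """Lower bound on the distance from point q to anything inside box."""
    x0, y0, x1, y1 = box
    dx = max(x0 - q[0], 0.0, q[0] - x1)
    dy = max(y0 - q[1], 0.0, q[1] - y1)
    return math.hypot(dx, dy)

def knn(root, q, k):
    """Return the k points nearest to q; a node is (box, points, children)."""
    seq = itertools.count()           # tie-breaker so the heap never compares nodes
    heap = [(0.0, next(seq), "node", root)]
    result = []
    while heap and len(result) < k:
        d, _, kind, payload = heapq.heappop(heap)
        if kind == "point":
            result.append(payload)    # popped in true distance order
        else:
            box, points, children = payload
            for p in points:
                pd = math.hypot(p[0] - q[0], p[1] - q[1])
                heapq.heappush(heap, (pd, next(seq), "point", p))
            for child in children:
                heapq.heappush(heap, (min_dist(child[0], q), next(seq), "node", child))
    return result

# Two leaf pages under one root page.
left = ((0, 0, 4, 4), [(1, 1), (3, 2)], [])
right = ((5, 0, 9, 4), [(6, 1), (8, 3)], [])
root = ((0, 0, 9, 4), [], [left, right])
```

Because min_dist never overestimates, a point popped from the heap is nearer than everything still queued, which is why a LIMIT k can stop the scan early.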


We didn't patch existing implementation of GiST for several reasons:

1. KNNGiST is about 5% slower than GiST on non-knn search queries, like
   contains or contained-by, because of some overhead in the new tree
   traversal algorithm.
2. KNNGiST can't be used in a bitmap index scan, which destroys the order
   of results. We don't know of a way to forbid bitmap index scans only
   for knn queries: the current version of KNNGiST doesn't distinguish
   knn-search from ordinary search, and postgres doesn't know about the
   ordered output from KNNGiST.

We see several ways to add KNNGiST to PostgreSQL:

1. Patch existing GiST.
   Con - see problems above.
   Pro - any existing GiST opclasses will work with KNNGiST.

2. Add KNNGiST as a contrib module, as now.
   Con - there is no way in PostgreSQL to test other modules which depend
 on KNNGiST. For example, it's easy to add support for trigrams, but
 then we add a dependency on the contrib/pg_trgm module.

3. Add KNNGiST as a separate access method in the core of PostgreSQL.
   We think this is the best way, since we don't touch the existing GiST
   and opclasses, and could use KNNGiST in other contrib modules.


A separate problem is query planning.

1. To use KNNGiST we need to use an index scan!  To prevent a seqscan on
   the table, the operator >< has an extremely high cost.
2. To prevent a bitmap index scan, KNNGiST doesn't have a bitmap index
   scan interface at all (no amgetbitmap method).


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/
knngist - contrib module for k-nearest neighbourhood search

Support: The Open Planning Project, Inc.

Module knngist provides:

- the KNNGiST access method, an extension of GiST that implements
  k-nearest-neighbour search
- an operator class for KNNGiST for the point data type, with knn-search
  capability
- an operator class for KNNGiST for the tsvector data type, without
  knn-search capability

KNNGiST inherits from GiST and uses the same write methods, so KNNGiST has 
recovery (WAL logging) and concurrency support.

KNNGiST supports all queries executable by GiST, but with possible 
performance loss.  KNNGiST keeps all features of GiST, such as multicolumn 
indexes with support of any subset of the index's columns, indexing and 
searching of NULL values.

KNNGiST differs from GiST in:
- the tree traversal algorithm: while GiST uses a depth-first search, the
  KNN version traverses using a priority queue
- the consistent user-defined method should return the distance of a tuple
  to the query, instead of a boolean value as in GiST. However, KNNGiST can
  use GiST's consistent method for additional filtering of results or for
  GiST-like search, but not for knn-search (for example, tsvector_ops).
- KNNGiST doesn't have an amgetbitmap method, because of the nature of
  knn-search.


The consistent user-defined method for KNNGiST can return:

 - a negative value, which means the tuple doesn't match the query
   (like false in GiST's consistent)
 - 0.0, which means one of:
   - a zero distance (exact match)
   - a match for a filtering clause, like <@ or @> for point.
   KNNGiST doesn't distinguish these two cases and relies on the
   user-defined methods
 - a positive value, which means the method returns a distance. In this
   case keyRecheck should be false, since it's impossible to produce the
   right order with lossy values.

The distance between a tuple and the query is calculated as the sum of the 
distances over all keys. Notice that the distance is a numerical 
(non-negative) description of how different a tuple is from the query; 
KNNGiST doesn't require that it obey the triangle inequality. 

Caveats:

Currently, it's impossible to specify the number of closest neighbourhood
points returned, 

Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove -w (--ignore-all-space) option from pg_regress's diff

2009-11-23 Thread Andrew Dunstan



Tom Lane wrote:
> Andrew Dunstan  writes:
>> Tom Lane wrote:
>>> I'm thinking that the most appropriate fix is to have pg_regress
>>> continue to use -w, but only on Windows.
>
>> Well, the filter could be as simple as something like this in the
>> Makefile for the mingw case:
>
>> perl -spi.bak -e 's/(?
>
> I'm not at all thrilled with having the build process intentionally
> modify source files.  Quite aside from messing up the file timestamps,
> what if this is done on a committer's machine?  If the checked-out
> files didn't have CRs, that means his CVS client didn't add them
> and probably won't remove them on checkin.

OK, it's a choice of evils. I'm not that unhappy with what you've done.

cheers

andrew



Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Simon Riggs
On Mon, 2009-11-23 at 12:23 -0500, Greg Smith wrote:

> What would probably be helpful here is to take the mess of raw data 
> above and turn it into a simpler partitioning roadmap.  

Thanks for summarising.

I briefly tried to do that on the thread for Itagaki-san's patch. That's
a first stab at things, at least. 

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Greg Smith

Simon Riggs wrote:

...Read my detailed comments in response to Kedar's patch and post
comments on that thread to say you didn't agree with that proposal and
that you were thinking of another way entirely.

Useful background here is:

http://wiki.postgresql.org/wiki/Table_partitioning
http://archives.postgresql.org/pgsql-hackers/2008-01/msg00413.php
http://archives.postgresql.org/message-id/bd8134a40906080702s96c90a9q3bbb581b9bd0d...@mail.gmail.com
http://archives.postgresql.org/message-id/1247564358.11347.1308.ca...@ebony.2ndquadrant

The basic problem here is that Emmanuel and Aster developed a useful 
answer to one of the more pressing implementation details needed here, 
but did so without being involved in the much larger discussion of how 
to implement general, more automated partitioning in PostgreSQL that (as 
you can see from the date of the first links there) has been going on 
for years already.  What we did wrong as a community is not more 
explicitly tell Emmanuel the above when he first submitted code a few 
months ago, before he'd invested more time on a subset implementation 
that was unlikely to be committed.  As I already commented upthread, I 
was just happy to see coding progress being made on part of the design 
that nobody had hacked on before to my knowledge; I didn't consider then 
how Emmanuel was going to be disappointed by the slow rate that code 
would be assimilated into the design going on in this area.


What would probably be helpful here is to take the mess of raw data 
above and turn it into a simpler partitioning roadmap.  There's a stack 
of useful patches here, multiple contributors who have gotten familiar 
with the implementation details required, and enough time left that it's 
possible to pull something together in time for 8.5--but only if 
everyone is clear on exactly what direction to push toward.  I'm going 
to reread the history here myself and see if I can write something 
helpful here.


--
Greg Smith2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Tom Lane
Simon Riggs  writes:
> Anyway, I want data routing, as is the intention of this patch. I just
> don't think this patch is a useful way to do it. It is too narrow in its
> scope and potentially buggy in its approach to developing a cache and
> using trigger-like stuff. 

FWIW, I agree --- there are two really fundamental problems with this
patch:

* It only applies to COPY.  You'd certainly want routing for INSERT as
  well.  And it shouldn't be necessary to specify an option.

* Building this type of infrastructure on top of independent, not
  guaranteed consistent table constraints is just throwing more work
  into a dead end.  The patch is already full of special-case errors
  for possible inconsistency of the constraints, and I don't think it's
  bulletproof even so (what if someone is altering the constraints
  concurrently? What if there's more than one legal destination?)
  And the performance necessarily sucks.

What we need first is an explicit representation of partitioning, and
then to build routing code on top of that.  I haven't looked at
Itagaki-san's syntax patch at all, but I think it's at least starting
in a sensible place.

regards, tom lane
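The point about an explicit partition representation can be illustrated with a toy routing map (names and layout invented here, not a proposed catalog format): once range bounds live in one ordered structure, routing is a lookup rather than a scan over possibly inconsistent CHECK constraints, and there is exactly one legal destination by construction.

```python
# Toy range-partition map: bounds[i] is the exclusive upper bound of
# partition names[i]; the last partition is unbounded above.
import bisect

class RangePartitionMap:
    def __init__(self, bounds, names):
        assert len(names) == len(bounds) + 1
        assert bounds == sorted(bounds)   # one authoritative ordering
        self.bounds = bounds
        self.names = names

    def route(self, key):
        # Exactly one destination: no overlapping or missing ranges.
        return self.names[bisect.bisect_right(self.bounds, key)]
```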



Re: [HACKERS] point_ops for GiST

2009-11-23 Thread Greg Stark
2009/11/23 Teodor Sigaev :

> point   <@ box
> point   <@ polygon
> point   <@ circle

I've always wondered why we didn't have these

-- 
greg



Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Simon Riggs
On Mon, 2009-11-23 at 10:43 -0500, Emmanuel Cecchet wrote:
> Simon Riggs wrote:
> > I was unaware you were developing these ideas and so was unable to
> > provide comments until now. 

> The first patch was published to this list on September 10 (almost 2.5 
> months ago) along with the wiki page describing the problem and the 
> solution.

> What should I have done to raise awareness further?

...Read my detailed comments in response to Kedar's patch and post
comments on that thread to say you didn't agree with that proposal and
that you were thinking of another way entirely. ~14 July. >4 months ago.

...Contact me personally when you saw that I hadn't responded to your
later posts, knowing that I have recent track record as a reviewer of
partitioning patches.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove -w (--ignore-all-space) option from pg_regress's diff

2009-11-23 Thread Tom Lane
Andrew Dunstan  writes:
> Tom Lane wrote:
>> I'm thinking that the most appropriate fix is to have pg_regress
>> continue to use -w, but only on Windows.

> Well, the filter could be as simple as something like this in the 
> Makefile for the mingw case:

> perl -spi.bak -e 's/(? rm expected/*.bak

I'm not at all thrilled with having the build process intentionally
modify source files.  Quite aside from messing up the file timestamps,
what if this is done on a committer's machine?  If the checked-out
files didn't have CRs, that means his CVS client didn't add them
and probably won't remove them on checkin.

regards, tom lane
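The narrow effect actually wanted here, ignoring only line-ending differences rather than all whitespace as diff -w does, can be sketched as a newline-normalizing comparison:

```python
# Compare expected vs. actual output while treating CRLF, CR, and LF
# as the same line ending (a narrower filter than diff -w).
def normalize_newlines(text):
    return text.replace("\r\n", "\n").replace("\r", "\n")

def outputs_match(expected, actual):
    return normalize_newlines(expected) == normalize_newlines(actual)
```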



Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove -w (--ignore-all-space) option from pg_regress's diff

2009-11-23 Thread Andrew Dunstan



Tom Lane wrote:
> Magnus Hagander  writes:
>> On Mon, Nov 23, 2009 at 06:58, Andrew Dunstan  wrote:
>>> They might not be using the same CVS programs, though. It appears that
>>> Windows CVS (which, for example, red_bat uses) translates line endings
>>> to CRLF, which is why it passed the regression tests, but MinGW CVS
>>> does not, which I think is why narwahl and vaquita failed and why
>>> dawn_bat will probably fail next go round. brown_bat is on Cygwin and
>>> we should not expect a change there.
>
>> Yeah, I've seen a lot of weirdness with CVS clients on Windows doing
>> that differently, so that also seems like a very likely reason.
>
> brown_bat is indeed still green, so Andrew's probably fingered the right
> component.  I thought for a moment about insisting that Windows
> buildfarm members use a non-translating CVS client, but that would still
> leave people vulnerable when trying to build from source, if they use a
> tarball extractor that converts newlines.
>
> I'm thinking that the most appropriate fix is to have pg_regress
> continue to use -w, but only on Windows.  (I notice that ecpg is already
> doing it that way, presumably for the same reason of newline
> differences.)  A filter such as Andrew mumbled about upthread seems like
> more trouble than the problem is worth.  Any actually-interesting
> whitespace changes should get caught on other platforms.

Well, the filter could be as simple as something like this in the
Makefile for the mingw case:

   perl -spi.bak -e 's/(?


Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Simon Riggs
On Mon, 2009-11-23 at 10:24 -0500, Emmanuel Cecchet wrote:

> I think there is a misunderstanding between what Simon wants 
> ('Anyway, I want data routing, as is the intention of this patch.') and 
> what this patch is about. This patch is just supposed to load tuples in 
> a hierarchy of tables as this is a recurrent use case in datawarehouse 
> scenarios. It is not supposed to solve data routing in general 
> (otherwise that would be integrated standard in COPY and not as an option).

I have not misunderstood. You wish to solve a very specific problem,
with very specific code. I've done that myself on occasion. My opinion
is that we should solve many of the partitioning problems with one set
of central, common code. If we do not do this we will need 3-4 times as
much code, most of which will be similar and yet must be exactly the
same. That alone is enough to block the patch's proposed method (IMHO).

> But it looks like it is a waste of everybody's time to continue this 
> discussion further. Just move the patch to the rejected patches and 
> let's wait for Itagaki's implementation.

The lack of discussion and design in this area has held back the last
few patches, by various authors; we should learn from that. Also,
working in isolation on narrow problems will not move us forwards as
fast as if we all work together on pieces of the whole vision for
partitioning. My piece was to think through how to link each of the
different aspects of partitioning and to propose a solution. Please join
with Itagaki to move this forwards - your further contributions will be
valuable.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Emmanuel Cecchet

Simon Riggs wrote:
> I was unaware you were developing these ideas and so was unable to
> provide comments until now.
The first patch was published to this list on September 10 (almost 2.5 
months ago) along with the wiki page describing the problem and the 
solution.

What should I have done to raise awareness further?

/E

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Updating column on row update

2009-11-23 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Well, that's pretty much exactly the question --- are there?  It
>  Tom> would certainly make it easier for someone to exploit any other
>  Tom> security weakness they might find.

> Loops in plain SQL are no problem: see generate_series. The last time
> we discussed this I demonstrated reasonably straightforward SQL
> examples of how to do things like password-cracking (and that was long
> before we had CTEs, so it would be even easier now); my challenge to
> anyone to produce examples of malicious plpgsql code that couldn't be
> reproduced in plain SQL went unanswered.

The fact remains though that the looping performance of anything you can
cons up in straight SQL will be an order of magnitude worse than in
plpgsql; and it's a notation the average script kiddie will find pretty
unfamiliar.  So I think this still does represent some barrier.  Whether
it's enough of a barrier to justify keeping plpgsql out of the default
install is certainly debatable.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Simon Riggs
On Mon, 2009-11-23 at 09:39 -0500, Emmanuel Cecchet wrote:

> I think you should read the thread and the patch 

I did read the thread and patch in full before posting. My opinions are
given to help you and the community towards a desirable common goal.

I was unaware you were developing these ideas and so was unable to
provide comments until now. My review of Kedar's patch in July did lay
out in general terms a specific implementation route for future work on
partitioning. I had thought I might not have made those comments clearly
enough, so gave a more specific description of what I consider to be a
more workable and general solution for cacheing and using partitioning
metadata.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Updating column on row update

2009-11-23 Thread Andrew Dunstan



Tom Lane wrote:
> Thom Brown  writes:
>> As for having plpgsql installed by default, are there any security
>> implications?
>
> Well, that's pretty much exactly the question --- are there?  It would
> certainly make it easier for someone to exploit any other security
> weakness they might find.  I believe plain SQL plus SQL functions is
> Turing-complete, but that doesn't mean it's easy or fast to write loops
> etc in it.

That's a bit harder argument to sustain now we have recursive queries, ISTM.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Emmanuel Cecchet

Robert Haas wrote:
> On Mon, Nov 23, 2009 at 9:39 AM, Emmanuel Cecchet  wrote:
>> I think you should read the thread and the patch before making any false
>> statements like you did in your email.
>>
>> 1. The patch does not use any trigger for routing.
>
> Whoa, whoa!  I don't think that Simon said that it did.  But even if I
> am wrong and he did...

Quote from Simon's email: "It is too narrow in its scope and potentially 
buggy in its approach to developing a cache and using trigger-like stuff."

>> You should really think twice about the style of your emails that cast a
>> detestable tone to discussions on pg-hackers.
>
> ...I certainly don't think this comment is justified.  This is a
> technical discussion about the best way of solving a certain problem,
> and I don't believe that any of the discussion up to this point has
> been anything other than civil.  I can tell that you are frustrated
> that your patch is not getting the support you would like to see it
> get, but launching ad hominem attacks on Simon or anyone else is not
> going to help.

We certainly don't live in the same civilization then.

I am not frustrated if my patch does not get in because of technical 
considerations, and I am happy so far with Jan's feedback that helped a 
lot. I think there is a misunderstanding between what Simon wants 
('Anyway, I want data routing, as is the intention of this patch.') and 
what this patch is about. This patch is just supposed to load tuples into 
a hierarchy of tables, as this is a recurrent use case in data warehouse 
scenarios. It is not supposed to solve data routing in general 
(otherwise it would be integrated into COPY as standard and not as an option).

But it looks like it is a waste of everybody's time to continue this 
discussion further. Just move the patch to the rejected patches and 
let's wait for Itagaki's implementation.

Emmanuel

--
Emmanuel Cecchet
Aster Data
Web: http://www.asterdata.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Updating column on row update

2009-11-23 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 > Thom Brown  writes:
 >> As for having plpgsql installed by default, are there any security
 >> implications?

 Tom> Well, that's pretty much exactly the question --- are there?  It
 Tom> would certainly make it easier for someone to exploit any other
 Tom> security weakness they might find.  I believe plain SQL plus SQL
 Tom> functions is Turing-complete, but that doesn't mean it's easy or
 Tom> fast to write loops etc in it.

Now that we have recursive CTEs, plain SQL is turing-complete without
requiring functions.

(Yes, I did actually prove this a while back, by implementing one of
the known-Turing-complete tag system automata as a single recursive
query. This proof is pretty boring, though, because you wouldn't
actually _use_ that approach in practice.)

Loops in plain SQL are no problem: see generate_series. The last time
we discussed this I demonstrated reasonably straightforward SQL
examples of how to do things like password-cracking (and that was long
before we had CTEs, so it would be even easier now); my challenge to
anyone to produce examples of malicious plpgsql code that couldn't be
reproduced in plain SQL went unanswered.
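
For readers unfamiliar with the idioms being referenced, here is a minimal sketch of "looping" in plain SQL with no procedural language involved (the queries are illustrative, not taken from the cited demonstrations):

```sql
-- Iteration via a set-returning function: sum the first million integers.
SELECT sum(i) FROM generate_series(1, 1000000) AS s(i);

-- The same computation as a recursive CTE (8.4+), the construct that
-- makes plain SQL Turing-complete:
WITH RECURSIVE t(i) AS (
    SELECT 1
  UNION ALL
    SELECT i + 1 FROM t WHERE i < 1000000
)
SELECT sum(i) FROM t;

-- Both return 500000500000.
```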

-- 
Andrew (irc:RhodiumToad)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Updating column on row update

2009-11-23 Thread Thom Brown
2009/11/23 Tom Lane 

> Thom Brown  writes:
> > As for having plpgsql installed by default, are there any security
> > implications?
>
> Well, that's pretty much exactly the question --- are there?  It would
> certainly make it easier for someone to exploit any other security
> weakness they might find.  I believe plain SQL plus SQL functions is
> Turing-complete, but that doesn't mean it's easy or fast to write loops
> etc in it.
>
>regards, tom lane
>

I personally find it more important to gracefully add plpgsql if it doesn't
already exist than to rely on it already being there.  In a way it wouldn't
solve this problem as someone could have still removed it.  Other procedural
languages could benefit from some sort of check too.
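
One widely circulated workaround for "gracefully add plpgsql if missing" guards the CREATE LANGUAGE behind a catalog check; the helper function name here is arbitrary, and the helper is written in plain SQL so it works even when plpgsql itself is absent:

```sql
-- Create plpgsql only if it is not already installed.
CREATE OR REPLACE FUNCTION make_plpgsql() RETURNS void
LANGUAGE sql AS $$ CREATE LANGUAGE plpgsql; $$;

SELECT CASE
         WHEN EXISTS (SELECT 1 FROM pg_catalog.pg_language
                       WHERE lanname = 'plpgsql')
         THEN NULL
         ELSE make_plpgsql()
       END;

DROP FUNCTION make_plpgsql();
```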

Thom


Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Robert Haas
On Mon, Nov 23, 2009 at 9:39 AM, Emmanuel Cecchet  wrote:
> I think you should read the thread and the patch before making any false
> statements like you did in your email.
>
> 1. The patch does not use any trigger for routing.

Whoa, whoa!  I don't think that Simon said that it did.  But even if I
am wrong and he did...

> You should really think twice about the style of your emails that cast a
> detestable tone to discussions on pg-hackers.

...I certainly don't think this comment is justified.  This is a
technical discussion about the best way of solving a certain problem,
and I don't believe that any of the discussion up to this point has
been anything other than civil.  I can tell that you are frustrated
that your patch is not getting the support you would like to see it
get, but launching ad hominem attacks on Simon or anyone else is not
going to help.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Emmanuel Cecchet

Simon,

I think you should read the thread and the patch before making any false 
statements like you did in your email.


1. The patch does not use any trigger for routing.
2. This is just an option for COPY that is useful for loading operations 
in the datawarehouse world. It is not meant to implement full 
partitioning as explained many times already in this thread.
3. This patch elaborates on existing mechanisms and cannot rely on a 
meta-data representation of partitions which does not exist yet and will 
probably not exist in 8.5


You should justify your statements when you say 'potentially buggy in 
its approach to developing a cache and using trigger-like stuff'. I 
understand that you don't like it because this is not what you want, but 
this is not my fault. This is not an implementation of partitioning, just 
as COPY does not do update/delete/alter/...
And yes the use case is 'narrow' like any option in COPY. It is like 
complaining that the CSV option is not useful because you want to load 
binary dumps.


If Itagaki gets the support of the community to get his implementation 
accepted, I will gladly use it. Contributing? If Aster is willing to 
contribute a code monkey to implement your specs, why not but you will 
have to convince them.


You should really think twice about the style of your emails that cast a 
detestable tone to discussions on pg-hackers.


Emmanuel



> On Wed, 2009-11-11 at 19:53 -0500, Emmanuel Cecchet wrote:
>> Hi,
>>>> I have extracted the partitioning option for COPY (removed the error 
>>>> logging part) from the previous patch.
>>>
>>> We can use an INSERT trigger to route tuples into partitions even now.
>>> Why do you need an additional router for COPY? 
>> Tom has already explained on the list why using a trigger was a bad idea 
>> (and I know we can use a trigger since I am the one who wrote it).
>> If you look at the code you will see that you can do optimizations in 
>> the COPY code that you cannot do in the trigger.
>>
>>> Also, it would be nicer
>>> that the router can work not only in COPY but also in INSERT.
>> As 8.5 will at best provide a syntactic hack on top of the existing 
>> constraint implementation, I think that it will not hurt to have routing 
>> in COPY since we will not have it anywhere otherwise.
>>
>>> BTW, I'm working on meta data of partitioning now. Your "partitioning"
>>> option in COPY could be replaced with the catalog.
>> This implementation is only for the current 8.5 and it will not be 
>> needed anymore once we get a fully functional partitioning in Postgres 
>> which seems to be for a future version.
>
> Yes, the trigger way of doing this is a bad way.
>
> I regret to say that the way proposed here isn't much better, AFAICS.
> Let me explain why I think that, but -1 to anyone applying this patch.
>
> This patch proposes keeping a cache of last visited partitions to reduce
> the overhead of data routing.
>
> What I've requested is that partitioning work by using a data structure
> held in relcache for inheritance parents. This differs in 3 ways from
> this patch:
> a) it has a clearly defined location for the cached metadata, with
> clearly identified and well worked out mechanisms for cache invalidation
> b) the cache can be built once when it is first needed, not slowly grown
> as parts of the metadata are used
> c) it would be available for all parts of the server, not just COPY.
>
> The easiest way to build that metadata is when structured partitioning
> info is available, i.e. the best next action is to complete and commit
> Itagaki's partitioning syntax patch. Then we can easily build the
> metadata for partitioning, which can then be used in COPY for data
> routing.
>
> Anyway, I want data routing, as is the intention of this patch. I just
> don't think this patch is a useful way to do it. It is too narrow in its
> scope and potentially buggy in its approach to developing a cache and
> using trigger-like stuff.
>
> ISTM that with the right metadata in the right place, a cleaner and
> easier solution is still possible for 8.5. The code within COPY should
> really just reduce to a small piece of code to derive the correct
> relation for the desired row and then use that during heap_insert().
>
> I have just discussed partitioning with Itagaki-san at JPUG, so I know
> his plans. Itagaki-san and Manu, please can you work together to make
> this work for 8.5?
>
> ---
>
> A more detailed explanation of Partitioning Metadata:
>
> Partitioning Metadata is information held on the relcache for a table
> that has child partitions. Currently, a table does not cache info about
> its children, which prevents various optimisations.
>
> We would have an extra pointer on the Relation struct that points to a
> PartitioningMetadata struct. We can fill in this information when we
> construct the relcache for a relation, or we can populate it on demand
> the first time we attempt to use that information (if it exists).
>
> We want to hold an array of partition boundary values. This will then
> allow us to use bsearch to find the partition t

Re: [HACKERS] [GENERAL] Updating column on row update

2009-11-23 Thread Tom Lane
Thom Brown  writes:
> As for having plpgsql installed by default, are there any security
> implications?

Well, that's pretty much exactly the question --- are there?  It would
certainly make it easier for someone to exploit any other security
weakness they might find.  I believe plain SQL plus SQL functions is
Turing-complete, but that doesn't mean it's easy or fast to write loops
etc in it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Updating column on row update

2009-11-23 Thread Robert Haas
On Sun, Nov 22, 2009 at 11:38 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Sun, Nov 22, 2009 at 6:51 PM, Tom Lane  wrote:
>>> CREATE IF NOT EXISTS has been proposed and rejected before, more than
>>> once.  Please see the archives.
>
>> Search for CINE to find the discussions.  This is a good place to start:
>> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00252.php
>
>> Despite Tom's assertions to the contrary, I am unable to find a clear
>> consensus against this feature in the archives,
>
> I think you didn't look back far enough --- that issue was settled years
> ago.  IIRC the killer argument is that after CINE you do not know the
> state of the object: it exists, yes, but what properties has it got?
> If it already existed then it's still got its old definition, which
> might or might not be what you're expecting.
>
> CREATE OR REPLACE has got far safer semantics from the viewpoint of a
> script that wants to bull through without having any actual error
> handling (which is more or less the scenario we are arguing here, no?)
> After successful execution of the command you know exactly what
> properties the object has got.

Sure.  I think that CINE only makes sense for objects for which COR
can't be implemented - things that have internal substructure, like
tables or tablespaces.  I agree that there are pitfalls for the unwary
but I still think it's better than nothing.  I understand that you
disagree.
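
To make the known-state distinction concrete, a small sketch (the object names are invented for illustration):

```sql
-- CREATE OR REPLACE: after this runs, the function body is known,
-- whether or not an earlier version existed.
CREATE OR REPLACE FUNCTION add_one(i int) RETURNS int
LANGUAGE sql AS $$ SELECT i + 1 $$;

-- A hypothetical CREATE TABLE IF NOT EXISTS, by contrast, would succeed
-- while silently keeping a pre-existing table whose columns may not
-- match what the script expects:
-- CREATE TABLE IF NOT EXISTS t (a int, b text);
```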

> Whether it would be sensible to have CREATE OR REPLACE semantics for a
> language is something I'm not very clear on.  It seems like changing any
> of the properties of a pg_language entry could be rather fatal from the
> viewpoint of an existing function using the language.
>
> [ thinks for awhile... ]  Actually, CREATE LANGUAGE is unique among
> creation commands in that the common cases have no parameters, at least
> not since we added pg_pltemplate.  So you could imagine defining CINE
> for a language as disallowing any parameters and having these semantics:
>        * language not present -> create from template
>        * language present, matches template -> OK, do nothing
>        * language present, does not match template -> report error
> This would meet the objection of not being sure what the state is
> after successful execution of the command.  It doesn't scale to any
> other object type, but is it worth doing for this one type?

CREATE OR REPLACE seems like a better fit in this case.  For example,
it seems plausible that someone might want to add an inline handler to
a procedural language that didn't have one without dropping and
recreating the language.  Even changing the call handler seems like it
could be potentially useful in an upgrade scenario.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unicode UTF-8 table formatting for psql text output

2009-11-23 Thread Tom Lane
Peter Eisentraut  writes:
> What is the plan behind keeping the old format?  Are we going to remove
> it after one release if no one complains, or are we seriously expecting
> that someone has code that actually parses this?

Plan?  Do we need a plan?  The extra support consists of about two lines
of code and a small table, so there wouldn't be much harm in leaving
it there forever.

On the other hand it seems pretty likely that no one would care after a
release or two.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove -w (--ignore-all-space) option from pg_regress's diff

2009-11-23 Thread Magnus Hagander
On Mon, Nov 23, 2009 at 15:23, Tom Lane  wrote:
> Magnus Hagander  writes:
>> On Mon, Nov 23, 2009 at 06:58, Andrew Dunstan  wrote:
>>> They might not be using the same CVS programs, though. It appears that
>>> Windows CVS (which, for example, red_bat uses) translates line endings to
>>> CRLF, which is why it passed the regression tests, but MinGW CVS does not,
>> which I think is why narwahl and vaquita failed and why dawn_bat will
>>> probably fail next go round. brown_bat is on Cygwin and we should not expect
>>> a change there.
>
>> Yeah, I've seen a lot of weirdness with CVS clients on Windows doing
>> that differently, so that also seems like a very likely reason.
>
> brown_bat is indeed still green, so Andrew's probably fingered the right
> component.  I thought for a moment about insisting that Windows
> buildfarm members use a non-translating CVS client, but that would still
> leave people vulnerable when trying to build from source, if they use a
> tarball extractor that converts newlines.
>
> I'm thinking that the most appropriate fix is to have pg_regress
> continue to use -w, but only on Windows.  (I notice that ecpg is already
> doing it that way, presumably for the same reason of newline
> differences.)  A filter such as Andrew mumbled about upthread seems like
> more trouble than the problem is worth.  Any actually-interesting
> whitespace changes should get caught on other platforms.

Agreed, that seems like the most sensible solution.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove -w (--ignore-all-space) option from pg_regress's diff

2009-11-23 Thread Tom Lane
Magnus Hagander  writes:
> On Mon, Nov 23, 2009 at 06:58, Andrew Dunstan  wrote:
>> They might not be using the same CVS programs, though. It appears that
>> Windows CVS (which, for example, red_bat uses) translates line endings to
>> CRLF, which is why it passed the regression tests, but MinGW CVS does not,
>> which I think is why narwahl and vaquita failed and why dawn_bat will
>> probably fail next go round. brown_bat is on Cygwin and we should not expect
>> a change there.

> Yeah, I've seen a lot of weirdness with CVS clients on Windows doing
> that differently, so that also seems like a very likely reason.

brown_bat is indeed still green, so Andrew's probably fingered the right
component.  I thought for a moment about insisting that Windows
buildfarm members use a non-translating CVS client, but that would still
leave people vulnerable when trying to build from source, if they use a
tarball extractor that converts newlines.

I'm thinking that the most appropriate fix is to have pg_regress
continue to use -w, but only on Windows.  (I notice that ecpg is already
doing it that way, presumably for the same reason of newline
differences.)  A filter such as Andrew mumbled about upthread seems like
more trouble than the problem is worth.  Any actually-interesting
whitespace changes should get caught on other platforms.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] point_ops for GiST

2009-11-23 Thread Teodor Sigaev

Hi!

The attached patch implements a GiST operator class for the point type. Supported operations:
point   << point
point   >> point
point   <^ point
point   >^ point
point   ~= point
point   <@ box
box @> point
point   <@ polygon
polygon @> point
point   <@ circle
circle  @> point
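
A sketch of how such an opclass would typically be exercised; the table and index names are made up, and this assumes the patch registers the opclass as the default for point:

```sql
-- Hypothetical usage of a GiST index over a point column.
CREATE TABLE spots (coordinates point);
CREATE INDEX spots_coord_idx ON spots USING gist (coordinates);

-- Qualifications using the operators listed above, which the index
-- could then accelerate:
SELECT count(*) FROM spots WHERE coordinates <@ box '((0,0),(10,10))';
SELECT count(*) FROM spots WHERE coordinates << point '(5,5)';  -- strictly left of
```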
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


point_ops-0.4.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Red-black tree for GIN

2009-11-23 Thread Teodor Sigaev

Hi there,

attached is a patch which contains an implementation of a red-black
tree, a self-balancing binary tree. The main goal of this patch is to
improve creation time of GIN indexes in corner cases. During index
creation, GIN collects data in memory in a binary tree until it reaches
some limit, and then flushes the tree to disk. Some inputs, for example
sorted data, can produce an unbalanced binary tree, so the tree
degenerates to a list with O(N^2) processing time (see thread
http://archives.postgresql.org/pgsql-performance/2009-03/msg00340.php),
which causes very slow index creation. Tom has fixed that by limiting
the depth of the tree (committed to 8.3 and 8.4), but we found it's not
enough and propose to use a red-black tree, which is very good for
skewed data and has almost the same performance for unsorted data; see
http://www.sai.msu.su/~megera/wiki/2009-07-27 and
http://www.sai.msu.su/~megera/wiki/2009-04-03 for more information.

The red-black tree implementation has several currently unused methods,
but they will be used in later patches.

--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


rbtree-0.5.gz
Description: Unix tar archive

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] forget patch win32.mak.

2009-11-23 Thread Magnus Hagander
Thanks, applied!

//Magnus

2009/11/23 Hiroshi Saito :
> Hi.
>
> Only for CVS-HEAD, not need backpatch.
> I checked REL8_4_STABLE is this.
> ==
> nmake -f win32.mak
> ...
> Microsoft (R) Manifest Tool version 5.2.3790.2076
> Copyright (c) Microsoft Corporation 2005.
> All rights reserved.
>   cd ..\..
>   echo All Win32 parts have been built!
> All Win32 parts have been built!
> ==
>
> Thanks!
>
> Regards,
> Hiroshi Saito
>
> - Original Message - From: "Magnus Hagander" 
>
>
>> 2009/11/22 Hiroshi Saito :
>>>
>>> Hi Magnus.
>>>
>>> It is a thing left behind.:-)
>>> please apply it. Thanks!
>>> ==
>>> libpq.lib(ip.obj) : error LNK2019: unresolved external symbol
>>> __imp__wsaio...@36 referenced in function _pg_foreach_ifaddr
>>> libpq.lib(ip.obj) : error LNK2019: unresolved external symbol
>>> __imp__wsasock...@24 referenced in function _pg_foreach_ifaddr
>>> ==
>>
>> Hi!
>>
>> Is this for HEAD only, or also for backpatching?
>>
>>
>> --
>> Magnus Hagander
>> Me: http://www.hagander.net/
>> Work: http://www.redpill-linpro.com/
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>
>



-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove -w (--ignore-all-space) option from pg_regress's diff

2009-11-23 Thread Magnus Hagander
On Mon, Nov 23, 2009 at 06:58, Andrew Dunstan  wrote:
>
>
> Tom Lane wrote:
>> Andrew Dunstan  writes:
>>> Tom Lane wrote:
>>>> Remove -w (--ignore-all-space) option from pg_regress's diff calls.
>>>
>>> Looks like this has broken on Windows due to different line endings,
>>> which -w hid from us.
>>
>> Yeah.  I was waiting for brown_bat to report in before bringing this
>> up on the list, because I don't entirely understand what's happening.
>> So far it appears that the MSVC builds are fine and the MinGW builds
>> are not.  How can that be?  Aren't they using the same diff program?
>>

In most cases, I think they would not use the same diff program, no.
The MSVC builds would be using the one from the gnuwin32 project, and
the mingw one use the one from mingw. It's probably the same diff
source underneath, but it's most likely built with different options.


> They might not be using the same CVS programs, though. It appears that
> Windows CVS (which, for example, red_bat uses) translates line endings to
> CRLF, which is why it passed the regression tests, but MinGW CVS does not,
> which I think is why narwahl and vaquita failed and why dawn_bat will
> probably fail next go round. brown_bat is on Cygwin and we should not expect
> a change there.

Yeah, I've seen a lot of weirdness with CVS clients on Windows doing
that differently, so that also seems like a very likely reason.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partitioning option for COPY

2009-11-23 Thread Simon Riggs
On Wed, 2009-11-11 at 19:53 -0500, Emmanuel Cecchet wrote:
> Hi,
> >> I have extracted the partitioning option for COPY (removed the error 
> >> logging part) from the previous patch.
> >> 
> >
> > We can use an INSERT trigger to route tuples into partitions even now.
> > Why do you need an additional router for COPY? 
> Tom has already explained on the list why using a trigger was a bad idea 
> (and I know we can use a trigger since I am the one who wrote it).
> If you look at the code you will see that you can do optimizations in 
> the COPY code that you cannot do in the trigger.
> 
> >  Also, it would be nicer
> > that the router can work not only in COPY but also in INSERT.
> >   
> As 8.5 will at best provide a syntactic hack on top of the existing 
> constraint implementation, I think that it will not hurt to have routing 
> in COPY since we will not have it anywhere otherwise.
> > BTW, I'm working on meta data of partitioning now. Your "partitioning"
> > option in COPY could be replaced with the catalog.
> >   
> This implementation is only for the current 8.5 and it will not be 
> needed anymore once we get a fully functional partitioning in Postgres 
> which seems to be for a future version.

Yes, the trigger way of doing this is a bad way. 

I regret to say that the way proposed here isn't much better, AFAICS.
Let me explain why I think that, but -1 to anyone applying this patch.

This patch proposes keeping a cache of last visited partitions to reduce
the overhead of data routing.

What I've requested is that partitioning work by using a data structure
held in relcache for inheritance parents. This differs from the patch in
three ways:
a) it has a clearly defined location for the cached metadata, with
clearly identified and well-worked-out mechanisms for cache invalidation;
b) the cache can be built once, when it is first needed, rather than
grown slowly as parts of the metadata are used;
c) it would be available to all parts of the server, not just COPY.

The easiest way to build that metadata is when structured partitioning
info is available. i.e. the best next action is to complete and commit
Itagaki's partitioning syntax patch. Then we can easily build the
metadata for partitioning, which can then be used in COPY for data
routing.

Anyway, I want data routing, as is the intention of this patch. I just
don't think this patch is a useful way to do it. It is too narrow in its
scope and potentially buggy in its approach to developing a cache and
using trigger-like stuff. 

ISTM that with the right metadata in the right place, a cleaner and
easier solution is still possible for 8.5. The code within COPY should
really just reduce to a small piece of code to derive the correct
relation for the desired row and then use that during heap_insert().

I have just discussed partitioning with Itagaki-san at JPUG, so I know
his plans. Itagaki-san and Manu, please can you work together to make
this work for 8.5? 

---

A more detailed explanation of Partitioning Metadata:

Partitioning Metadata is information held on the relcache for a table
that has child partitions. Currently, a table does not cache info about
its children, which prevents various optimisations.

We would have an extra pointer on the Relation struct that points to a
PartitioningMetadata struct. We can fill in this information when we
construct the relcache for a relation, or we can populate it on demand
the first time we attempt to use that information (if it exists).

We want to hold an array of partition boundary values. This will then
allow us to use bsearch to find the partition that a specific value
belongs to. It can thus be used for routing data from INSERT or COPY,
and for identifying which partitions need to be included in or excluded
from an Append node. Lookups will be O(log N) rather than O(N), allowing
a much larger number of partitions when required. Note that it can also
be used within the executor to perform dynamic partition elimination,
thus allowing us to easily implement partition-aware joins etc.

To construct the array we must sort the partition boundary values and
prove that the partition definitions do not overlap. That is much easier
to do when the partitions are explicitly defined. (Plus, there is no
requirement to have, or mechanism to specify, unique partitions
currently, although most users assume this in their usage).

I imagine we would have an API called something like
RelationIdentifyPartition(), where we provide value(s) for the
partitioning-key column(s) and it returns the Oid of the partition
that holds those values. That function would build the metadata, if not
already cached, then bsearch it to provide the Oid.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] [GENERAL] Updating column on row update

2009-11-23 Thread Thom Brown
2009/11/23 Tom Lane 

> CREATE OR REPLACE has got far safer semantics from the viewpoint of a
> script that wants to bull through without having any actual error
> handling (which is more or less the scenario we are arguing here, no?)
> After successful execution of the command you know exactly what
> properties the object has got.
>
> Whether it would be sensible to have CREATE OR REPLACE semantics for a
> language is something I'm not very clear on.  It seems like changing any
> of the properties of a pg_language entry could be rather fatal from the
> viewpoint of an existing function using the language.
>
> [ thinks for awhile... ]  Actually, CREATE LANGUAGE is unique among
> creation commands in that the common cases have no parameters, at least
> not since we added pg_pltemplate.  So you could imagine defining CINE
> for a language as disallowing any parameters and having these semantics:
>* language not present -> create from template
>* language present, matches template -> OK, do nothing
>* language present, does not match template -> report error
> This would meet the objection of not being sure what the state is
> after successful execution of the command.  It doesn't scale to any
> other object type, but is it worth doing for this one type?
>
>regards, tom lane
>

Actually, I prefer CREATE OR REPLACE over CINE, at least for the majority
of creation commands, especially since it would be more consistent with
what we have for functions.  If there must be an exception for languages,
it would make sense from what you describe above.

As for having plpgsql installed by default, are there any security
implications?  If not, I can only see it as an advantage.  At the moment
we're having to resort to a bit of a hack using a CASE statement in a plain
SQL function as mentioned earlier in this thread.

Thom


Re: [HACKERS] named generic constraints [feature request]

2009-11-23 Thread Pavel Stehule
Hello

do you know domains? It is very similar to your proposal.

http://www.postgresql.org/docs/8.2/static/sql-createdomain.html

Regards
Pavel Stehule

2009/11/23 Caleb Cushing :
> So last time I checked this wasn't possible (at least not that anyone
> has told me). I'd like to be able to create constraints that aren't
> tied to a specific table/column.
>
> I think that the syntax would look something like this
>
> CREATE CONSTRAINT empty CHECK (VALUE = '\0' );
>
> this should allow us to do things like
>
> CREATE TABLE users (
>              username TEXT NOT empty
> );
>

A constraint cannot be part of an expression.

CREATE OR REPLACE FUNCTION emptystr(text)
RETURNS bool AS $$
  SELECT $1 = ''; -- in SQL the empty string is '', not '\0' as in C
$$ LANGUAGE sql;

CREATE TABLE users(
  username TEXT CHECK (NOT emptystr(username)),
  ...

p.s. Is it related to ANSI SQL?

Regards
Pavel Stehule

> the example from create domain (modified) is also pretty good
>
> CREATE CONSTRAINT zip CHECK(
>   VALUE ~ '^\\d{5}$'
> OR VALUE ~ '^\\d{5}-\\d{4}$'
> );
>
> --
> Caleb Cushing
>
> http://xenoterracide.blogspot.com
>



[HACKERS] named generic constraints [feature request]

2009-11-23 Thread Caleb Cushing
So last time I checked this wasn't possible (at least not that anyone
has told me). I'd like to be able to create constraints that aren't
tied to a specific table/column.

I think that the syntax would look something like this

CREATE CONSTRAINT empty CHECK (VALUE = '\0' );

this should allow us to do things like

CREATE TABLE users (
  username TEXT NOT empty
);

the example from create domain (modified) is also pretty good

CREATE CONSTRAINT zip CHECK(
   VALUE ~ '^\\d{5}$'
OR VALUE ~ '^\\d{5}-\\d{4}$'
);

-- 
Caleb Cushing

http://xenoterracide.blogspot.com



Re: [HACKERS] Unicode UTF-8 table formatting for psql text output

2009-11-23 Thread Peter Eisentraut
On sön, 2009-11-22 at 00:23 -0500, Tom Lane wrote:
> Roger Leigh  writes:
> > Attached is an updated patch with a couple of tweaks to ensure output
> > is formatted and spaced correctly when border=0, which was off in the
> > last patch.
> 
> Applied with minor editorialization.  Notably, I renamed the
> backwards-compatible option from "ascii-old" to "old-ascii",
> because the original submission failed to preserve the documented
> behavior that the options could be abbreviated to one letter.

What is the plan behind keeping the old format?  Are we going to remove
it after one release if no one complains, or are we seriously expecting
that someone has code that actually parses this?

