Re: [HACKERS] IMPORT FOREIGN SCHEMA statement

2014-06-16 Thread Atri Sharma
On Mon, Jun 16, 2014 at 11:28 AM, Michael Paquier michael.paqu...@gmail.com
 wrote:

 On Mon, May 26, 2014 at 6:23 AM, Ronan Dunklau ronan.dunk...@dalibo.com
 wrote:
  On Sunday, May 25, 2014 at 12:41:18, David Fetter wrote:
  On Fri, May 23, 2014 at 10:08:06PM +0200, Ronan Dunklau wrote:
   Hello,
  
    Since my last proposal didn't get any strong rebuttal, please find
    attached a more complete version of the IMPORT FOREIGN SCHEMA statement.
 
  Thanks!
 
   Please send future patches to this thread so people can track them
   in their mail.
 
   Will do.
 
   I didn't for the previous one because it was a few months ago, and no
   patch had been added to the commit fest.
 
 
   I tried to follow the SQL-MED specification as closely as possible.
  
   This adds discoverability to foreign servers. The structure of the
   statement as I understand it is simple enough:
  
    IMPORT FOREIGN SCHEMA remote_schema FROM SERVER some_server
    [ (LIMIT TO | EXCEPT) table_list ] INTO local_schema.
  
   The import_foreign_schema patch adds the infrastructure, and a new FDW
   routine:
  
   typedef List *(*ImportForeignSchema_function) (ForeignServer *server,
   ImportForeignSchemaStmt * parsetree);
  
   This routine must return a list of CreateForeignTableStmt mirroring
   whatever tables were found on the remote side, which will then be
   executed.
  
    The import_foreign_schema_postgres_fdw patch proposes an implementation
    of this API for postgres_fdw. It will import a foreign schema using the
    right types as well as nullable information.
 
  In the case of PostgreSQL, the right types are obvious until there's
  a user-defined one.  What do you plan to do in that case?
 
 
  The current implementation fetches the types as regtype, and when
  receiving a custom type, two things can happen:

   - the type is defined locally: everything will work as expected
   - the type is not defined locally: the conversion function will fail
     and raise an error of the form: ERROR:  type schema.typname does not exist

 Just wondering: what about the case where the same data type is
 defined on both local and remote, but with *different* definitions? Is
 it the responsibility of the fdw to check for type incompatibilities?


Logically, it should be.

Just wondering: couldn't the proposed function typedef List
*(*ImportForeignSchema_function) (ForeignServer *server,
ImportForeignSchemaStmt *parsetree); be changed a bit to give exact type
definitions from the remote side as well?

Regards,

Atri
*l'apprenant*


[HACKERS] 9.5 CF1

2014-06-16 Thread Abhijit Menon-Sen
Hi.

There are 92 outstanding patches in this CommitFest, and 63 of them do
not have any reviewer. Those are very large numbers, so I hope everyone
will pitch in to keep things moving along.

There's quite a variety of patches available for review this time, and
any level of feedback about them is useful, from "no longer applies to
HEAD" or "doesn't build" to more detailed reviews.

If you don't have the time to do a full review, or are getting bogged
down, post whatever you do have (the same amount of fame and fortune
will still be yours!).

If you're wondering where to start, here are some suggestions, picked
almost at random:

Using Levenshtein distance to HINT a candidate column name

http://archives.postgresql.org/message-id/cam3swzs9-xr2ud_j9yrkdctt6xxy16h1eugtswmlu6or4ct...@mail.gmail.com

Better partial index-only scans

http://archives.postgresql.org/message-id/CABz-M-GrkvrMc9ni5S0mX53rtZg3=szneyru_a8rigq2b3m...@mail.gmail.com

Use unique index for longer pathkeys

http://archives.postgresql.org/message-id/20140613.164133.160845727.horiguchi.kyot...@lab.ntt.co.jp

SQL access to database attributes
http://archives.postgresql.org/message-id/53868e57.3030...@dalibo.com

pg_resetxlog option to change system identifier
http://archives.postgresql.org/message-id/539b97fc.8040...@2ndquadrant.com

pg_xlogdump --stats
http://archives.postgresql.org/message-id/20140604104716.ga3...@toroid.org

tab completion for set search_path TO

http://archives.postgresql.org/message-id/CAMkU=1xJzK0h7=0_sOLLKGaf7zSwp_YzcKwuG41Ns+_Qcn+t=g...@mail.gmail.com

idle_in_transaction_timeout
http://archives.postgresql.org/message-id/538e600e.1020...@dalibo.com

I'll post a periodic summary to the list, and will send out reminders by
private mail as usual.

Please feel free to contact me with questions.

-- Abhijit

P.S. If you tag your reviews with [REVIEW] in the Subject, it'll be
easier to keep track of them.




Re: [HACKERS] IMPORT FOREIGN SCHEMA statement

2014-06-16 Thread Ronan Dunklau
On Monday, June 16, 2014 at 11:32:38, Atri Sharma wrote:
 On Mon, Jun 16, 2014 at 11:28 AM, Michael Paquier michael.paqu...@gmail.com
  wrote:
  Just wondering: what about the case where the same data type is
  defined on both local and remote, but with *different* definitions? Is
  it the responsibility of the fdw to check for type incompatibilities?
 
 Logically, it should be.
 

This is akin to what Stephen proposed, to allow IMPORT FOREIGN SCHEMA to also 
import types. 

The problem with checking if the type is the same is deciding where to stop.
For composite types, sure, it should be easy. But what about built-in types?
Or types provided by an extension / a native library? These could
theoretically change from one release to another.

 Just wondering: couldn't the proposed function typedef List
 *(*ImportForeignSchema_function) (ForeignServer *server,
 ImportForeignSchemaStmt *parsetree); be changed a bit to give exact type
 definitions from the remote side as well?

I toyed with this idea, but the more I think about it the less I'm sure what
the API should look like, should we ever decide to go beyond the standard and
import more than tables. Should the proposed function's return value be
changed to void, letting the FDW execute any DDL statement? The advantage of
returning a list of statements was to make it clear that tables should be
imported, and letting the core enforce the INTO local_schema part of the
clause.

I would prefer if the API is limited by design to importing tables. This
limitation can always be bypassed by executing arbitrary statements before
returning the list of CreateForeignTableStmt nodes.

For the postgres_fdw specific case, we could add two IMPORT options (since it 
looked like we had a consensus on this):

 - import_types
 - check_types

Import types would import simple and composite types, issuing the
corresponding statements before returning to core.

Check types would compare the local and remote definitions for composite
types. For types installed by an extension, it would check that the local
type has also been created by an extension of the same name, installed in the
same schema, raising a warning if the local and remote versions differ.
For built-in types, a warning would be raised if the local and remote
versions of PostgreSQL differ.
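
As a sketch, such a statement could then look like this (the OPTIONS clause
placement and option values here are only a guess at this point, nothing is
implemented):

    IMPORT FOREIGN SCHEMA remote_schema FROM SERVER some_server
        INTO local_schema OPTIONS (import_types 'true', check_types 'true');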

However, I don't know what we should be doing about types located in a
different schema. For example, if the remote table s1.t1 has a column of
composite type s2.typ1, should we import typ1 into s1? Into s2, optionally
creating the schema if it does not exist? Raise an error?

Regards,

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org



Re: [HACKERS] IMPORT FOREIGN SCHEMA statement

2014-06-16 Thread Michael Paquier
On Sat, May 24, 2014 at 5:08 AM, Ronan Dunklau ronan.dunk...@dalibo.com wrote:
 Hello,

 Since my last proposal didn't get any strong rebuttal, please find attached a
 more complete version of the IMPORT FOREIGN SCHEMA statement.

 I tried to follow the SQL-MED specification as closely as possible.

 This adds discoverability to foreign servers. The structure of the statement
 as I understand it is simple enough:

 IMPORT FOREIGN SCHEMA remote_schema FROM SERVER some_server [ (LIMIT TO |
 EXCEPT) table_list ] INTO local_schema.

 The import_foreign_schema patch adds the infrastructure, and a new FDW
 routine:

 typedef List *(*ImportForeignSchema_function) (ForeignServer *server,
 ImportForeignSchemaStmt * parsetree);

 This routine must return a list of CreateForeignTableStmt mirroring whatever
 tables were found on the remote side, which will then be executed.

 The import_foreign_schema_postgres_fdw patch proposes an implementation of
 this API for postgres_fdw. It will import a foreign schema using the right
 types as well as nullable information.

 Regarding documentation, I don't really know where it should have been put. If
 I missed something, let me know and I'll try to correct it.

I have been playing a bit with this patch... And got a couple of comments.

1) Perhaps somebody could guide me to a better spec, but looking at the
spec here (http://farrago.sourceforge.net/design/sqlmed.html), it seems
that the FROM SERVER clause is misplaced; it should be placed after the
table list.
2) The query I am seeing in this spec offers the possibility to query
TABLE_NAME LIKE 'pattern'. Is this part of the spec as well? If yes,
are you planning to add it too? I imagine that this could be quite
handy for users importing from a foreign schema tables that have the
same name prefix, for example. ImportForeignSchemaRestrictionType could
be extended with an IMPORT_LIKE type. Extending the spec a bit, there
could be a list of LIKE patterns; this matches your code more closely by
adding all of them to table_names. I am not voting to add TABLE_NAME
to the list of reserved keywords though, so something like TABLE LIKE
'pattern' would be nice.
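
For instance, something like this (entirely hypothetical syntax, only to
illustrate the suggestion):
import foreign schema public from server postgres_server limit to
(table like 'log_%') into public;
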
3) This has already been raised in this thread, but something missing
from this patch is the possibility to pass options when executing an
import. This could allow an FDW to do more fine-grained operations on
the underlying objects of the table. There would be two levels of
options: table level and schema level. For example, with the current
feature it is not possible to rename imported tables on-the-fly. I
imagine that this would be useful.
4) Something not mandatory with the core patch but always nice to
have: tab completion for this command in psql.
5) The error message returned to the user when an import fails because of a
missing type is rather unfriendly. Let's imagine with two nodes and
postgres_fdw... Node 1:
create type toto as (a int, b int);
create table toto_tab (a toto);
Node 2:
=# import foreign schema public from server postgres_server into public;
ERROR:  42704: type public.toto does not exist
LOCATION:  parseTypeString, parse_type.c:797
I would rather imagine something like "IMPORT failed because of a type
defined on the remote but missing locally".
6) Import does not fail if a table specified in LIMIT TO is not
defined on the remote side:
=# import foreign schema public from server postgres_server limit to
(tab_not_there) into public;
IMPORT FOREIGN SCHEMA
=# \d
No relations found.
7) A small thing: code comments do not respect the project format:
http://www.postgresql.org/docs/9.1/interactive/source-format.html
8) In fetch_remote_tables@postgres_fdw.c, you are missing materialized views:
WHERE relkind IN ('r', 'v', 'f')
9) The changes in pg_type.h adding new OID defines should be a
separate patch IMO
10) The code has many stray whitespace issues.
11) Sometimes the import goes strangely:
11-1) After some testing I noticed that tables with incorrect names
were imported when using LIMIT TO. For example, when a table called ab
is present on the remote, IMPORT FOREIGN SCHEMA ... LIMIT TO 'other_ab'
created a local entry called other_ab while it should create nothing.
11-2) Tables with empty names can be created locally. On foreign server:
=# \d
        List of relations
 Schema |   Name   | Type  | Owner
--------+----------+-------+--------
 public | aa       | table | ioltas
 public | toto_tab | table | ioltas
(2 rows)
On local node:
=# import foreign schema public from server postgres_server limit to
(toto_tab, NULL, aa) into public;
-- (Forget about NULL here, I could reproduce it without as well)
IMPORT FOREIGN SCHEMA
Time: 13.589 ms
=# \d
            List of relations
 Schema |   Name   |     Type      | Owner
--------+----------+---------------+--------
 public |          | foreign table | ioltas
 public | toto_tab | foreign table | ioltas
(2 rows)
Some more testing showed me that as well:
=# \d
        List of relations
 Schema |

Re: [HACKERS] [v9.5] Custom Plan API

2014-06-16 Thread Shigeru Hanada
Kaigai-san,

I've just applied the v1 patch and tried to build and install it, but I found two issues:

1) The contrib/ctidscan is not automatically built/installed because
it's not described in contrib/Makefile.  Is this expected behavior?
2) I got the error message below when building the documentation.

$ cd doc/src/sgml
$ make
openjade  -wall -wno-unused-param -wno-empty -wfully-tagged -D . -D .
-d stylesheet.dsl -t sgml -i output-html -V html-index postgres.sgml
openjade:catalogs.sgml:2525:45:X: reference to non-existent ID
SQL-CREATECUSTOMPLAN
make: *** [HTML.index] Error 1
make: *** Deleting file `HTML.index'

I'll review the other parts of the patch, including the design.


2014-06-14 10:59 GMT+09:00 Kohei KaiGai kai...@kaigai.gr.jp:
 According to the discussion upthread, I revised the custom-plan patch
 to focus on regular relation scans, with no join support right now, and to
 support a DDL command to define custom-plan providers.

 Planner integration with custom logic to scan a particular relation is
 simple enough, unlike the various join cases. It's almost the same as what
 the built-in logic does now: a custom-plan provider adds a path node with
 its cost estimate if it can offer an alternative way to scan the referenced
 relation. (If it has nothing to offer, it does not need to add any paths.)

 A new DDL syntax I'd like to propose is below:

   CREATE CUSTOM PLAN name FOR class PROVIDER function_name;

 name is a literal; put a unique identifier there.
 class is the workload type to be handled by this custom-plan provider;
 scan is the only option right now, meaning a base relation scan.
 function_name is also a literal; it names the function that acts as the
 custom-plan provider.
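
 For instance, with the contrib module mentioned below, the definition
 might look like this (the provider function name here is an invented
 example, not something fixed by the patch):

   CREATE CUSTOM PLAN ctidscan FOR scan PROVIDER ctidscan_provider;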

 A custom-plan provider function is assumed to take one argument of
 internal type, which delivers a set of planner information needed to
 construct the custom-plan path node.
 In the case of the scan class, a pointer to a customScanArg object
 is delivered when the custom-plan provider is invoked.

 typedef struct {
     uint32          custom_class;
     PlannerInfo    *root;
     RelOptInfo     *baserel;
     RangeTblEntry  *rte;
 } customScanArg;

 When the invoked custom-plan provider function thinks it can offer an
 alternative scan path on the baserel relation, it needs to (1) construct
 a CustomPath (or a type inheriting from it) with a table of callback
 function pointers, (2) fill in its own cost estimate, and (3) call
 add_path() to register this path as an alternative.

 Once the custom path is chosen by the query planner, its CreateCustomPlan
 callback is called to populate a CustomPlan node based on the path node.
 It also has a table of callback function pointers to handle the planner's
 various jobs in setrefs.c and so on.

 Similarly, its CreateCustomPlanState callback is called to populate a
 CustomPlanState node based on the plan node. It also has a table of
 callback function pointers to handle the executor's various jobs during
 query execution.

 Most of the callback design is unchanged from the earlier proposal in
 the v9.4 development cycle; however, here are a few changes.

 * CustomPlan now inherits from Scan, and CustomPlanState inherits from
   ScanState. Because some useful routines for implementing scan logic,
   like ExecScan, expect the state node to have ScanState as its base
   type, this is kinder to the extension side. (I'd like to avoid each
   extension reinventing ExecScan by copy & paste!)
   I'm not sure whether it should become a union with Join in the future;
   however, it is a reasonable choice to have a layout compatible with
   Scan/ScanState to implement alternative scan logic.

 * Exporting static functions - I still don't have a graceful answer here.
   However, it is quite natural that extensions will have to follow
   interface updates in future versions of PostgreSQL.
   It will probably become clear in later discussion which class of
   functions should be exported and which class should be re-implemented
   on the extension side.
   Right now, I exported the minimum set needed to implement an
   alternative scan method - the contrib/ctidscan module.

 Items to be discussed later:
 * planner integration for relation joins - we may define new custom-plan
   classes as alternatives to hash join, merge join and nested loop. If
   the core can know that a custom plan is an alternative to, say, hash
   join, we can utilize core code to check the legality of the join.
 * generic key-value style options in the custom-plan definition -
   Hanada-san proposed this to me off-list, like foreign data wrappers
   have. It may make it possible to configure multiple behaviors in one
   binary.
 * ownership and access control of custom plans. Right now, only a
   superuser can create/drop a custom-plan provider definition, so it has
   no explicit ownership or access control. That seems to me a reasonable
   assumption; however, we may have a use case that needs custom plans
   created by unprivileged users.

 Thanks,

 2014-05-12 10:09 GMT+09:00 Kouhei Kaigai kai...@ak.jp.nec.com:
 

Re: [HACKERS] postgresql.auto.conf read from wrong directory

2014-06-16 Thread Christoph Berg
Re: Amit Kapila 2014-06-16 
CAA4eK1++wcFswhzH=92qiQFB=C0_Uy=mreadvhzxdrf3jow...@mail.gmail.com
 Developer options are mainly for debugging information or might help in one
 of the situations, so I thought somebody might not want them to be part of
 server configuration once they are set.  We already disallow parameters like
 config_file, transaction_isolation, etc. which are disallowed to be set in
 postgresql.conf.  Could you please explain me a bit in which
 situations/scenarios, do you think that allowing developer options via Alter
 System can be helpful?

Oh, if these are disallowed in postgresql.conf, they should be
disallowed in auto.conf too. I meant to say that there shouldn't be
additional restrictions for auto.conf, except for cases like
data_directory which won't work at all (or are highly confusing).

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/




[HACKERS] How to implement the skip errors for copy from ?

2014-06-16 Thread xbzhang

I want to implement skip-errors for COPY FROM, like this:

create table A (c int primary key);
copy A from stdin;
1
1
2
\.

The copy will fail:
ERROR: duplicate key violates primary key constraint CC_PKEY
CONTEXT: COPY CC, line 2: 1

I want to skip the error and continue to copy the rest of the tuples. The
result will be that there are two rows in table A: 1 and 2.
How to implement that? Can anybody give me a suggestion?



Zhang Xiaobo, R&D Department II
Beijing Kingbase Information Technologies Inc.
Address: Room 501, Building 4, Shangdi Science and Technology Building, No. 8 Shangdi West Road, Haidian District, Beijing
Postcode: 100085
Tel: (010) 5885 1118 - 8450
Mobile: 15311394463
Email: xbzh...@kingbase.com.cn



Re: [HACKERS] How to implement the skip errors for copy from ?

2014-06-16 Thread Pavel Stehule
2014-06-16 11:46 GMT+02:00 xbzhang xbzh...@kingbase.com.cn:


  I want to implement skip-errors for COPY FROM, like this:
  create table A (c int primary key);
  copy A from stdin;
  1
  1
  2
  \.

  The copy will fail:
  ERROR: duplicate key violates primary key constraint CC_PKEY
  CONTEXT: COPY CC, line 2: 1

  I want to skip the error and continue to copy the rest of the tuples. The
  result will be that there are two rows in table A: 1 and 2.

  How to implement that? Can anybody give me a suggestion?


You would have to reimplement the copy procedure to use subtransactions.
Using a subtransaction for every row is too expensive, but you can do a
subtransaction per 1000 rows, and when some exception is raised, fall back
to storing the data one row per subtransaction.
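
A minimal PL/pgSQL sketch of the per-row fallback part (the table name and
the handled error condition are just examples, not a finished design):

CREATE OR REPLACE FUNCTION insert_skip_errors(vals int[]) RETURNS integer AS $$
DECLARE
    v integer;
    inserted integer := 0;
BEGIN
    FOREACH v IN ARRAY vals LOOP
        BEGIN
            -- each BEGIN ... EXCEPTION block runs in its own subtransaction
            INSERT INTO a VALUES (v);
            inserted := inserted + 1;
        EXCEPTION WHEN unique_violation THEN
            NULL;  -- skip the offending row and keep going
        END;
    END LOOP;
    RETURN inserted;
END;
$$ LANGUAGE plpgsql;

SELECT insert_skip_errors(ARRAY[1, 1, 2]);  -- leaves rows 1 and 2 in table a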

Regards

Pavel Stehule



 --

 Zhang Xiaobo, R&D Department II

 Beijing Kingbase Information Technologies Inc.

 Address: Room 501, Building 4, Shangdi Science and Technology Building, No. 8 Shangdi West Road, Haidian District, Beijing

 Postcode: 100085

 Tel: (010) 5885 1118 - 8450

 Mobile: 15311394463

 Email: xbzh...@kingbase.com.cn



Re: [HACKERS] pg_receivexlog add synchronous mode

2014-06-16 Thread furuyao
 You introduced the state machine using the flag flush_flg into
 pg_receivexlog.
  That's complicated and would reduce the readability of the source code.
  I think that the logic should be simpler, like walreceiver's.
 
 Maybe I found one problematic path as follows:
 
  1. WAL is written and flush_flg is set to 1
  2. PQgetCopyData() returns 0 and flush_flg is incremented to 2
  3. PQconsumeInput() is executed
  4. PQgetCopyData() reads a keepalive message
  5. After processing the keepalive message, PQgetCopyData() returns 0
  6. Since flush_flg is 2, WAL is flushed and flush_flg is reset to 0
 
 But new message can arrive while processing keepalive message. Before
 flushing WAL, such new message should be processed.
Along with the readability concern, I fixed it to use the same logic as
walreceiver's loop.

 +Enables synchronous mode. If replication slot is disabled then
 +this setting is irrelevant.
 
 Why is that irrelevant in that case?
 
  Even when a replication slot is not used, thanks to this feature,
  pg_receivexlog can flush WAL more proactively, which may improve the
  durability of the WAL which pg_receivexlog writes.
It means whether to report the flush position or not.
If a replication slot is not used, the flush position is not reported.
I fixed it so that it is reported only when using a slot.

  +	printf(_("  -m, --sync-mode        synchronous mode\n"));
 
  I think that calling this feature "synchronous mode" is confusing.
Modified "synchronous mode" to "this mode is written some records, flush
them to disk.".

Regards,

-- 
Furuya Osamu


pg_receivexlog-add-synchronous-mode-v4.patch
Description: pg_receivexlog-add-synchronous-mode-v4.patch



Re: [HACKERS] Atomics hardware support table supported architectures

2014-06-16 Thread Andres Freund
On 2014-06-15 03:12:21 +0200, Andres Freund wrote:
 Hi,
 
 At this year developer's meeting we'd discussed the atomics abstraction
 which is necessary for some future improvements. We'd concluded that a
 overview over the hardware capabilities of the supported platforms would
 be helpful. I've started with that at:
 https://wiki.postgresql.org/wiki/Atomics
 
 Does somebody want other columns in there?
 
 From that and previous discussions
 (e.g. 
 http://archives.postgresql.org/message-id/20131013004658.GG4056218%40alap2.anarazel.de
 ) I think we should definitely remove some platforms:

 3) sparcv8: Last released model 1997.

Just as a bit of context: sparcv8 support got dropped from Solaris
10. In 2005.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Clarification of FDW API Documentation

2014-06-16 Thread Etsuro Fujita
(2014/06/14 2:46), Tom Lane wrote:
 Jason Petersen ja...@citusdata.com writes:

 Even if there is no guarantee that `IterateForeignScan` is called exactly 
 once
 before each `ExecForeignDelete` call (which would remove the ability to have
 them cooperate using this single cursor), one could easily devise other 
 storage
 backends that don't need junk columns to perform `DELETE` operations.
 
 Such as?  I could imagine having an optimization that works like you
 suggest for simple scan cases, but it's not there now, and it could not
 be the only mode.

The optimization in the following comment for postgresPlanForeignModify?

/*
 * postgresPlanForeignModify
 *  Plan an insert/update/delete operation on a foreign table
 *
 * Note: currently, the plan tree generated for UPDATE/DELETE will always
 * include a ForeignScan that retrieves ctids (using SELECT FOR UPDATE)
 * and then the ModifyTable node will have to execute individual remote
 * UPDATE/DELETE commands.  If there are no local conditions or joins
 * needed, it'd be better to let the scan node do UPDATE/DELETE RETURNING
 * and then do nothing at ModifyTable.  Room for future optimization ...
 */

I think this would be very useful.  So, I plan to add a patch for it to
the 2014-08 commitfest.
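
Concretely, with no local quals, the remote command sequence today vs. the
optimized form would look roughly like this (a sketch with an invented
remote table name, not actual postgres_fdw output):

-- current: a scan that fetches ctids, then one command per row
SELECT ctid FROM remote_tab WHERE a = 1 FOR UPDATE;
DELETE FROM remote_tab WHERE ctid = $1;

-- proposed: let the scan node do it in one round trip
DELETE FROM remote_tab WHERE a = 1;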

Thanks,

Best regards,
Etsuro Fujita




Re: [HACKERS] postgresql.auto.conf read from wrong directory

2014-06-16 Thread Fujii Masao
On Mon, Jun 16, 2014 at 12:14 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Sun, Jun 15, 2014 at 6:29 PM, Christoph Berg c...@df7cb.de wrote:

 Re: Amit Kapila 2014-06-13
 CAA4eK1KLn1SmgVtd=5emabqxrrpveedtbuu94e-repmwxwv...@mail.gmail.com
   Agreed, I had mentioned that in the Notes section of the document.  Apart
   from that, I had disallowed parameters that are excluded from
   postgresql.conf by initdb (developer options), as the user manual
   recommends that they not be used in production.

 Excluding developer options seems too excessive to me. ALTER SYSTEM
 should be useful for developing too.

 Developer options are mainly for debugging information or might help in one
 of those situations, so I thought somebody might not want them to be part of
 the server configuration once they are set.  We already disallow parameters
 like config_file, transaction_isolation, etc. which cannot be set in
 postgresql.conf.  Could you please explain a bit in which
 situations/scenarios you think that allowing developer options via ALTER
 SYSTEM can be helpful?

I think that's helpful. I've heard that some users enable the developer
option trace_sort in postgresql.conf in order to collect information
about sorting. They might want to set trace_sort via ALTER SYSTEM, for
example.
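
For example, the kind of command such a user would want to run (trace_sort
is an existing developer option):

ALTER SYSTEM SET trace_sort = on;
SELECT pg_reload_conf();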

 This information is not stored in pg_settings.  One way is to manually
 specify all the parameters which are disallowed, but the query would
 become clumsy; another could be to have an additional column in
 pg_settings.  Can you think of any other way?

I like the latter idea, i.e., exposing the flags of GUC parameters in
pg_settings, but it seems overkill just for tab-completion purposes. I'd
withdraw my comment.

Regards,

-- 
Fujii Masao




Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-16 Thread Stephen Frost
Dean,

* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
 On 13 June 2014 01:13, Stephen Frost sfr...@snowman.net wrote:
  This approach was suggested by an existing user testing out this RLS
  approach, to be fair, but it looks pretty sane to me as a way to address
  some of these concerns. Certainly open to other ideas and thoughts though.
 
 Yeah, I was thinking something like this could work, but I would go
 further. Suppose you had separate GRANTable privileges for direct
 access to individual tables, bypassing RLS, e.g.
 
   GRANT DIRECT SELECT|INSERT|UPDATE|DELETE ON table_name TO role_name

This is certainly an interesting idea and I'm glad we're getting this
level of discussion early on in the 9.5 cycle as I'd really like to see
a good solution implemented for 9.5.

I've been going back-and-forth about this and what's really swaying me
right now is that it'd be nearly impossible to determine if a given RLS
qual actually allows full access to a table for a given user without
going through the entire table and testing the qual against each row.
With this GRANT ability, we'd be able to completely avoid calling the
RLS quals when the user is granted this right.

Not sure offhand how many bits we've got left at the per-table level
though; we added TRUNCATE rights not that long ago and this looks like
another good right to add, but there are only so many bits available..
At the same time, I do think this is something we could also add later,
perhaps after figuring out a good way to extend the set of bits
available for privileges on tables.

 Combined with the GUC (direct_table_access, say) to request direct
 access to all tables. Then with direct_table_access = true/required, a
 SELECT from a table would error if the user hadn't been granted the
 DIRECT SELECT privilege on all the tables referenced in the query.

I can see this working.  One thing I'm curious about is if we would want
to support this inside of the SELECT statement (or perhaps COPY?)
directly, rather than making a user have to flip a GUC back and forth
while they're doing something.  I can imagine, during testing, a session
looking like this:

select * from table;
@#@!$!
set direct_table_access = true;
select * from table;
select * from table where blah = x;
alter table set row level security blah = x;
select * from table;
select * from table;
select * from table;
@!#$!@#!
set direct_table_access = false;
select * from table;
...

Would 'select direct' or 'select * from DIRECT table' (or maybe 'ONLY'?)
be workable?  There are certainly SQL standard concerns to be thought of
here which might preclude anything we do with SELECT, but we could
support something with COPY.

 Tools like pg_dump would require direct_table_access, but there might
 be other levels of access that didn't error out.

pg_dump would need an option to set direct_table_access or not.  Having
it ask by default is acceptable to me, but I do think we need to be able
to tell it to *not* set that.

 I think if I were using RLS, I would definitely want/expect this level
 of fine-grained control over permissions on a per-table basis, rather
 than the superuser/non-superuser level of control, or having
 RLS-exempt users.

I agree that it'd be great to have- and we need to make sure we don't
paint ourselves into a corner with the initial versions.  What I'm
worried about is that we're going to end up feature-creeping this to
death and ending up with nothing in 9.5.  I'll try to get a wiki page
going to discuss these items (as mentioned up-thread) and we can look at
prioritizing them and looking at what dependencies exist on other parts
of the system and seeing what's required for the initial version.

 Actually, given the fact that the majority of users won't be using
 RLS, I would be tempted to invert the above logic and have the new
 privilege be for LIMITED access (via RLS quals). So a user granted the
 normal SELECT privilege would be able to bypass RLS, but a user only
 granted LIMITED SELECT wouldn't.

This I don't agree with- it goes against what is done in existing
systems afaik, and part of the idea is that you can minimize changes to
the applications or users but still be able to curtail what they can
see.  Making regular SELECTs start erroring out if they haven't set some
GUC, because RLS has been implemented on a given table, would be quite
annoying, imv.

Now, that said, wouldn't the end user be able to control this for their
particular environment by setting the GUC accordingly in
postgresql.conf?  I'd still argue that it should be defaulted to what I
view as the 'normal' case, where RLS is applied unless you asked for
your queries to error instead, but if a user wants to have it flipped
around the other way, they could update their postgresql.conf to make it
so.

Thanks,

Stephen




[HACKERS] [REVIEW] Re: Re: BUG #9578: Undocumented behaviour for temp tables created inside query language (SQL) functions

2014-06-16 Thread Abhijit Menon-Sen
(Cc: to pgsql-bugs dropped.)

At 2014-03-17 18:24:55 +1100, kommi.harib...@gmail.com wrote:

 *** a/doc/src/sgml/xfunc.sgml
 --- b/doc/src/sgml/xfunc.sgml
 ***
 *** 153,159  SELECT clean_emp();
 --- 153,186 
   (literal\/) (assuming escape string syntax) in the body of
   the function (see xref linkend=sql-syntax-strings).
  /para
 +
 +sect2 id=xfunc-sql-function-parsing-mechanism
 + titleParsing mechanism of a function/title
   
 +indexterm
 + primaryfunction/primary
 + secondaryparsing mechanism/secondary
 +/indexterm

I suggest "Catalog changes within functions" instead of the above title.

 + para
 +  The body of an SQL function is parsed as if it were a single - 
 multi-part
 +  statement and thus uses a constant snapshot of the system catalogs for
 +  every sub-statement therein. Commands that alter the catalog will 
 likely not
 +  work as expected.
 + /para
 + 
 + para  
 +  For example: Issuing CREATE TEMP TABLE within an SQL function will 
 add
 +  the table to the catalog but subsequent statements in the same 
 function will
 +  not see those additions and thus the temporary table will be invisible 
 to them.
 + /para
 + 
 + para  
 +  Thus it is generally advised that applicationPL/pgSQL/ be used, 
 instead of
 +  acronymSQL/acronym, when any catalog visibilities are required in 
 the same function.
 + /para
 +/sect2

I don't think that much text is warranted. I suggest something like the
following condensed wording:

para
 The body of an SQL function is parsed as if it were a single
 multi-part statement, using a constant snapshot of the system
 catalogs. The effect of any commands that alter the catalogs
 (e.g. CREATE TEMP TABLE) will therefore not be visible to
 subsequent commands in the function body.
/para

para
 The recommended workaround is to use applicationPL/PgSQL/.
/para
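
As a concrete illustration of the behavior being documented (a session
sketch; the error text is approximate):

SET check_function_bodies = off;  -- otherwise CREATE FUNCTION itself
                                  -- fails to parse the body
CREATE FUNCTION f() RETURNS integer AS $$
    CREATE TEMP TABLE t(a integer);
    SELECT a FROM t;
$$ LANGUAGE sql;

SELECT f();
ERROR:  relation "t" does not exist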

Does that seem sensible to you?

-- Abhijit




Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-16 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote:
 Kevin Grittner (kgri...@ymail.com) wrote:

 The proposed approach would leave the validity of any dump which
 was not run as a superuser in doubt.  The last thing we need, in
 terms of improving security, is another thing you can't do
 without connecting as a superuser.

 Any dump not run by a superuser is already in doubt, imv.  That
 is a problem we already have which really needs to be addressed,
 but I view that as an independent issue.

I'm not seeing that.  If the user can't dump, you get an error and
pg_dump returns something other than SUCCESS.

test=# create user bob;
CREATE ROLE
test=# create user tom;
CREATE ROLE
test=# set role bob;
SET
test=> create table person(person_id int primary key, name text not null, ssn
text);
CREATE TABLE
test=> insert into person values (1, 'Stephen Frost', '123-45-6789');
INSERT 0 1
test=> insert into person values (2, 'Kevin Grittner');
INSERT 0 1
test=> grant select (person_id, name) on person to tom;
GRANT
test=> \q
kgrittn@Kevin-Desktop:~/pg/master$ pg_dump -U bob test >bob-backup.sql
kgrittn@Kevin-Desktop:~/pg/master$ pg_dump -U tom test >tom-backup.sql
pg_dump: [archiver (db)] query failed: ERROR:  permission denied for relation 
person
pg_dump: [archiver (db)] query was: LOCK TABLE public.person IN ACCESS SHARE 
MODE
kgrittn@Kevin-Desktop:~/pg/master$ echo $?
1

 I agree with avoiding adding another superuser-only capability;
 see the other sub-thread about making this a per-user capability.

It should be possible to design something which does not have this
risk.  What I was saying was that what was being described at that
point wasn't it, and IMV was not acceptable.  I think that there
should never be any doubt that a pg_dump run which completes
without error copied all requested tables in their entirety, not a
subset of the rows in the tables.

A GUC which only caused an error on the attempt to actually read
specific rows which the user does not have permission to see would
leak too much information.  A GUC which caused a SELECT or COPY
from a table to throw an error if the user was not entitled to see
all rows in the table could work.  Another thing which could work,
if it can be coded, would be a GUC which would throw an error if
there were not quals on the query to prohibit seeing rows which
the security conditions would prohibit, whether or not any matching
rows actually existed.  The latter would match the behavior of
column-level security -- you get an error when trying to select a
prohibited column even if there are no rows in the table.
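
(To spell out that analogy with the tables from the session above: as tom,
who was granted only person_id and name, the error comes regardless of
whether the table holds any rows.

test=> SELECT ssn FROM person;
ERROR:  permission denied for relation person

The GUC described here would behave the same way for row-level conditions.)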

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] [REVIEW] Re: postgresql.auto.conf read from wrong directory

2014-06-16 Thread Abhijit Menon-Sen
Hi.

Just a few minor comments about your patch:

At 2014-06-13 11:46:21 +0530, amit.kapil...@gmail.com wrote:

 +  titleNotes/title
 +
 +  para
 +This command will not allow to set parameters that are disallowed or
 +excluded in postgresql.conf. It also disallows to set configuration
 +parameter xref linkend=guc-data-directory.
 +  /para
 + /refsect1

I suggest the following wording:

This command may not be used to set
xref linkend=guc-data-directory
or any parameters that are not allowed in postgresql.conf.

 + /*
 +  * Disallow parameter's that are excluded or disallowed in
 +  * postgresql.conf.
 +  */

parameters, no apostrophe.

 	if ((record->context == PGC_INTERNAL) ||
-		(record->flags & GUC_DISALLOW_IN_FILE))
-		ereport(ERROR,
-				(errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
-				 errmsg("parameter \"%s\" cannot be changed",
-						name)));
+		(record->flags & GUC_DISALLOW_IN_FILE) ||
+		(record->flags & GUC_DISALLOW_IN_AUTO_FILE) ||
+		(record->flags & GUC_NOT_IN_SAMPLE))
+		ereport(ERROR,
+				(errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
+				 errmsg("parameter \"%s\" cannot be changed",
+						name)));

I looked at the settings that are marked GUC_NOT_IN_SAMPLE but neither
PGC_INTERNAL nor GUC_DISALLOW_IN_*FILE. I don't feel strongly about it,
but I don't see any particularly good reason to exclude them here.

(I also agree with Fujii-san that it isn't worth making extensive
changes to avoid data_directory being offered via tab-completion.)

-- Abhijit




Re: [HACKERS] API change advice: Passing plan invalidation info from the rewriter into the planner?

2014-06-16 Thread Stephen Frost
* Kevin Grittner (kgri...@ymail.com) wrote:
 Stephen Frost sfr...@snowman.net wrote:
  Any dump not run by a superuser is already in doubt, imv.  That
  is a problem we already have which really needs to be addressed,
  but I view that as an independent issue.
 
 I'm not seeing that.  If the user can't dump, you get an error and
 pg_dump returns something other than SUCCESS.

We've outlined an approach with RLS which would do the same.

I'm still of the opinion that, today, we have a problem that only a
superuser-run dump has any chance of success (and even if you get it
working today it'll probably break tomorrow, and you had better be
paying attention).  I'd like to fix that situation, but it's an
independent effort from this.  We've had issues in the past with pg_dump
creating things that can't be restored, and those are certainly bugs, but
trying to make pg_dump work as a whole-system backup strategy for a
regular user, today, is just asking for trouble.

  I agree with avoiding adding another superuser-only capability;
  see the other sub-thread about making this a per-user capability.
 
 It should be possible to design something which does not have this
 risk.  

The risk that pg_dump might create a dump which can't be restored?
Agreed, and I'd love to hear your thoughts on the proposal.

 What I was saying was that what was being described at that
 point wasn't it, and IMV was not acceptable.  I think that there
 should never be any doubt that a pg_dump run which completes
 without error copied all requested tables in their entirety, not a
 subset of the rows in the tables.

pg_dump needs to be able to have an option to go either way on this
case, as I can see value in running pg_dump in RLS-enforcing mode, but
it could default to error-if-RLS.

 A GUC which only caused an error on the attempt to actually read
 specific rows which the user does not have permission to see would
 leak too much information.  A GUC which caused a SELECT or COPY
 from a table to throw an error if the user was not entitled to see
 all rows in the table could work.

Right- this would be the 'DIRECT SELECT' which would allow bypassing all
RLS and therefore mean that the user is allowed to see ALL rows of a
table.  That's one of the reasons why I agree with Dean's approach,
because we really need to know at the outset if the calling user is
allowed to extract all rows from a table or not- we can't go looking
through the entire table testing each row before we start running the
query.

   Another thing which could work,
 if it can be coded, would be a GUC which would throw an error if
 there were not quals on the query to prohibit seeing rows which
 the security conditions would prohibit, whether or not any matching
 rows actually existed.

If I'm following you correctly, this would be an optimization that
allows avoiding RLS in the case where some information about the user
causes the overall qual to always return 'true', correct?  I'd certainly
like to see what happens in that case today and agree that it'd be great
to optimize for and perhaps even allow a user for which that is true to
not need the 'DIRECT SELECT' privilege, but in practice, I don't think
it'll be possible in most cases (certainly not in the case where an
external security module is deciding the access) and the optimization
may not be worth it.

 The latter would match the behavior of
 column level security -- you get an error when trying to select a
 prohibited column even if there are no rows in the table.

Agreed, but that would be a relaxation of the proposed approach and
therefore something which could be added later, if it's deemed
worthwhile.

Thanks,

Stephen




[HACKERS] [REVIEW] Re: Fix xpath() to return namespace definitions

2014-06-16 Thread Abhijit Menon-Sen
At 2014-05-30 16:04:33 +0700, the.ap...@gmail.com wrote:

 While developing some XML processing queries, I stumbled on an old bug
 mentioned in http://wiki.postgresql.org/wiki/Todo#XML: "Fix Nested or
 repeated xpath() that apparently mess up namespaces."

Thanks for the patch, and welcome to Postgres development.

I can confirm that it works fine. I have attached here a very slightly
tweaked version of the patch (removed trailing whitespace, and changed
some comment text).

I'm marking this ready for committer.

-- Abhijit
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 422be69..f9ad598 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3602,19 +3602,28 @@ xml_xmlnodetoxmltype(xmlNodePtr cur)
 	if (cur->type == XML_ELEMENT_NODE)
 	{
 		xmlBufferPtr buf;
+		xmlNodePtr cur_copy;
 
 		buf = xmlBufferCreate();
+
+		/* the result of xmlNodeDump won't contain namespace definitions
+		 * from parent nodes, but xmlCopyNode duplicates a node along
+		 * with its required namespace definitions.
+		 */
+		cur_copy = xmlCopyNode(cur, 1);
 		PG_TRY();
 		{
-			xmlNodeDump(buf, NULL, cur, 0, 1);
+			xmlNodeDump(buf, NULL, cur_copy, 0, 1);
 			result = xmlBuffer_to_xmltype(buf);
 		}
 		PG_CATCH();
 		{
+			xmlFreeNode(cur_copy);
 			xmlBufferFree(buf);
 			PG_RE_THROW();
 		}
 		PG_END_TRY();
+		xmlFreeNode(cur_copy);
 		xmlBufferFree(buf);
 	}
 	else
diff --git a/src/test/regress/expected/xml.out b/src/test/regress/expected/xml.out
index 382f9df..a6d26f7 100644
--- a/src/test/regress/expected/xml.out
+++ b/src/test/regress/expected/xml.out
@@ -584,6 +584,12 @@ SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="http://127.0.0.1"><loc...
  {1,2}
 (1 row)
 
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+                                                                    xpath
+-----------------------------------------------------------------------------------------------------------------------------------------------
+ {"<local:piece xmlns:local=\"http://127.0.0.1\" id=\"1\">number one</local:piece>","<local:piece xmlns:local=\"http://127.0.0.1\" id=\"2\"/>"}
+(1 row)
+
 SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
          xpath
 -------------------------
diff --git a/src/test/regress/expected/xml_1.out b/src/test/regress/expected/xml_1.out
index a34d1f4..c7bcf91 100644
--- a/src/test/regress/expected/xml_1.out
+++ b/src/test/regress/expected/xml_1.out
 LINE 1: SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="ht...
                                          ^
 DETAIL:  This functionality requires the server to be built with libxml support.
 HINT:  You need to rebuild PostgreSQL using --with-libxml.
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+ERROR:  unsupported XML feature
+LINE 1: SELECT xpath('//loc:piece', '<local:data xmlns:local="http:/...
+                                     ^
+DETAIL:  This functionality requires the server to be built with libxml support.
+HINT:  You need to rebuild PostgreSQL using --with-libxml.
 SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
 ERROR:  unsupported XML feature
 LINE 1: SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>'...
diff --git a/src/test/regress/sql/xml.sql b/src/test/regress/sql/xml.sql
index 90d4d67..241a5d6 100644
--- a/src/test/regress/sql/xml.sql
+++ b/src/test/regress/sql/xml.sql
@@ -174,6 +174,7 @@ SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
 SELECT xpath('', '<!-- error -->');
 SELECT xpath('//text()', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>');
 SELECT xpath('//loc:piece/@id', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
+SELECT xpath('//loc:piece', '<local:data xmlns:local="http://127.0.0.1"><local:piece id="1">number one</local:piece><local:piece id="2" /></local:data>', ARRAY[ARRAY['loc', 'http://127.0.0.1']]);
 SELECT xpath('//b', '<a>one <b>two</b> three <b>etc</b></a>');
 SELECT xpath('//text()', '<root>&lt;</root>');
 SELECT xpath('//@value', '<root value="&lt;"/>');



Re: [HACKERS] Why is it JSQuery?

2014-06-16 Thread David E. Wheeler
On Jun 15, 2014, at 1:58 PM, Josh Berkus j...@agliodbs.com wrote:

 In other words, what I'm saying is: I don't think there's an existing,
 popular syntax we could reasonably use.

Okay, I’m good with that. Would be handy to document it in such a way as to 
kind of put it forward as a standard. :-)

D





Re: [HACKERS] How to change the pgsql source code and build it??

2014-06-16 Thread Shreesha
@Craig Ringer, I am afraid I didn't understand your solution.
The scenario I am having is a system whose only user is root. I have
all the permissions and privileges of the system as the user is root, but
pgsql doesn't allow initdb to be executed by root.
I think the solution you proposed gives root permissions to a non-root
user. But I believe I am looking for the exact opposite: in other
words, a possible workaround for a root user to execute certain
executable(s) as an unprivileged user.
Please clarify if I am wrong in my understanding.


On Sun, Jun 15, 2014 at 9:49 PM, Craig Ringer cr...@2ndquadrant.com wrote:

 On 06/13/2014 07:08 AM, Shreesha wrote:
  I need to initialize the db as the root and start the database server

 Assuming there's no way around doing this (it's generally not a good
 idea), you can just use the simple program 'fakeroot'.

 This program changes the return values from system calls via LD_PRELOAD,
 so PostgreSQL thinks that the user it is running as isn't root. It's
 commonly used in testing and packaging systems.

 http://man.he.net/man1/fakeroot

 --
  Craig Ringer   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




-- 
~Shreesha.


Re: [HACKERS] How to implement the skip errors for copy from ?

2014-06-16 Thread Alvaro Herrera
Pavel Stehule wrote:
 2014-06-16 11:46 GMT+02:00 xbzhang xbzh...@kingbase.com.cn:
 
 
   I want to implement skip-errors for COPY FROM, like this:
   create table A (c int primary key);
   copy A from stdin;
   1
   1
   2
   \.
  
   The copy will fail:
   ERROR: duplicate key violates primary key constraint CC_PKEY
   CONTEXT: COPY CC, line 2: 1
  
   I want to skip the error and continue to copy the rest of the tuples. The
   result will be that there are two rows in table A: 1 and 2.
  
   How to implement that? Can anybody give me a suggestion?
 
  You would have to reimplement the copy procedure to use subtransactions.
  Using a subtransaction for every row is too expensive, but you can do a
  subtransaction per 1000 rows, and when some exception is raised, fall back
  to storing the data one row per subtransaction.

See http://pgloader.io/ for a ready-made solution.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] IMPORT FOREIGN SCHEMA statement

2014-06-16 Thread Stephen Frost
* Ronan Dunklau (ronan.dunk...@dalibo.com) wrote:
 The problem with checking if the type is the same is deciding where to stop.
 For composite types, sure, it should be easy. But what about built-in types?

Haven't we already decided to trust these based on server version
information?  Look at what quals we find acceptable to push down to the
remote side...  Certainly, composites built off of built-in types should
work.

 Or types provided by an extension / a native library? These could
 theoretically change from one release to another.

This is definitely an issue which we really need to figure out how to
address.  I don't have any great ideas off-hand, but it feels like we'll
need a catalog and appropriate SQL-fu to allow users and extensions to
add other types and functions which can be pushed down..  For example,
it'd be great if PostGIS queries could be pushed down to the remote
server- and have that set up by the postgis extension on installation
(perhaps via a call-back hook which gives the extension a handle to the
remote server where it could interrogate the server and then a way to
store the information about what is allowed to be pushed to the remote
side, and how?).

That's certainly a rather complicated bit to address and I don't think
we should hold this up for that- but let's definitely be thinking about
how to add these things later and try to avoid putting anything in place
which would cause problems for us later...

Will try to come back to the rest of your questions later.. :)

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Re: popen and pclose redefinitions causing many warning in Windows build

2014-06-16 Thread Bruce Momjian
On Fri, Jun  6, 2014 at 01:18:24PM -0400, Alvaro Herrera wrote:
 Bruce Momjian wrote:
  On Wed, May 28, 2014 at 12:29:28PM -0400, Tom Lane wrote:
   Bruce Momjian br...@momjian.us writes:
I think this is caused because the variable is not defined as SOCKET. 
The attached patch fixes this.  This should prevent the warning.
   
   Surely that's just going to move the errors somewhere else.  The call
   site still expects the argument to be int[].
  
  Ah, yes, you are right.  This is a similar problem I had with libpq
  where PQsocket() had to return an int.
  
  Attached is an updated patch which follows my previous coding of
  checking for PGINVALID_SOCKET, and if not equal, assigns the value to an
  integer handle.  I would also like to rename variable 's' to
  'listen_sock', but that is not in the patch, for clarity reasons.
  
  Should this be held for 9.5?  I think it is only warning removal.  On
  the other hand, portability is what we do during beta testing.
 
 I think this should go in 9.4, but as you say it's only warning removal
 so there is probably little point in patching further back (this code
 dates back to 9.3.)

Agreed and applied.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] avoiding tuple copying in btree index builds

2014-06-16 Thread Robert Haas
On Tue, Jun 3, 2014 at 4:38 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Jun 1, 2014 at 3:26 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Tue, May 6, 2014 at 12:04 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, May 5, 2014 at 2:13 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  On 2014-05-05 13:52:39 -0400, Robert Haas wrote:
  Today, I discovered that when building a btree index, the btree code
  uses index_form_tuple() to create an index tuple from the heap tuple,
  calls tuplesort_putindextuple() to copy that tuple into the sort's
  memory context, and then frees the original one it built.  This seemed
  inefficient, so I wrote a patch to eliminate the tuple copying.  It
  works by adding a function tuplesort_putindextuplevalues(), which
  builds the tuple in the sort's memory context and thus avoids the need
  for a separate copy.  I'm not sure if that's the best approach, but
  the optimization seems wortwhile.
 
  Hm. It looks like we could quite easily just get rid of
  tuplesort_putindextuple(). The hash usage doesn't look hard to convert.

 I glanced at that, but it wasn't obvious to me how to convert the hash
 usage.  If you have an idea, I'm all ears.

  I also think it's possible to have a similar optimization for hash indexes
  in case the tuple has to be spooled for sorting.

  In function hashbuildCallback(), when buildstate->spool is true, we
  can avoid forming the index tuple. To check for nulls before calling
  _h_spool(), we can traverse the isnull array.

 Hmm, that might work.  Arguably it's less efficient, but on the other
 hand if it avoids forming the tuple sometimes it might be MORE
 efficient.  And anyway the difference might not be enough to matter.

On further review, this is definitely the way to go: it's a
straight-up win.  The isnull array is never more than one element in
length, so testing the single element is quite trivial.   The
attached, revised patch provides a modest but useful speedup for both
hash and btree index builds.

Anyone see any reason NOT to do this?  So far it looks like a
slam-dunk from here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 7abb7a4..c30c6f9 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -142,26 +142,23 @@ hashbuildCallback(Relation index,
 	HashBuildState *buildstate = (HashBuildState *) state;
 	IndexTuple	itup;
 
-	/* form an index tuple and point it at the heap tuple */
-	itup = _hash_form_tuple(index, values, isnull);
-	itup->t_tid = htup->t_self;
-
 	/* Hash indexes don't index nulls, see notes in hashinsert */
-	if (IndexTupleHasNulls(itup))
-	{
-		pfree(itup);
+	if (isnull[0])
 		return;
-	}
 
 	/* Either spool the tuple for sorting, or just put it into the index */
 	if (buildstate->spool)
-		_h_spool(itup, buildstate->spool);
+		_h_spool(buildstate->spool, htup->t_self, values, isnull);
 	else
+	{
+		/* form an index tuple and point it at the heap tuple */
+		itup = _hash_form_tuple(index, values, isnull);
+		itup->t_tid = htup->t_self;
 		_hash_doinsert(index, itup);
+		pfree(itup);
+	}
 
 	buildstate->indtuples += 1;
-
-	pfree(itup);
 }
 
 /*
@@ -184,10 +181,6 @@ hashinsert(PG_FUNCTION_ARGS)
 #endif
 	IndexTuple	itup;
 
-	/* generate an index tuple */
-	itup = _hash_form_tuple(rel, values, isnull);
-	itup->t_tid = *ht_ctid;
-
 	/*
 	 * If the single index key is null, we don't insert it into the index.
 	 * Hash tables support scans on '='. Relational algebra says that A = B
@@ -197,11 +190,12 @@ hashinsert(PG_FUNCTION_ARGS)
 	 * NOTNULL scans, but that's an artifact of the strategy map architecture
 	 * chosen in 1986, not of the way nulls are handled here.
 	 */
-	if (IndexTupleHasNulls(itup))
-	{
-		pfree(itup);
+	if (isnull[0])
 		PG_RETURN_BOOL(false);
-	}
+
+	/* generate an index tuple */
+	itup = _hash_form_tuple(rel, values, isnull);
+	itup->t_tid = *ht_ctid;
 
 	_hash_doinsert(rel, itup);
 
diff --git a/src/backend/access/hash/hashsort.c b/src/backend/access/hash/hashsort.c
index c0d6fec..3a70bc1 100644
--- a/src/backend/access/hash/hashsort.c
+++ b/src/backend/access/hash/hashsort.c
@@ -90,9 +90,10 @@ _h_spooldestroy(HSpool *hspool)
  * spool an index entry into the sort file.
  */
 void
-_h_spool(IndexTuple itup, HSpool *hspool)
+_h_spool(HSpool *hspool, ItemPointerData self, Datum *values, bool *isnull)
 {
-	tuplesort_putindextuple(hspool->sortstate, itup);
+	tuplesort_putindextuplevalues(hspool->sortstate, hspool->index,
+  self, values, isnull);
 }
 
 /*
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 36dc6c2..1ea4a2c 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -171,28 +171,21 @@ btbuildCallback(Relation index,
 void *state)
 {
 	BTBuildState *buildstate = (BTBuildState *) state;
-	IndexTuple	itup;
-
-	/* form an index tuple and point it at the heap tuple 

Re: [HACKERS] review: Non-recursive processing of AND/OR lists

2014-06-16 Thread Tom Lane
I wrote:
 Gurjeet Singh gurj...@singh.im writes:
 I tried to eliminate the 'pending' list, but I don't see a way around it.
 We need temporary storage somewhere to store the branches encountered on
 the right; in the recursive case the call stack was serving that purpose.

 I still think we should fix this in the grammar, rather than introducing
 complicated logic to try to get rid of the recursion later.  For example,
 as attached.

I went looking for (and found) some additional obsoleted comments, and
convinced myself that ruleutils.c is okay as-is, and pushed this.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] wrapping in extended mode doesn't work well with default pager

2014-06-16 Thread Robert Haas
On Wed, Jun 11, 2014 at 10:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Noah Misch n...@leadboat.com writes:
 Based on the commit message and procedural history, I thought commit 6513633
 was changing behavior solely for the combination of \pset expanded and
 \pset format wrapped.  Peter's and my test cases show that it also changed
 behavior for \pset expanded alone.  That's a bug, unless someone sees to
 argue that the new \pset expanded behavior is a desirable improvement in
 spite of its origin as an accident.  Altering an entrenched psql output 
 format
 is a big deal.

 TBH I'm wondering if we shouldn't just revert that patch (and the
 subsequent fix attempts).  It was not a major feature and I'm thinking
 we have better things to do right now than try to fix the multiple
 logic holes it evidently has.  The author's certainly welcome to try
 again with a more carefully thought-through patch for 9.5.

So, it seems like we need to do something about this one way or
another.  Who's working on that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Built-in support for a memory consumption ulimit?

2014-06-16 Thread Robert Haas
On Sat, Jun 14, 2014 at 10:37 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 After giving somebody advice, for the Nth time, to install a
 memory-consumption ulimit instead of leaving his database to the tender
 mercies of the Linux OOM killer, it occurred to me to wonder why we don't
 provide a built-in feature for that, comparable to the ulimit -c max
 option that already exists in pg_ctl.  A reasonably low-overhead way
 to do that would be to define it as something a backend process sets
 once at startup, if told to by a GUC.  The GUC could possibly be
 PGC_BACKEND level though I'm not sure if we want unprivileged users
 messing with it.

 Thoughts?

What happens if the limit is exceeded?  ERROR?  FATAL?  PANIC?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Audit of logout

2014-06-16 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Fujii Masao masao.fu...@gmail.com writes:
  That's harmful for audit purpose. I think that we should make
  log_disconnections PGC_SUSET rather than PGC_BACKEND in order
  to forbid non-superusers from changing its setting. Attached
  patch does this.

I tend to agree with this- those are things which regular users really
shouldn't be able to modify.  Policy enforcement can be done when there
isn't system enforcement, but you really don't want to fall back to
policy enforcement when you don't need to.

 TBH, this complaint seems entirely artificial.  If somebody needs to audit
 disconnections, and they see a connection record with no disconnection
 record but the session's no longer there, they certainly know that a
 disconnection occurred.  And if there wasn't a server crash to explain it,
 they go slap the wrist of whoever violated corporate policy by turning off
 log_disconnections.  Why do we need system-level enforcement of this?

Going through every log file, and writing something to address log file
changes, to hunt for those cases is no small amount of effort; you're
asking every administrator with an audit requirement to write code to
provide something which we make it appear as if we're doing for them.
It's certainly a violation of POLA that users can decide on their own
that their disconnections don't get logged.

 Moreover, your proposed fix breaks the entire point of the PGC_BACKEND
 setting, which was to try to prevent disconnections from being logged
 or not logged when the corresponding connection was not logged or logged
 respectively.  If an auditor wants the system to enforce that there are
 matched pairs of those records, he's not exactly going to be satisfied by
 being told that only superusers can cause them to not match.

This is only accurate when a superuser exists in the system and even so,
superuser access can be much more easily reviewed as it's going to be a
lot less traffic and there may be other auditing mechanisms in place for
that access.

 I wonder whether we should just get rid of log_disconnections as a
 separate variable, instead logging disconnections when log_connections
 is set.
 
 Another answer is to make both variables PGC_SIGHUP, on the grounds
 that it doesn't make much sense for them not to be applied system-wide;
 except that I think there was some idea that logging might be enabled
 per-user or per-database using ALTER ROLE/DATABASE.

Both of these options look pretty reasonable to me as a way to improve
the current situation.  I'm not really sure that I see the use-case for
only logging connections/disconnections on a per-user or per-database
basis; in my experience it's not a lot of traffic to log it all and I
don't recall ever seeing anyone set those anything other than
system-wide.

I like the idea of flexibility in what's logged, I just don't see this
specific case as really needing it. 

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] wrapping in extended mode doesn't work well with default pager

2014-06-16 Thread Greg Stark
On Mon, Jun 16, 2014 at 9:05 PM, Robert Haas robertmh...@gmail.com wrote:
 So, it seems like we need to do something about this one way or
 another.  Who's working on that?

So I'm fine finishing what I started. I've just been a bit busy this past week.

My inclination is to try to push forward and commit this patch,
document the changes and make sure we check for any consequences of
them.

The alternative plan is to revert it for 9.4 and commit the changes to
9.5, which gives us more time to be sure we're OK with them.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous Transaction (WIP)

2014-06-16 Thread Alvaro Herrera
What's the status of this patch?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] delta relations in AFTER triggers

2014-06-16 Thread Alvaro Herrera
Kevin Grittner wrote:
 Attached is a WIP patch for implementing the capture of delta
 relations for a DML statement, in the form of two tuplestores --
 one for the old versions and one for the new versions.  In the
 short term it is intended to make these relations available in
 trigger functions, although the patch so far doesn't touch any PLs
 -- it just takes things as far as providing the relations as
 tuplestores in the TriggerData structure when appropriate, for the
 PLs to pick up from there.  It seemed best to get agreement on the
 overall approach before digging into all the PLs.  This is
 implemented only for INSERT, UPDATE, and DELETE since it wasn't
 totally clear what the use cases and proper behavior was for other
 triggers.  Opinions on whether we should try to provide deltas for
 other cases, and if so what the semantics are, are welcome.

TRUNCATE triggers shouldn't have delta relations, that's for sure, or
it'd completely negate the point of TRUNCATE triggers.  I don't think we
have any other event, do we?  People who get delta relations for
deleting all rows should be using DELETE, I think.

I am not sure about providing delta relations in the FOR EACH ROW case.
For what cases are they useful?  In all cases I think NEW and OLD are
sufficient.  I didn't read the standard, but if it's saying that in FOR
EACH ROW the new/deleted/changed row should be accessible by way of a
delta relation, then perhaps we should look into making NEW and OLD be
accessible via that relation rather than adding delta relations.  It
might be more code, though.

Now, if you only have delta relations in FOR EACH STATEMENT, then it
seems to me you can optimize obtaining them only when such triggers
are defined; this lets you do away with the reloption entirely, doesn't
it?

I noticed that GetCurrentFDWTuplestore() changed its identity without
getting its comment updated.  The new name seems awfully generic, and I
don't think it really describes what the function does.  I think it
needs more renaming.

 (1)  My first impulse was to capture this delta data in the form of
 tuplestores of just TIDs, and fetching the tuples themselves from
 the heap on reference.  In earlier discussions others have argued
 for using tuplestores of the actual rows themselves.

Can you please supply pointers to such discussion?  I don't see any
point in not just storing TIDs, but perhaps I'm missing something.

 (2)  Do we want to just pick names for these in the PLs rather than
 using the standard syntax?  Implementing the standard syntax seemed
 to require three new (unreserved) keywords, changes to the catalogs
 to store the chosen relations names, and some way to tie the
 specified relation names in to the various PLs.

I think the only one for which we have a compulsion to follow someone
in this area would be PL/pgSQL, which probably needs to follow PL/SQL's
lead if there is one.  Other than that I don't think we need to do
anything standard.  We don't (yet) have PL/PSM which would need to have
the standard-mandated syntax.

 The way I have
 gone here just adds two new fields to the TriggerData structure and
 leaves it to each PL how to deal with that.  Failure to do anything
 in a PL just leaves it at the status quo with no real harm done --
 it just won't have the new delta relations available.

It seems to me that plpgsql support is mandatory, but other PLs can add
support as interested parties weigh in with patches.  As with event
triggers, I don't feel the author of the main feature is responsible for
patching all PLs.
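
For concreteness, the shape being discussed is roughly the following
(field names here are my assumption from the description, not
necessarily those used in the WIP patch):

#include "utils/tuplestore.h"

/* Sketch of the two proposed additions to TriggerData; the existing
 * fields are elided.  NULL means no delta relation was collected. */
typedef struct TriggerData
{
	/* ... existing fields: tg_event, tg_relation, tg_trigtuple, ... */

	Tuplestorestate *tg_olddelta;	/* old versions of affected rows */
	Tuplestorestate *tg_newdelta;	/* new versions of affected rows */
} TriggerData;

A PL that ignores the two new fields simply behaves as before, which is
the "no real harm done" property described above.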


 +<varlistentry>
 + <term><literal>generate_deltas</literal> (<type>boolean</type>)</term>
 + <listitem>
 +  <para>
 +   Declare that a table generates delta relations when modified.  This
 +   allows <literal>AFTER</> triggers to reference the set of rows
 +   modified by a statement.  See
 +   <xref linkend="sql-createtrigger"> for details.
 +  </para>
 + </listitem>
 +</varlistentry>

Are you intentionally omitting the corresponding sql-createtrigger patch?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Built-in support for a memory consumption ulimit?

2014-06-16 Thread Greg Stark
On Mon, Jun 16, 2014 at 9:08 PM, Robert Haas robertmh...@gmail.com wrote:
 What happens if the limit is exceeded?  ERROR?  FATAL?  PANIC?

Well presumably it just makes malloc return NULL which causes an
ERROR. One advantage to setting it via a GUC is that it might be
possible to, for example, raise it automatically in critical sections
or during error unwinding.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Built-in support for a memory consumption ulimit?

2014-06-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Sat, Jun 14, 2014 at 10:37 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 After giving somebody advice, for the Nth time, to install a
 memory-consumption ulimit instead of leaving his database to the tender
 mercies of the Linux OOM killer, it occurred to me to wonder why we don't
 provide a built-in feature for that, comparable to the ulimit -c max
 option that already exists in pg_ctl.  A reasonably low-overhead way
 to do that would be to define it as something a backend process sets
 once at startup, if told to by a GUC.  The GUC could possibly be
 PGC_BACKEND level though I'm not sure if we want unprivileged users
 messing with it.

 What happens if the limit is exceeded?  ERROR?  FATAL?  PANIC?

We'd presumably get NULL returns from malloc, which'd be reported
as elog(ERROR, "out of memory").  It's not anything new for the
code to handle.

One issue if we allow this to be set on something other than PGC_BACKEND
timing is that somebody might try to reduce the setting to less than his
session is currently using.  I'm not sure what setrlimit would do in
such cases --- possibly fail, or maybe just set a limit that would cause
all subsequent malloc's to fail.  I doubt it'd actually take away existing
memory though.
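
To make the mechanism concrete, a backend-startup hook along these
lines would do it (a minimal sketch; the GUC name and kilobyte units
are assumptions, and error reporting is simplified):

#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical GUC: per-backend address-space limit in kB; 0 = off. */
static int	backend_mem_limit_kb = 0;

/* Would be called once during backend startup, per the proposal. */
static void
apply_backend_memory_limit(void)
{
	struct rlimit rl;

	if (backend_mem_limit_kb <= 0)
		return;

	rl.rlim_cur = (rlim_t) backend_mem_limit_kb * 1024;
	rl.rlim_max = rl.rlim_cur;

	/*
	 * After this succeeds, an over-limit malloc() returns NULL, which
	 * the backend already reports as an "out of memory" ERROR.
	 */
	if (setrlimit(RLIMIT_AS, &rl) != 0)
		perror("setrlimit(RLIMIT_AS)");
}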

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak

2014-06-16 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  ALTER TABLESPACE MOVE is a glorified ALTER TABLE.  If ALTER TABLESPACE
  MOVE returned ALTER TABLE as a tag, I think it'd work well too; but not
  ALTER TABLESPACE.  Individually, since the implementation works by
  calling AlterTableInternal(), it already works.
 
  Now if you state that the current design in event_triggers that works by
  slicing CommandTag and comparing the pieces is broken, I don't disagree
  and I think I have now (in the patch posted in a nearby thread) some
  more infrastructure to do it differently.  But even if we do that, I
  think we're going to need a way to differentiate ALTER TABLESPACE MOVE
  from other forms of ALTER TABLESPACE.  I haven't given this much
  thought, though.
 
 Yeah, I'd not paid much attention to it either.  Now that I look at it,
 ALTER TABLESPACE MOVE seems like a pretty unfortunate choice of naming
 all around, because (unless I'm misunderstanding) it doesn't actually
 alter any property of the tablespace itself.  It might be a bit late
 to propose this, but I wonder if some syntax along the lines of

I'm not against changing it- doing operations on a whole tablespace felt
like it would make sense under 'ALTER TABLESPACE' to me (hence the
implementation) but you're right, it's not actually changing properties
of the tablespaces themselves.

MOVE ALL [ TABLES | INDEXES | ... ] IN TABLESPACE foo TO bar

I'm not a huge fan of new top-level constructs, and re-using MOVE feels
completely wrong to me as that's used for cursors.

 wouldn't be less confusing.  Not sure what we'd use as command tag
 for it though (not MOVE, since that's taken).

I would have thought something under ALTER TABLE would make more sense,
if we're going to change it, eg:

ALTER TABLE ALL [ TABLES | INDEXES | ... ] IN TABLESPACE SET TABLESPACE ...

or perhaps something like

ALTER TABLES IN TABLESPACE ...

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] wrapping in extended mode doesn't work well with default pager

2014-06-16 Thread Jeff Janes
On Wed, Jun 11, 2014 at 12:59 PM, Greg Stark st...@mit.edu wrote:

 I think this whole exercise has mostly just convinced me we should
 implement an HTTP interface and reimplement psql as a browser app.

I certainly hope not.  I've seen lots of browser apps that were nice
enough to use for casual use by a casual user.  I've never seen one
that was an effective power tool for power users, the way psql is.
Now maybe they are out there and I just don't know about them, but I
have my doubts.

As an additional tool, to each his own.  But a browser-based
replacement for psql, -1 from me.

And as far as browser-based things apply to this patch, I must say I've
tried micromanaging the way large amounts of data wrap in an HTML table
when I found the default to be inadequate, and I have not found that
to be noticeably easy, either.

The original version of this patch was only a few lines long and did
one very simple and useful thing: avoiding the printing of whole
screens full of hyphens when in 'expanded mode'.  If we end up
reverting the other parts of this patch, hopefully we don't lose that
part.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-16 Thread Josh Berkus
On 04/02/2014 01:16 PM, Andres Freund wrote:
 On 2014-04-02 21:08:47 +0100, Greg Stark wrote:
 Normally I'm not for adding gucs that just gate new features. But I think a
 simple guc to turn this on or off would be fine and alleviate any concerns.
 I think users would appreciate it quite a lot
 
 I don't have strong feelings about the feature, but introducing a guc
 for it feels entirely ridiculous to me. This is a minor detail in an
 error message, not more.
 
 It would even have a positive effect of helping raise awareness of the
 feature. I often scan the list of config options to get an idea of new
 features when I'm installing new software or upgrading.
 
 Really? Should we now add GUCs for every feature then?

-1 for having a GUC for this.

+1 on the feature.

Review with functional test coming up.

Question: How should we handle the issues with East Asian languages
(i.e. Japanese, Chinese) and this Hint?  Should we just avoid hinting
for a selected list of languages which don't work well with levenshtein?
 If so, how do we get that list?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-16 Thread Peter Geoghegan
On Mon, Jun 16, 2014 at 4:04 PM, Josh Berkus j...@agliodbs.com wrote:
 Question: How should we handle the issues with East Asian languages
 (i.e. Japanese, Chinese) and this Hint?  Should we just avoid hinting
 for a selected list of languages which don't work well with levenshtein?
  If so, how do we get that list?

I think that how useful Levenshtein distance is for users based in
east Asia generally, and how useful this patch is to those users are
two distinct questions. I have no idea how common it is for Japanese
users to just use Roman characters as table and attribute names. Since
they're very probably already writing application code that uses Roman
characters (except in the comments, user strings and so on), it might
make sense to do the same in the database. I would welcome further
input on that question. I don't know what the trends are in the real
world.

Also note that the patch scans the range table parse state to pick the
most probable candidate among all Vars/columns that already appear
there. The query would raise an error at an earlier point if a
non-existent relation was referenced, for example. We're only choosing
from a minimal list of possibilities, and pick one that is very
probably what was intended. Even if Levenshtein distance works badly
with Kanji (which is not obviously the case, at least to me), it might
not matter here.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] avoiding tuple copying in btree index builds

2014-06-16 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On further review, this is definitely the way to go: it's a
 straight-up win.  The isnull array is never more than one element in
 length, so testing the single element is quite trivial.   The
 attached, revised patch provides a modest but useful speedup for both
 hash and btree index builds.

 Anyone see any reason NOT to do this?  So far it looks like a
 slam-dunk from here.

If index_form_tuple leaks any memory in the sort context, this would be
not so much a slam-dunk win as a complete disaster.  I'm not sure that
no-leak is a safe assumption, especially in cases where we do toast
compression of the index datums.  (In fact, I'm pretty sure that the
reason it was coded like this originally was exactly to avoid that
assumption.)

Assuming that you inspect the code and determine it's safe, the patch
had better decorate index_form_tuple with dire warnings about not leaking
memory; because even if it's safe today, somebody might break it tomorrow.

On a micro-optimization level, it might be worth passing the TID as
ItemPointer not ItemPointerData (ie, pass a pointer until we get to
the point of actually inserting the TID into the index tuple).
I'm not sure that copying odd-size structs should be assumed to be
efficient.
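
In code terms, the suggestion amounts to something like this (a sketch;
signatures follow the posted patch except that the TID is now passed as
ItemPointer, i.e. a pointer to ItemPointerData):

/* Headers elided; see hashsort.c in the posted patch. */
void
_h_spool(HSpool *hspool, ItemPointer self, Datum *values, bool *isnull)
{
	/* The TID is only copied when the tuple is actually formed. */
	tuplesort_putindextuplevalues(hspool->sortstate, hspool->index,
								  self, values, isnull);
}

The caller in hashbuildCallback() would then pass the address:
_h_spool(buildstate->spool, &htup->t_self, values, isnull);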

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Built-in support for a memory consumption ulimit?

2014-06-16 Thread zhouqq.postg...@gmail.com
I like this feature but am wondering how to use it. If it uses one value
across all backends, we may have to set it conservatively to avoid the
OOM killer. But this does not promote resource sharing. If we set it per
backend, what's the suggested value? One way is recommending the user
sort his queries into a small-query group and a big-query group, where
the big-query connections set a higher ulimit w.r.t. work_mem. But this
also has limited mileage.

An ideal way is PGC_SIGHUP, which implies that all server processes added
up shall respect this setting, and that it is adjustable. Not sure how to
implement it, as setrlimit() seems not to support process groups (and
what about Windows?). Even if it does, a small issue is that this might
increase the chance we hit OOM at some inconvenient places. For example,
here:

/* Special case for startup: use good ol' malloc */ 
node = (MemoryContext) malloc(needed); 
Assert(node != NULL); 

I wonder how far we want to go along this line. Consider this case: we
have some concurrent big and medium queries; the system may comfortably
allow 1 big query running with 2 or 3 medium ones to take the left-over
memory. With query throttling, we can hopefully make all queries run
successfully without mid-flight failure surprises, and ulimit guards the
bottom line if anything goes wrong. This may lead to a discussion of
more complete workload management support.

Regards,
Qingqing

From: Tom Lane
Date: 2014-06-14 22:37
To: pgsql-hackers
Subject: [HACKERS] Built-in support for a memory consumption ulimit?
After giving somebody advice, for the Nth time, to install a
memory-consumption ulimit instead of leaving his database to the tender
mercies of the Linux OOM killer, it occurred to me to wonder why we don't
provide a built-in feature for that, comparable to the ulimit -c max
option that already exists in pg_ctl.  A reasonably low-overhead way
to do that would be to define it as something a backend process sets
once at startup, if told to by a GUC.  The GUC could possibly be
PGC_BACKEND level though I'm not sure if we want unprivileged users
messing with it.
 
Thoughts?
 
regards, tom lane
 
 
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PG Manual: Clarifying the repeatable read isolation example

2014-06-16 Thread Bruce Momjian
On Sun, Jun  8, 2014 at 09:33:04AM +1200, Gavin Flower wrote:
 I know that I first look at the docs & seldom look at the Wiki - in
 fact it was only recently that I became aware of the Wiki, and it is
 still not the first thing I think of when I want to know something,
 and I often forget it exists.  I suspect many people are like me in
 this!
 
 Also the docs have a more authoritative air, and probably
 automatically assumed to be more up-to-date and relevant to the
 version of Postgres used.
 
 So I suggest that the docs should have an appropriate coverage of
 such topics (possibly mostly in an appendix, with brief references in
 affected parts of the main docs) if it does not quite fit into the
 rest of the documentation (affects many different features, so no
 one place in the main docs is appropriate - or too detailed, or too
 much).  Also links to the Wiki, and to the more academic papers,
 could be provided for the really keen.

You can link to the wiki from our docs.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-16 Thread Ian Barwick
On 14/06/17 8:31, Peter Geoghegan wrote:
 On Mon, Jun 16, 2014 at 4:04 PM, Josh Berkus j...@agliodbs.com wrote:
 Question: How should we handle the issues with East Asian languages
 (i.e. Japanese, Chinese) and this Hint?  Should we just avoid hinting
 for a selected list of languages which don't work well with levenshtein?
  If so, how do we get that list?
 
 I think that how useful Levenshtein distance is for users based in
 east Asia generally, and how useful this patch is to those users are
 two distinct questions. I have no idea how common it is for Japanese
 users to just use Roman characters as table and attribute names. Since
 they're very probably already writing application code that uses Roman
 characters (except in the comments, user strings and so on), it might
 make sense to do the same in the database. I would welcome further
 input on that question. I don't know what the trends are in the real
 world.

From what I've seen in the wild in Japan, Roman/ASCII characters are
widely used for object/attribute names, as generally it's much less
hassle than switching between input methods, dealing with different
encodings etc. The only place where I've seen Japanese characters widely
used is in tutorials, examples etc. However that's only my personal
observation for one particular non-Roman language.


Regards

Ian Barwick

-- 
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] /proc/self/oom_adj is deprecated in newer Linux kernels

2014-06-16 Thread Tom Lane
Gurjeet Singh gurj...@singh.im writes:
 Please find attached the patch. It includes the doc changes as well.

What exactly is the point of the static state you added here?  There
is no situation where that could possibly be useful, because this
code is executed at most once per process, and not at all in the
postmaster.  AFAICS, that's just complication (and permanent waste
of a kilobyte of static data space) for no return.

Also, this seems to try to write the file whether or not the environment
variable was set, which wasn't the plan.

I also don't really see the point of parsing the value variable into an
int.  Why not just spit it out as the original string?  It's not ours to
legislate whether the kernel will take a value that's not an integer.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-16 Thread Michael Paquier
On Tue, Jun 17, 2014 at 9:30 AM, Ian Barwick i...@2ndquadrant.com wrote:
 From what I've seen in the wild in Japan, Roman/ASCII characters are
 widely used for object/attribute names, as generally it's much less
 hassle than switching between input methods, dealing with different
 encodings etc. The only place where I've seen Japanese characters widely
 used is in tutorials, examples etc. However that's only my personal
 observation for one particular non-Roman language.
And I agree with this remark: it's a PITA to manage database object
names with Japanese characters directly. I have seen some
applications using such ways to define objects in the past, though not
*that* many, I concur.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-16 Thread Tom Lane
Michael Paquier michael.paqu...@gmail.com writes:
 On Tue, Jun 17, 2014 at 9:30 AM, Ian Barwick i...@2ndquadrant.com wrote:
 From what I've seen in the wild in Japan, Roman/ASCII characters are
 widely used for object/attribute names, as generally it's much less
 hassle than switching between input methods, dealing with different
 encodings etc. The only place where I've seen Japanese characters widely
 used is in tutorials, examples etc. However that's only my personal
 observation for one particular non-Roman language.

 And I agree to this remark, that's a PITA to manage database object
 names with Japanese characters directly. I have ever seen some
 applications using such ways to define objects though in the past, not
 *that* many I concur..

What exactly is the rationale for thinking that Levenshtein distance is
useless in non-Roman alphabets?  AFAIK it just counts insertions and
deletions of characters, which seems like a concept rather independent
of what those characters are.
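
For reference, the classic dynamic-programming formulation (which also
charges substitutions, not just insertions and deletions) is short.  A
byte-oriented sketch, ignoring multibyte encodings:

#include <string.h>

#define Min(x, y) ((x) < (y) ? (x) : (y))

/* Levenshtein distance: the minimum number of single-character
 * insertions, deletions, and substitutions needed to turn a into b.
 * Sketch only; a real implementation must handle multibyte text. */
static int
levenshtein(const char *a, const char *b)
{
	int			m = strlen(a);
	int			n = strlen(b);
	int			d[m + 1][n + 1];	/* VLA, for brevity */

	for (int i = 0; i <= m; i++)
		d[i][0] = i;			/* i deletions */
	for (int j = 0; j <= n; j++)
		d[0][j] = j;			/* j insertions */

	for (int i = 1; i <= m; i++)
		for (int j = 1; j <= n; j++)
		{
			int			sub = d[i - 1][j - 1] + (a[i - 1] != b[j - 1]);
			int			del = d[i - 1][j] + 1;
			int			ins = d[i][j - 1] + 1;

			d[i][j] = Min(sub, Min(del, ins));
		}

	return d[m][n];
}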

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] rm_desc signature

2014-06-16 Thread Jeff Janes
On Friday, June 13, 2014, Heikki Linnakangas hlinnakan...@vmware.com
wrote:

 As part of the WAL-format changing patch I've been working on, I changed
 the signature of the rm_desc function from:

 void (*rm_desc) (StringInfo buf, uint8 xl_info, char *rec);

 to:

 void (*rm_desc) (StringInfo buf, XLogRecord *record);

 The WAL-format patch needed that because it added more functions/macros
 for working with XLogRecords, also used by rm_desc routines, but it seems
 like a sensible change anyway. IMHO it was always a bit strange that
 rm_desc was passed the info field and record payload separately.

 So I propose to do that change as a separate commit. Per attached. This
 has no functional changes, it's just refactoring.

 Any objections?


This commit, or a related one, changed the default (i.e. commented out)
nature of:

#define WAL_DEBUG

Cheers,

Jeff


[HACKERS] btreecheck extension

2014-06-16 Thread Peter Geoghegan
As discussed at the developer meeting at pgCon, I think that there is
a lot to be said for a tool that checks nbtree index invariants on
live systems.

Attached prototype patch adds contrib extension, btreecheck. This
extension provides SQL-callable functions for checking these
conditions on nbtree indexes on live systems. This code generally
works by using the same insertion ScanKey infrastructure that the
B-Tree code itself uses in servicing all kinds of index scans.

There are two major goals for this patch. In order of their importance
in my mind:

1. Improve testing around the B-Tree code. Give 9.4 beta testers the
tools needed to detect subtle errors in a semi-automated way. I'm
quite encouraged by recent efforts along these lines by Jeff Janes,
Heikki, and others. Beyond 9.4, I think that stress testing is likely
to become indispensable as more complex performance optimizations are
pursued. Personally, I see some good opportunities to improve the
performance of the B-Tree code, and if those ideas are ever to be put
into action, there is clearly a big need for automated stress-testing.
I see this tool as a start. I'm not just talking about my recent work
on sort support for text, and the idea of generalizing that; I think
that there are many techniques that we could pursue for decreasing the
I/O and CPU overhead of using B-Tree indexes that are just too risky
to pursue today because of the subtle interactions of many moving
parts. I'd like to lower the risk.

2. Allow users to sanity check indexes in production. This is not my
immediate goal here, though; I will not add this patch to the next
CommitFest. That's something to think about in more detail a bit
later. Clearly there is a big demand for this kind of thing, though.

Since I believe the B-Tree code changes of 9.4 are generally thought
to be the most plausible source of serious bugs in 9.4 purely because
of their subtlety and fundamental importance, it seemed valuable to
get this prototype into the hands of beta testers as soon as possible.
For that reason, I have not given some aspects of the design as much
thought as I would have preferred. I haven't really come up with a
plausible plan of attack for finding bugs in the B-Tree code. Rather,
I've more or less just considered what invariant conditions exist that
can be correctly verified without too much effort, and then
implemented checks for each. I've also stuck to a design that only has
the tool share lock a single B-Tree buffer at a time, and copy the
contents into local memory before operating on that (relation level
heavyweight locks are used in various different ways too), in a
similar manner to contrib/pageinspect. Copying the entire page into
local memory is probably the right thing to do even from a performance
perspective, since in each case the entire page is inspected, which
takes much longer than, say, a binary search. Even though it's
supposed to be possible to run this on a live system without causing
too much disruption, subtle buffer locking protocols were avoided
(although there is a rationale for why things are okay in the module,
that considers current buffer locking protocols particular to the
B-Tree code). This is a principle that I think I can stick to, and
would certainly prefer to stick to for obvious reasons.

The checks performed (the entire set of invariant conditions checked
by all SQL-callable functions in the patch) are currently:

* That data items on each B-Tree page are in logical order (a
simplified sketch of this check appears after this list).

* That non-rightmost pages (which are the only pages that have a high
key) have only data items that respect that high key as an upper
bound.

* That a down-link data item in a parent page points to a child page
where the parent down-link data item is respected as a lower bound.

* That a down-link data item in a parent page points to a child page
whose high key (if any) is equal to the next data item on the parent
page (if any), while taking into account incomplete page splits and
right-most-ness.

* That a down-link data item in a parent page points to a child page
whose right link (if any) points to the same block/page as pointed to
by the next data item on the parent page (if any). In other words, the
children and their parent are in agreement about the children being
siblings, while taking into account incomplete page splits and
right-most-ness. In other other words, the second phase of a page
split where a downlink is inserted into the parent was sane (so this
is similar to the previous invariant check listed).

The two prior checks may not be sufficient, though, if we want to
operate with weak, non-disruptive locks -- what about cousins and
grandparents, and even great-grandparents and second cousins? It's a
good start, though.

* That on each level, starting at the true root level and working down
to the leaf level (level 0), there is mutual agreement between
siblings that they are siblings (that is, their right and left links
comport).
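
To illustrate the flavor of the first check (page-level ordering), a
heavily simplified sketch; the real tool also copies the page into
local memory first, and this version assumes the 9.4-era
_bt_mkscankey()/_bt_compare() signatures, skips the high key, and leaks
the scan keys it builds:

#include "access/nbtree.h"

/* Return true if adjacent data items on one B-Tree page are in order. */
static bool
bt_page_items_ordered(Relation rel, Page page)
{
	BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
	OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
	OffsetNumber offnum;
	int			natts = RelationGetNumberOfAttributes(rel);

	for (offnum = P_FIRSTDATAKEY(opaque);
		 offnum < maxoff;
		 offnum = OffsetNumberNext(offnum))
	{
		ItemId		itemid = PageGetItemId(page, offnum);
		IndexTuple	itup = (IndexTuple) PageGetItem(page, itemid);
		ScanKey		skey = _bt_mkscankey(rel, itup);

		/* The insertion key built from item N must be <= item N+1. */
		if (_bt_compare(rel, natts, skey, page,
						OffsetNumberNext(offnum)) > 0)
			return false;
	}
	return true;
}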

I won't go into locking 

Re: [HACKERS] How to change the pgsql source code and build it??

2014-06-16 Thread Craig Ringer
On 06/17/2014 02:02 AM, Shreesha wrote:
 But I believe I am looking for the exact opposite of it. In
 other words, a possible workaround for a root user to execute certain
 executable(s) as an unprivileged user.
 

So you want the su or sudo commands?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-16 Thread Ian Barwick
On 14/06/17 9:53, Tom Lane wrote:
 Michael Paquier michael.paqu...@gmail.com writes:
 On Tue, Jun 17, 2014 at 9:30 AM, Ian Barwick i...@2ndquadrant.com wrote:
 From what I've seen in the wild in Japan, Roman/ASCII characters are
 widely used for object/attribute names, as generally it's much less
 hassle than switching between input methods, dealing with different
 encodings etc. The only place where I've seen Japanese characters widely
 used is in tutorials, examples etc. However that's only my personal
 observation for one particular non-Roman language.
 
 And I agree to this remark, that's a PITA to manage database object
 names with Japanese characters directly. I have ever seen some
 applications using such ways to define objects though in the past, not
 *that* many I concur..
 
 What exactly is the rationale for thinking that Levenshtein distance is
 useless in non-Roman alphabets?  AFAIK it just counts insertions and
 deletions of characters, which seems like a concept rather independent
 of what those characters are.

With Japanese (which doesn't have an alphabet, but two syllabaries and
a bunch of logographic characters), Levenshtein distance is pretty useless
for examining similarities with words which can be written in either
syllabary (Michael's ramen example earlier in the thread); and when
catching typos caused by erroneous conversion from phonetic input to
characters - e.g. intending to input 成長 (seichou, growth) but
accidentally selecting 清聴 (seichou, courteous attention).

However in this particular use case, as long as it doesn't produce false
positives (I haven't looked at the patch) I don't think it would cause
any problems (of the kind which would require actively excluding certain
languages/character sets), it just wouldn't be quite as useful.


Regards

Ian Barwick

-- 
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Built-in support for a memory consumption ulimit?

2014-06-16 Thread Noah Misch
On Sat, Jun 14, 2014 at 10:37:36AM -0400, Tom Lane wrote:
 After giving somebody advice, for the Nth time, to install a
 memory-consumption ulimit instead of leaving his database to the tender
 mercies of the Linux OOM killer, it occurred to me to wonder why we don't
 provide a built-in feature for that, comparable to the ulimit -c max
 option that already exists in pg_ctl.

In principle, I would like to have such a feature.  The only limit I've found
reliable on the occasions I wanted this was RLIMIT_AS; RLIMIT_DATA has been
ineffective on Linux+GNU libc.  RLIMIT_AS has its own problems, of course:
address space usage is only tenuously connected to the definition of memory
consumption folks actually want.  Worse, one can often construct a query to
crash an RLIMIT_AS-affected backend.  Make the query use heap space until just
shy of the limit, then burn stack space until RLIMIT_AS halts stack growth.

I would welcome a feature for configuring each RLIMIT_* or some selection
thereof.  Then it's up to the administrator to know the (possibly
platform-specific) benefits and risks of each limit.  I don't think a high
level limit memory consumption feature is within reach, though.

 A reasonably low-overhead way
 to do that would be to define it as something a backend process sets
 once at startup, if told to by a GUC.  The GUC could possibly be
 PGC_BACKEND level though I'm not sure if we want unprivileged users
 messing with it.

Letting unprivileged users raise the limit is somewhat like allowing them to
raise work_mem.  On the other hand, while one can craft queries to consume
arbitrary multiples of work_mem, the combination of RLIMIT_AS and CONNECTION
LIMIT should be effective to cap a user's memory consumption.  Overall,
PGC_SUSET sounds good to me, for the reasons Craig gave.

Thanks,
nm

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Audit of logout

2014-06-16 Thread Abhijit Menon-Sen
At 2014-06-13 10:29:24 -0400, t...@sss.pgh.pa.us wrote:

 I wonder whether we should just get rid of log_disconnections as a
 separate variable, instead logging disconnections when log_connections
 is set.

I like that idea.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-16 Thread Peter Geoghegan
On Mon, Jun 16, 2014 at 7:09 PM, Ian Barwick i...@2ndquadrant.com wrote:
 However in this particular use case, as long as it doesn't produce false
 positives (I haven't looked at the patch) I don't think it would cause
 any problems (of the kind which would require actively excluding certain
 languages/character sets), it just wouldn't be quite as useful.

I'm not sure what you mean by false positives. The patch just shows a
HINT, where before there was none. It's possible for any number of
reasons that it isn't the most useful possible suggestion, since
Levenshtein distance is used as opposed to any other scheme that might
be better sometimes. I think that the hint given is a generally useful
piece of information in the event of an ERRCODE_UNDEFINED_COLUMN
error. Obviously I think the patch is worthwhile, but fundamentally
the HINT given is just a guess, as with the existing HINTs.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-16 Thread Ian Barwick
On 14/06/17 11:57, Peter Geoghegan wrote:
 On Mon, Jun 16, 2014 at 7:09 PM, Ian Barwick i...@2ndquadrant.com wrote:
 However in this particular use case, as long as it doesn't produce false
 positives (I haven't looked at the patch) I don't think it would cause
 any problems (of the kind which would require actively excluding certain
 languages/character sets), it just wouldn't be quite as useful.
 
 I'm not sure what you mean by false positives. The patch just shows a
 HINT, where before there was none. It's possible for any number of
 reasons that it isn't the most useful possible suggestion, since
 Levenshtein distance is used as opposed to any other scheme that might
 be better sometimes. I think that the hint given is a generally useful
 piece of information in the event of an ERRCODE_UNDEFINED_COLUMN
 error. Obviously I think the patch is worthwhile, but fundamentally
 the HINT given is just a guess, as with the existing HINTs.

I mean, does it come up with a suggestion in every case, even if there is
no remotely similar column? E.g. would

  SELECT foo FROM some_table

bring up column bar as a suggestion if bar is the only column in
the table?

Anyway, is there an up-to-date version of the patch available? The one from
March doesn't seem to apply cleanly to HEAD.


Thanks

Ian Barwick


-- 
 Ian Barwick   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-16 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Mon, Jun 16, 2014 at 7:09 PM, Ian Barwick i...@2ndquadrant.com wrote:
 However in this particular use case, as long as it doesn't produce false
 positives (I haven't looked at the patch) I don't think it would cause
 any problems (of the kind which would require actively excluding certain
 languages/character sets), it just wouldn't be quite as useful.

 I'm not sure what you mean by false positives. The patch just shows a
 HINT, where before there was none. It's possible for any number of
 reasons that it isn't the most useful possible suggestion, since
 Levenshtein distance is used as opposed to any other scheme that might
 be better sometimes. I think that the hint given is a generally useful
 piece of information in the event of an ERRCODE_UNDEFINED_COLUMN
 error. Obviously I think the patch is worthwhile, but fundamentally
 the HINT given is just a guess, as with the existing HINTs.

Not having looked at the patch, but: I think the probability of
useless-noise HINTs could be substantially reduced if the code prints a
HINT only when there is a single available alternative that is clearly
better than the others in Levenshtein distance.  I'm not sure how much
better is clearly better, but I exclude zero from that.  I see that
the original description of the patch says that it will arbitrarily
choose one alternative when there are several with equal Levenshtein
distance, and I'd say that's a bad idea.

You could possibly answer this objection by making the HINT list *all*
the alternatives meeting the minimum Levenshtein distance.  But I think
that's probably overcomplicated and of uncertain value anyhow.  I'd rather
have a rule that we print only the choice that is at least K units better
than any other choice, where K remains to be determined exactly.
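
A sketch of the selection rule being proposed (the names, and the value
of K, are placeholders rather than anything from the patch; it assumes
a levenshtein() like the one sketched earlier in this digest):

#include <limits.h>

#define HINT_MARGIN_K 2			/* "clearly better" threshold; TBD */

extern int	levenshtein(const char *a, const char *b);

/* Return the sole candidate whose distance to the typed name beats
 * every other candidate by at least HINT_MARGIN_K, or NULL if no
 * candidate is clearly better (so no HINT is printed). */
static const char *
choose_column_hint(const char *typed, const char **cands, int ncands)
{
	int			best = INT_MAX;
	int			runnerup = INT_MAX;
	const char *bestname = NULL;

	for (int i = 0; i < ncands; i++)
	{
		int			dist = levenshtein(typed, cands[i]);

		if (dist < best)
		{
			runnerup = best;
			best = dist;
			bestname = cands[i];
		}
		else if (dist < runnerup)
			runnerup = dist;
	}

	if (bestname != NULL && runnerup - best >= HINT_MARGIN_K)
		return bestname;
	return NULL;
}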

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.auto.conf read from wrong directory

2014-06-16 Thread Amit Kapila
On Mon, Jun 16, 2014 at 7:06 PM, Fujii Masao masao.fu...@gmail.com wrote:
  Developer options are mainly for debugging information or might help in
  one of the situations, so I thought somebody might not want them to be
  part of server configuration once they are set.  We already disallow
  parameters like config_file, transaction_isolation, etc. which are
  disallowed to be set in postgresql.conf.  Could you please explain to me
  a bit in which situations/scenarios you think that allowing developer
  options via ALTER SYSTEM can be helpful?

 I think that's helpful. I've heard that some users enable the developer
 option trace_sort in postgresql.conf in order to collect the information
 about sorting.
 They might want to set trace_sort via ALTER SYSTEM, for example.

Okay, if you have a use case where people use such parameters in
postgresql.conf, then it makes sense.  I have removed that restriction
from the patch.

 I suggest the following wording:

 This command may not be used to set
 <xref linkend="guc-data-directory">
 or any parameters that are not allowed in postgresql.conf.

I think we should not use *may* in this line, because in no
circumstance will we allow this command for the parameters
mentioned.  I have changed it as per your suggestion.

  + /*
  +  * Disallow parameter's that are excluded or disallowed in
  +  * postgresql.conf.
  +  */

 parameters, no apostrophe.
Okay.
I have changed the above comment, as the earlier comment had
information about developer options which we don't need now.

if ((record->context == PGC_INTERNAL) ||
-    (record->flags & GUC_DISALLOW_IN_FILE))
-    ereport(ERROR,
-            (errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
-             errmsg("parameter \"%s\" cannot be changed",
-                    name)));
+    (record->flags & GUC_DISALLOW_IN_FILE) ||
+    (record->flags & GUC_DISALLOW_IN_AUTO_FILE) ||
+    (record->flags & GUC_NOT_IN_SAMPLE))
+    ereport(ERROR,
+            (errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
+             errmsg("parameter \"%s\" cannot be changed",
+                    name)));

 I looked at the settings that are marked GUC_NOT_IN_SAMPLE but neither
 PGC_INTERNAL nor GUC_DISALLOW_IN_*FILE. I don't feel strongly about it,
 but I don't see any particularly good reason to exclude them here.

By the above, do you mean that the patch should allow GUC_NOT_IN_SAMPLE
or something else? If the former, then I have removed it as per the
comment from Fujii-san.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


prohibit_data_dir_by_alter_system-v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-16 Thread Peter Geoghegan
On Mon, Jun 16, 2014 at 8:38 PM, Ian Barwick i...@2ndquadrant.com wrote:
 I mean, does it come up with a suggestion in every case, even if there is
 no remotely similar column? E.g. would

   SELECT foo FROM some_table

 bring up column bar as a suggestion if bar is the only column in
 the table?

Yes, it would, but I think that's the correct behavior.

 Anyway, is there an up-to-date version of the patch available? The one from
 March doesn't seem to apply cleanly to HEAD.

Are you sure? I think it might just be that patch is confused about
the deleted file contrib/fuzzystrmatch/levenshtein.c.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doing better at HINTing an appropriate column within errorMissingColumn()

2014-06-16 Thread Peter Geoghegan
On Mon, Jun 16, 2014 at 8:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Not having looked at the patch, but: I think the probability of
 useless-noise HINTs could be substantially reduced if the code prints a
 HINT only when there is a single available alternative that is clearly
 better than the others in Levenshtein distance.  I'm not sure how much
 better is clearly better, but I exclude zero from that.  I see that
 the original description of the patch says that it will arbitrarily
 choose one alternative when there are several with equal Levenshtein
 distance, and I'd say that's a bad idea.

I disagree. I happen to think that making some guess is better than no
guess at all here, given the fact that there aren't too many
possibilities to choose from. I think that it might be particularly
annoying to not show some suggestion in the event of a would-be
ambiguous column reference where the column name is itself wrong,
since both mistakes are common. For example, order_id was specified
instead of one of either o.orderid or ol.orderid, as in my
original examples. If some correct alias was specified, that would
make the new code prefer the appropriate Var, but it might not be, and
that should be okay in my view.

I'm not trying to remove the need for human judgement here. We've all
heard stories about people who did things like input Portland into
their GPS only to end up in Maine rather than Oregon, but I think in
general you can only go so far in worrying about those cases.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.auto.conf read from wrong directory

2014-06-16 Thread Abhijit Menon-Sen
At 2014-06-17 09:49:35 +0530, amit.kapil...@gmail.com wrote:

 By the above, do you mean that the patch should allow GUC_NOT_IN_SAMPLE
 or something else? If the former, then I have removed it as per the
 comment from Fujii-san.

Yes, what you've done in v3 of the patch is what I meant. I've marked
this as ready for committer now.

-- Abhijit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat directory and pg_stat_statements

2014-06-16 Thread Shigeru Hanada
Fujii-san,

I agree not to backpatch, but I noticed that the 9.3 document about
the stats collector doesn't mention $PGDATA/pg_stat.

http://www.postgresql.org/docs/current/static/monitoring-stats.html

It just says:
 When the server shuts down, a permanent copy of the statistics data is stored 
 in the global subdirectory, so that statistics can be retained across server 
 restarts.

I'm not sure whether we should modify the 9.3 document at the moment,
but the description actually confused me a bit.


2014-05-29 12:22 GMT+09:00 Fujii Masao masao.fu...@gmail.com:
 On Thu, May 29, 2014 at 4:55 AM, Tomas Vondra t...@fuzzy.cz wrote:
 On 28.5.2014 19:52, Fujii Masao wrote:
 On Thu, May 29, 2014 at 12:37 AM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, May 28, 2014 at 7:01 AM, Fujii Masao masao.fu...@gmail.com wrote:
 But pg_stat_statements file is saved under $PGDATA/global yet.
 Is this intentional or just oversight?


 I think it's an oversight.

 OK, patch attached.

 I'm afraid that it's not okay to change the file layout in $PGDATA at
 this beta1 stage because that change basically seems to need initdb.
 Otherwise something like a "no such file or directory" error can happen.
 But in this case what we need to change is only the location of the
 pg_stat_statements permanent stats file. So, without initdb, the server
 will not be able to find the pg_stat_statements stats file, but this is
 not so harmful. The only problem is that the pg_stat_statements stats
 which were collected in the past would disappear. OTOH, the server can
 keep running successfully from then on, and no critical data will
 disappear. Therefore I think we can commit this patch even at beta1.
 Thoughts?

 For HEAD, probably yes. But what about backpatching 9.3?

 I think No. So we should not backpatch this to 9.3.

 Regards,

 --
 Fujii Masao


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Shigeru HANADA


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to implement the skip errors for copy from ?

2014-06-16 Thread xbzhang

Using a subtransaction, the tuples that had been inserted into the heap
must be inserted again when some exception is raised; it is too expensive.
My solution is:
1. delete the tuple that caused the error;
2. release all the resources held while inserting the tuple;
3. continue inserting the next tuple.
Is it feasible? Can anybody give me some suggestions?




张晓博   研发二部
北京人大金仓信息技术股份有限公司
地址:北京市海淀区上地西路八号院上地科技大厦4号楼501
邮编:100085
电话:(010) 5885 1118 - 8450
手机:15311394463
邮箱:xbzh...@kingbase.com.cn
From: Alvaro Herrera
Date: 2014-06-17 02:37
To: Pavel Stehule
CC: xbzhang; pgsql-hackers
Subject: Re: [HACKERS] How to implement the skip errors for copy from ?

Pavel Stehule wrote:
 2014-06-16 11:46 GMT+02:00 xbzhang xbzh...@kingbase.com.cn:
 
 
  I want to implement the skip errors for copy from, like:
  create table A (c int primary key);
  copy A from stdin;
  1
  1
  2
  \.
 
  copy will fail:
  ERROR: duplicate key violates primary key constraint CC_PKEY
  CONTEXT: COPY CC, line 2: 1
 
  I want to skip the error, and continue to copy the rest of the tuples. The result
  will be that there are two rows in table A: 1 and 2.
 
  How to implement that? Can anybody give me some suggestions?
 
 you should reimplement the copy procedure to use subtransactions. Using a
 subtransaction for every row is too expensive, but you can do a
 subtransaction per 1000 rows, and when some exception is raised, then
 store the data one row per subtransaction.
 
See http://pgloader.io/ for a ready-made solution.
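
To make the batching suggestion concrete, a rough sketch using
PostgreSQL's subtransaction primitives (insert_row() is a hypothetical
placeholder for the real COPY row machinery, and details such as memory
contexts for row data and resource-owner bookkeeping are elided):

#include "postgres.h"
#include "access/xact.h"
#include "utils/memutils.h"

extern void insert_row(void *row);	/* hypothetical */

/* Try a batch of rows in one subtransaction; on failure, roll the
 * batch back and replay it one subtransaction per row, skipping any
 * row that errors out. */
static void
copy_batch_with_retry(void **rows, int nrows)
{
	MemoryContext oldcxt = CurrentMemoryContext;

	BeginInternalSubTransaction(NULL);
	PG_TRY();
	{
		for (int i = 0; i < nrows; i++)
			insert_row(rows[i]);
		ReleaseCurrentSubTransaction();	/* whole batch succeeded */
	}
	PG_CATCH();
	{
		RollbackAndReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcxt);
		FlushErrorState();

		for (int i = 0; i < nrows; i++)
		{
			BeginInternalSubTransaction(NULL);
			PG_TRY();
			{
				insert_row(rows[i]);
				ReleaseCurrentSubTransaction();
			}
			PG_CATCH();
			{
				RollbackAndReleaseCurrentSubTransaction();
				MemoryContextSwitchTo(oldcxt);
				FlushErrorState();	/* skip the bad row */
			}
			PG_END_TRY();
		}
	}
	PG_END_TRY();
}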
 
-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
 
 
 
 
 



Re: [HACKERS] rm_desc signature

2014-06-16 Thread Heikki Linnakangas

On 06/17/2014 04:19 AM, Jeff Janes wrote:

This commit, or a related one, changed the default (i.e. commented out)
nature of:

#define WAL_DEBUG


Oops. Fixed, thanks!

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers