Re: [HACKERS] WIP patch: add (PRE|POST)PROCESSOR options to COPY

2012-09-18 Thread Craig Ringer

On 09/13/2012 10:25 PM, Tom Lane wrote:

I think it would be a lot better if this were designed so that the
processor programs executed on client side.  Which would probably make
it not a COPY patch at all, but something in psql.


Either that, or allow the pre- and post- processors to be programs 
written in a (possibly trusted) PL.


I can't say I really see the point, though, when it's easy to just filter 
the CSV through a pipeline.
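
For instance, something along these lines (an illustrative pipeline; the 
path, database and table names are made up):

$ gunzip -c /home/pgsql/foo.csv.gz | psql -d mydb -c "COPY foo FROM STDIN WITH (format 'csv')"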


--
Craig Ringer





[HACKERS] Invalid optimization of VOLATILE function in WHERE clause?

2012-09-18 Thread Florian Schoppmann
Hi all,

In PostgreSQL 9.1 and 9.2 (possibly also in earlier versions), the query

--8<--
WITH source AS (
    SELECT i FROM generate_series(1,10) AS i
)
SELECT
    i
FROM
    source, (
        SELECT
            count(*) AS _n
        FROM source
    ) AS _stats
WHERE
    random() < 5::DOUBLE PRECISION/_n;
--8<--

translates into the following query plan:

--8<--
 Nested Loop  (cost=35.00..65.03 rows=1000 width=4)
   CTE source
     ->  Function Scan on generate_series i  (cost=0.00..10.00 rows=1000 width=4)
   ->  Aggregate  (cost=25.00..25.02 rows=1 width=0)
         Filter: (random() < (5::double precision / (count(*))::double precision))
         ->  CTE Scan on source  (cost=0.00..20.00 rows=1000 width=0)
   ->  CTE Scan on source  (cost=0.00..20.00 rows=1000 width=4)
--8<--

In other words, the query either gives exactly 0 or 10 rows, and both
cases happen with probability 0.5. Naturally, I would have expected
instead that each row is sampled independently with probability 0.5.

Since random() is volatile, so is the whole WHERE expression. So I
wonder why the condition is pushed down to the lowest level, given that
this changes the results. Is this behavior correct, i.e., specified
somewhere? Or is this a bug?
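
For reference, a rewrite along the following lines seems to sample each row
independently (a sketch only; the row count is computed as an uncorrelated
scalar subquery rather than joined in as a derived table):

--8<--
WITH source AS (
    SELECT i FROM generate_series(1,10) AS i
)
SELECT
    i
FROM
    source
WHERE
    random() < 5::DOUBLE PRECISION / (SELECT count(*) FROM source);
--8<--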

Florian





Re: [HACKERS] WIP patch: add (PRE|POST)PROCESSOR options to COPY

2012-09-18 Thread Etsuro Fujita
 From: Craig Ringer [mailto:ring...@ringerc.id.au]

 On 09/13/2012 10:25 PM, Tom Lane wrote:
  I think it would be a lot better if this were designed so that the
  processor programs executed on client side.  Which would probably make
  it not a COPY patch at all, but something in psql.

Maybe my explanation was insufficient.  Let me add one thing to my earlier
explanation.  The submitted patch allows the psql \copy instruction to be
executed like:

$ echo '/bin/gunzip -c $1' > decompress.sh
$ chmod +x decompress.sh

postgres=# \copy foo FROM '/home/pgsql/foo.csv.gz' WITH (format 'csv',
preprocessor '/home/pgsql/decompress.sh')

In this example, the command /home/pgsql/decompress.sh /home/pgsql/foo.csv.gz is
executed on the client side using popen(), and the command COPY foo FROM STDIN WITH
(format 'csv') is sent to the backend.  I apologize for not providing enough
explanation earlier.

 Either that, or allow the pre- and post- processors to be programs
 written in a (possibly trusted) PL.

I would like to add the hooks not only for the psql \copy instruction but also
for the SQL COPY command, because I think hooks for the SQL COPY command
would make it possible to implement variants of contrib/file_fdw, such as a
compressed-file FDW or a Hadoop HDFS FDW, which I think would be especially
useful in DWH environments.

Thanks,

Best regards,
Etsuro Fujita





Re: [HACKERS] [PATCH]Tablesample Submission

2012-09-18 Thread Hitoshi Harada
On Tue, Aug 21, 2012 at 8:08 AM, Qi Huang huangq...@outlook.com wrote:
 Please add your patch here:

 https://commitfest.postgresql.org/action/commitfest_view/open

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

 Hi, Robert
 I added it under Miscellaneous.
 https://commitfest.postgresql.org/action/patch_view?id=918



The patch does not apply cleanly against the latest master.  outfuncs.c,
allpaths.c and cost.h have rejected hunks.  make check fails in a lot of
cases, up to 26 out of 133.  I didn't look into each issue, but I suggest
rebasing on the latest master and making sure the regression tests pass.

Parts of the patch don't follow our coding standards.  Please adjust
brace positions, for example.  In the header include lists and the
Makefile, place new files in alphabetical order.

The patch doesn't include any documentation.  Consider adding a
documentation patch for a feature as big as this.

You should update kwlist.h for REPEATABLE, but I'm not sure whether
REPEATABLE should become a reserved keyword yet.

I don't see why you created T_TableSampleInfo.  TableSampleInfo looks
fine as a plain struct rather than a Node.

If we want to disallow a cursor over a sampled table, we should check
that in the parser, not the planner.  One of the TODOs on the wiki page
mentions cursor support, but how difficult is it?  What does the
standard say?

s/skiped/skipped/

Don't we need to reset the seed in ExecReScanSampleScan?  Should we add a
new executor node, SampleScan?  It seems everything about random
sampling is happening down in heapam.

It looks like BERNOULLI allocates a heap tuple array beforehand and
copies all the tuples into it.  That doesn't look acceptable when you
are dealing with a large number of rows in the table.

As the wiki says, BERNOULLI relies on the table's statistics, which
doesn't sound good to me.  Of course we could call this a restriction
and say goodbye to users who haven't run ANALYZE first, but that makes
it too hard for normal users.  We may need a quick-and-rough count(*)
for this.

That is pretty much all I have so far.  I haven't read all the code or
the standard, so I might be wrong somewhere.

Thanks,
-- 
Hitoshi Harada




Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-09-18 Thread Fujii Masao
On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila amit.kap...@huawei.com wrote:
 To define the behavior correctly, in my opinion there are two options now:

 Approach-1:
 Document that both timeout parameters (sender's and receiver's) should be
 greater than wal_receiver_status_interval.
 If both are greater, then I think it should never time out while idle.

In this approach, keepalive messages are sent every wal_receiver_status_interval?

 Approach-2:
 Provide a variable wal_send_status_interval, such that if it is 0, the
 current behavior prevails, and if it is non-zero, a keepalive message is
 sent at most that long after the previous one.
 The modified code of WalSndLoop will be as follows:
snip
 Which way do you think is better, or do you have any other idea for handling this?

I think #2 is better because it's more intuitive to a user.

Regards,

-- 
Fujii Masao




Re: [HACKERS] [BUGS] BUG #7534: walreceiver takes long time to detect n/w breakdown

2012-09-18 Thread Amit Kapila
On Tuesday, September 18, 2012 6:03 PM Fujii Masao wrote:
On Mon, Sep 17, 2012 at 4:03 PM, Amit Kapila amit.kap...@huawei.com wrote:
 To define the behavior correctly, in my opinion there are two options now:

 Approach-1:
 Document that both timeout parameters (sender's and receiver's) should be
 greater than wal_receiver_status_interval.
 If both are greater, then I think it should never time out while idle.

 In this approach, keepalive messages are sent every wal_receiver_status_interval?

  wal_receiver_status_interval or the sleep time, whichever is smaller.

 Approach-2:
 Provide a variable wal_send_status_interval, such that if it is 0, the
 current behavior prevails, and if it is non-zero, a keepalive message is
 sent at most that long after the previous one.
 The modified code of WalSndLoop will be as follows:
snip
 Which way do you think is better, or do you have any other idea for handling this?

 I think #2 is better because it's more intuitive to a user.

I shall update the patch as per Approach-2 and upload it.

With Regards,
Amit Kapila.






Re: [HACKERS] Proof of concept: auto updatable views [Review of Patch]

2012-09-18 Thread Amit kapila
On 31 August 2012 07:59, Dean Rasheed dean(dot)a(dot)rasheed(at)gmail(dot)com 
wrote:
 On 30 August 2012 20:05, Robert Haas robertmhaas(at)gmail(dot)com wrote:
 On Sun, Aug 12, 2012 at 5:14 PM, Dean Rasheed 
 dean(dot)a(dot)rasheed(at)gmail(dot)com wrote:

 Here's an updated patch for the base feature (without support for
 security barrier views) with updated docs and regression tests.



Please find the review of the patch.

Basic stuff:
--
- Patch applies OK
- Compiles cleanly with no warnings
- Regression tests pass.
- Documentation changes are mostly fine.

What it does:


Non-SELECT DML operations on views that are simply updatable (the conditions 
checked can be found in the test section below) are rewritten and executed 
even if the view has no rules or INSTEAD OF triggers.
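
For illustration, a minimal sketch of the behavior (table and view names are 
made up; the regression tests use similar cases):

CREATE TABLE base_tbl (a int PRIMARY KEY, b text);
CREATE VIEW rw_view AS SELECT a, b FROM base_tbl WHERE a > 0;

-- With the patch this is rewritten into an UPDATE on base_tbl, even though
-- rw_view has no rules and no INSTEAD OF triggers:
UPDATE rw_view SET b = 'updated' WHERE a = 1;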


Testing:

The following tests were carried out on the patch.

1. Create a view that can be automatically updated:
- number of view columns not matching the base relation
- column order different from the base relation's order
- a view column aliased to another column's name
- view with a WHERE condition
- columns with constraints
- ORDER BY
- FOR

2. Create views with each of the features that make automatic update 
impossible:
- DISTINCT clauses
- a TLE that is not a simple column reference
- the same column referenced by more than one TLE
- references to more than one base relation
- something other than a relation in the FROM clause
- GROUP BY or HAVING clauses
- set operations (UNION, INTERSECT or EXCEPT)
- sub-queries in the WHERE clause
- DISTINCT ON clauses
- window functions
- CTEs (WITH or WITH RECURSIVE)
- OFFSET or LIMIT clauses
- system columns
- security_barrier views
- references to whole rows of a table
- missing column permissions for the user
- a sequence instead of a relation as the base

3. Create a view that has INSTEAD OF triggers.
4. Create a view that has rules.
5. Create a view on a base relation that is itself an automatically updatable 
view.
6. Create a view on a base relation that is itself a view with INSTEAD OF 
triggers.
7. Create a view on a base relation that is itself a view with rules.

Extra test cases that can be added to regression suite are as below:

1. WHERE clause in the view's SELECT statement
2. ORDER BY, FOR, FETCH
3. Temp views, views on temp tables
4. Target entries with JOIN, VALUES, FUNCTION
5. TOAST columns
6. System views
7. Lateral and outer joins
8. Auto-increment columns
9. Triggers on tables
10. Views with default values
11. Choosing the base relation based on schema
12. SECURITY DEFINER function execution


 The updated regression test SQL file with the extra tests is attached to this 
mail; please check it and add it to the patch.

Code Review:


1. In the test_auto_update_view function:
        if (var->varattno == 0)
            return "Views that refer to whole rows from the base relation are not updatable";
   I have a doubt: when is the above scenario covered?  The examples provided 
for whole rows are working.

2. In the test_auto_update_view function:
        if (base_rte->rtekind != RTE_RELATION)
            return "Views that are not based on tables or views are not updatable";
   For a view on a sequence, the query is also rewritten and then fails at 
execution time.
   Is it possible to check for a particular relkind before rewriting the query?


3. In the function rewriteTargetView:
        if (tle->resjunk || tle->resno <= 0)
            continue;
   This scenario is not possible, as junk entries are already removed by the 
preceding condition, and a view that refers to system columns is not an 
auto-updatable view anyway.

4. In the function rewriteTargetView:
        if (view_tle == NULL)
            elog(ERROR, "View column %d not found", tle->resno);
   The parse-tree targetlist is already validated against the view targetlist 
during transformStmt.
   Is giving an ERROR fine here, or shouldn't it be an Assert?

5. If any derived columns are present in the view, the UPDATE operation could 
at least be allowed on the columns other than the derived ones.

6. The name test_auto_update_view could be changed; in particular, the word "test" should be changed.

7. The error message from the get_view_query() function, "invalid _RETURN rule 
action specification", might not make much sense to a user who is inserting 
into a view.


Defects from test
---

1. With an old database and new binaries, the following test behaves 
incorrectly.

CREATE TABLE base_tbl (a int PRIMARY KEY, b text DEFAULT 'Unspecified');
INSERT INTO base_tbl VALUES (1, 'Row 1');
INSERT INTO base_tbl VALUES (2, 'Row 2');

CREATE VIEW rw_view1 AS SELECT * FROM base_tbl;

SELECT table_name, is_updatable, is_insertable_into
FROM information_schema.views where table_name = 'rw_view1';

This shows is_insertable_into as 'no'. However, the following INSERT statement 
succeeds:

INSERT 

Re: [HACKERS] WIP patch: add (PRE|POST)PROCESSOR options to COPY

2012-09-18 Thread Tom Lane
Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 Maybe my explanation was insufficient.  Let me add one thing to my earlier
 explanation.  The submitted patch allows the psql \copy instruction to be
 executed like:

 $ echo '/bin/gunzip -c $1' > decompress.sh
 $ chmod +x decompress.sh

 postgres=# \copy foo FROM '/home/pgsql/foo.csv.gz' WITH (format 'csv',
 preprocessor '/home/pgsql/decompress.sh')

 In this example, command /home/pgsql/decompress.sh /home/pgsql/foo.csv.gz is
 executed on client side, by using popen(), and command COPY foo FROM STDIN 
 WITH
 (format 'csv') is sent to backend.  I apologize for not providing you with
 enough explanation.

Well, in that case, you've got not only an explanation problem but a
syntax problem, because that syntax is utterly misleading.  Anybody
looking at it would think that the preprocessor option is one of the options
being sent to the backend.  The code required to pull it out of there
has got to be grossly overcomplicated (and likely bug-prone), too.

I think it would be better to present this as something like

\copy foo from '/home/pgsql/decompress.sh /home/pgsql/foo.csv.gz |' with format 'csv'

which would cue any reasonably Unix-savvy person that what's happening
is a popen on the client side.  It'd probably be a whole lot less
complicated to implement, too.

regards, tom lane




Re: [HACKERS] [PATCH] Support for Array ELEMENT Foreign Keys

2012-09-18 Thread Marco Nenciarini
Hi,
please find attached the refreshed v1 patch.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it 



Array-ELEMENT-foreign-key-v1-refreshed.patch.bz2
Description: application/bzip



Re: [HACKERS] Confusing EXPLAIN output in case of inherited tables

2012-09-18 Thread Tom Lane
I got interested in this problem again now that we have a user complaint
about it (bug #7553).

Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes:
 On Wed, Feb 1, 2012 at 11:01 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Feb 1, 2012 at 12:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 ... It seems
 reasonable to me to try to handle relation renames but draw the line
 at disambiguating column names.  But others might find that distinction
 artificial.

 I sure do.

 For me the relation names problem and column aliases problems are two
 independent problems.

I think they are independent problems, and I also think that people are
far less likely to trip over column-name problems in practice.  Columns
of a table are not independent objects and so people aren't so likely
to think they can just rename them freely.  Moreover, if you rename
columns that are used in views, you can get breakage of things like
USING or NATURAL joins, and that is something we *cannot* provide a
workaround for --- it's a failure inherent in the language definition.

As far as the relation-rename problem goes, I propose that what we
should do is have ruleutils.c invent nonconflicting fake aliases for
each RTE in the query tree.  This would allow getting rid of some of the
dubious heuristics in get_variable: it should just print the chosen
alias and be done.  (It still has to do something different for unnamed
joins, but we can leave that part alone I think.)

We can do this as follows:

1. If there's a user-assigned alias, use that.  (It's possible this is
not unique within the query, but that's okay because any actual
variable reference must be to the most closely nested such RTE.)

2. Otherwise, if the relation's current name doesn't conflict with
any previously-assigned alias, use that.

3. Otherwise, append something (underscore and some digits probably)
to the relation's current name to construct a string not matching any
previously-assigned alias.

This might result in printouts that are a bit uglier than the old way
in such cases, but anybody who's offended can select their own aliases.
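
For illustration, a sketch of the rename case at issue (object names are made
up):

    CREATE TABLE foo (x int);
    CREATE VIEW v AS SELECT foo.x FROM foo;
    ALTER TABLE foo RENAME TO bar;

ruleutils.c then has to pick a name for the RTE that was created as "foo".
Under the rules above, no user alias was given, so rule 2 applies and the
relation's current name "bar" is printed; rule 3 would append a suffix only
if "bar" collided with an alias already assigned elsewhere in the query.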

regards, tom lane




Re: [HACKERS] [ADMIN] pg_upgrade from 9.1.3 to 9.2 failed

2012-09-18 Thread Bruce Momjian
On Mon, Sep 17, 2012 at 05:07:23PM -0400, Bruce Momjian wrote:
  # select * from pg_tables where tablename='sql_features';
       schemaname     |  tablename   | tableowner | tablespace | hasindexes | hasrules | hastriggers
  --------------------+--------------+------------+------------+------------+----------+-------------
   information_schema | sql_features | postgres   |            | f          | f        | f
  (1 row)
 
 OK, good to know.  This is the query pg_upgrade 9.2 uses to pull
 information from 9.1 and 9.2:
 
   SELECT c.oid, n.nspname, c.relname, c.relfilenode, c.reltablespace, t.spclocation
   FROM pg_catalog.pg_class c
        JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid
        LEFT OUTER JOIN pg_catalog.pg_tablespace t ON c.reltablespace = t.oid
   WHERE relkind IN ('r', 't', 'i', 'S') AND
         ((n.nspname !~ '^pg_temp_' AND
           n.nspname !~ '^pg_toast_temp_' AND
           n.nspname NOT IN ('pg_catalog', 'information_schema', 'binary_upgrade') AND
           c.oid >= 16384)
          OR
          (n.nspname = 'pg_catalog' AND
           relname IN ('pg_largeobject', 'pg_largeobject_loid_pn_index',
                       'pg_largeobject_metadata', 'pg_largeobject_metadata_oid_index')))
   ORDER BY 1;
 
 Based on the fact that sql_features exists in the information_schema
 schema, I don't think the 'sql_features' table itself is being processed
 by pg_upgrade; rather, its TOAST table, because it has a high OID, is
 being processed because it lives in the pg_toast schema.  This is causing
 the mismatch between the old and new clusters.

 I am thinking this query needs to be split apart into a UNION whose second
 part handles TOAST tables and looks at the schema of the _owner_ of the
 TOAST table.  This needs to be backpatched too.

OK, I am at a conference now, so I will not be able to write up a patch
until perhaps next week.  You can drop the information_schema in the old
database and pg_upgrade should run fine.  I will test your failure once
I create a patch.
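
For reference, here is a sketch of the kind of TOAST-aware clause described
above (not the actual patch; the OID cutoff and schema list simply mirror the
existing query):

    SELECT c.oid, n.nspname, c.relname
    FROM pg_catalog.pg_class c
         JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid
    WHERE c.relkind = 't' AND
          c.oid >= 16384 AND
          EXISTS (SELECT 1
                  FROM pg_catalog.pg_class o
                       JOIN pg_catalog.pg_namespace onsp ON o.relnamespace = onsp.oid
                  WHERE o.reltoastrelid = c.oid AND
                        onsp.nspname NOT IN ('pg_catalog', 'information_schema',
                                             'binary_upgrade'));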

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] WIP patch: add (PRE|POST)PROCESSOR options to COPY

2012-09-18 Thread Etsuro Fujita
 From: Tom Lane [mailto:t...@sss.pgh.pa.us]

 Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
  Maybe my explanation was insufficient.  Let me add one thing to my earlier
  explanation.  The submitted patch allows the psql \copy instruction to be
  executed like:
 
  $ echo '/bin/gunzip -c $1' > decompress.sh
  $ chmod +x decompress.sh
 
  postgres=# \copy foo FROM '/home/pgsql/foo.csv.gz' WITH (format 'csv',
  preprocessor '/home/pgsql/decompress.sh')

 Well, in that case, you've got not only an explanation problem but a
 syntax problem, because that syntax is utterly misleading.  Anybody
 looking at it would think that the format option is one of the options
 being sent to the backend.  The code required to pull it out of there
 has got to be grossly overcomplicated (and likely bugprone), too.
 
 I think it would be better to present this as something like
 
 \copy foo from '/home/pgsql/decompress.sh /home/pgsql/foo.csv.gz |' with
 format 'csv'
 
 which would cue any reasonably Unix-savvy person that what's happening
 is a popen on the client side.  It'd probably be a whole lot less
 complicated to implement, too.

Great!

I have a question.  I think it would also be better to extend the syntax for the
SQL COPY command in the same way, i.e.,

COPY foo from '/home/pgsql/decompress.sh /home/pgsql/foo.csv.gz |' with format 'csv'

Is this okay?

Thanks,

Best regards,
Etsuro Fujita





Re: [HACKERS] WIP patch: add (PRE|POST)PROCESSOR options to COPY

2012-09-18 Thread Tom Lane
Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 From: Tom Lane [mailto:t...@sss.pgh.pa.us]
 I think it would be better to present this as something like
 \copy foo from '/home/pgsql/decompress.sh /home/pgsql/foo.csv.gz |' with
 format 'csv'

 I have a question.  I think it would be also better to extend the syntax for 
 the
 SQL COPY command in the same way, ie,
 COPY foo from '/home/pgsql/decompress.sh /home/pgsql/foo.csv.gz |' with format
 'csv'

Yeah, sure --- that case is already superuser-only, so why not give it
the option of being a popen instead of just fopen.  My objection was
only that it sounded like you were providing *only* the ability to run
the external processors on the server side.  Providing popen on both
sides seems completely sensible.

regards, tom lane




Re: [HACKERS] WIP patch: add (PRE|POST)PROCESSOR options to COPY

2012-09-18 Thread Tom Lane
I wrote:
 Etsuro Fujita fujita.ets...@lab.ntt.co.jp writes:
 I have a question.  I think it would be also better to extend the syntax for 
 the
 SQL COPY command in the same way, ie,
 COPY foo from '/home/pgsql/decompress.sh /home/pgsql/foo.csv.gz |' with 
 format
 'csv'

 Yeah, sure --- that case is already superuser-only, so why not give it
 the option of being a popen instead of just fopen.

BTW, one thought that comes to mind is that such an operation is
extremely likely to fail under environments such as SELinux.  That's
not necessarily a reason not to do it, but we should be wary of
promising that it will work everywhere.  Probably a documentation note
about this would be enough.

regards, tom lane

