Re: [HACKERS] Patch to add regression tests for SCHEMA

2013-05-08 Thread Fabien COELHO


Dear Robins,


Here is an updated patch that uses different schema / role names for
different tests (as per commitfest site feedback).


Short review about this version of the patch:

This patch works for me.

This test is a good thing and allows schemas to be thoroughly tested,
including corner cases which must fail because of errors or permissions.


Two remarks:

 - test 2 bis: why the name 'pg_asdf'? Why not 'pg_schema_sch<some number>',
   to be homogeneous with the other tests?

 - test 3: why not WHERE schema_name='schema_sch3' instead of two
   negative comparisons? ISTM that if for some reason in the future a new
   schema name is added, the test will fail.
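
   For illustration, a minimal sketch of the positive filter I have in
   mind, assuming the schema under test is named schema_sch3 as above:

       SELECT schema_name
         FROM information_schema.schemata
        WHERE schema_name = 'schema_sch3';

   This keeps the expected output stable even if future patches create
   additional schemas in the regression database.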

--
Fabien.




Re: [HACKERS] Add some regression tests for SEQUENCE

2013-05-08 Thread Fabien COELHO


I have provided an updated patch as per Fabien's recent response on the
Commitfest site. Any and all feedback is appreciated.


Review:

This patch works for me.

It adds valuable sequence test cases, especially trying corner cases with 
expected errors and permission denials.
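
For instance (an illustration of mine, not taken from the patch), a
sequence with impossible bounds must raise an error:

CREATE SEQUENCE seq_bad MINVALUE 5 MAXVALUE 4;
-- ERROR:  MINVALUE (5) must be less than MAXVALUE (4)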


I suggest accepting it.

--
Fabien.




Re: [HACKERS] about index inheritance

2013-05-08 Thread Vincenzo Melandri
On Tue, May 7, 2013 at 11:55 PM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, May 6, 2013 at 9:30 AM, Vincenzo Melandri vmelan...@imolinfo.it
 wrote:
  Hi guys,
 
  My first post here :)
  I stumbled into the same problem as this guy
  http://www.postgresql.org/message-id/4be2835a.5020...@cybertec.at
  , so since I have some spare time recently, I've set up the development
  environment for PostgreSQL and I think I may be able to contribute to the
  feature of index inheritance, which is currently unsupported but listed
  in the TODOs.
 
  I've spent some time reading the docs and I took a look at the code. Is
  anybody out there working on this already? I don't want to overlap with
  someone else's effort, plus I'll gladly take any advice or join the
  community efforts if any, 'cause this feature seems pretty huge to me at
  first glance..

 This is a really hard problem.  If you pick this as your first project
 hacking on PostgreSQL, you will almost certainly fail.


Thank you very much, I guessed that already -.-
Still, I needed that at my office for a long time, struggled with it many
times and had to come up with some exotic solutions...
Now I have spare time between projects, so I can work on it full-time. At
least it's worth a try, isn't it?

Anyway, I'm working to better understand the problem, trying to identify at
least the main points involved.
At the moment I'm figuring out how the inheritance mechanism works for
relations (in tablecmds.c).. Then I'll figure out how indexes work..

I guess you discussed this plenty of times already in the past, but I
didn't find it in the archives. Any hints for old discussions?
I'll try to come out with a list of potential things to do, for you guys to
validate and discuss.

PS: I wrote my last mail from an address with which I had not subscribed to
the list, and still the message got through.. Odd..
-- 
Vincenzo.
http://www.linkedin.com/pub/vincenzo-melandri/14/16/730


Re: [HACKERS] GSOC Student Project Idea

2013-05-08 Thread Heikki Linnakangas

On 24.04.2013 14:31, Florian Pflug wrote:

On Apr 23, 2013, at 23:25, Alexander Korotkov aekorot...@gmail.com
wrote:

I've taken a brief look at the paper and implementation. As I can
see, iDistance implements some global building strategy. I mean, for
example, it selects some point, calculates distances from the selected
point to all points in the dataset, etc. So, it uses the whole dataset
at the same time.

However, you can try to implement global index building in GiST or
SP-GiST. In this case I think you should carefully estimate your
capabilities during a single GSoC project. You would need to extend
the GiST or SP-GiST interface and write a completely new implementation
of the tree building algorithm. The question of how exactly to extend
the GiST or SP-GiST interface for this case could turn out to be very
hard, even theoretically.


+1. That seemed to be a major roadblock to me too when I read the
paper.

You could work around that by making partition identification a
separate step. You'd have a function

idist_analyze(cfg name, table name, field name)

which'd identify suitable partitions for the data distribution in
table.field and store them somewhere. Such a set of pre-identified
partitions would be akin to a tsearch configuration, i.e. all other
parts of the iDistance machinery would use it to map points to index
keys and queries to ranges of those keys. You'll want to look at how
tsearch handles that, and check if the method can indeed be applied
to iDistance.


You could perform that step as part of the index build. Before the index 
build starts to add tuples to the index, it could scan a random sample 
of the heap and identify the partitions based on that.


If you need to store the metadata, like a map of partitions, it becomes 
difficult to cajole this into a normal GiST or SP-GiST opclass. The API 
doesn't have any support for storing such metadata.



In a first cut, you'd probably only allow inserts into the index which
don't change the maximum distances from the partition centers that
idist_analyze() found.


That seems like a pretty serious restriction. I'd try to write it so 
that you can insert any value, but if the new values are very different 
from any existing values, it would be OK for the index quality to 
degrade. For example, you could simply add any out-of-bounds values to a 
separate branch in the index, which would have no particular structure 
and would just have to be scanned on every query. You can probably do 
better than that, but that would be a trivial way to deal with it.


- Heikki




Re: [HACKERS] [COMMITTERS] pgsql: Fix permission tests for views/tables proven empty by constraint

2013-05-08 Thread Andres Freund
On 2013-05-07 21:45:02 -0400, Tom Lane wrote:
 Greg Stark st...@mit.edu writes:
  If we just reverted your fix and didn't fix it in 9.2 that would also
  fix the crash right? The bug was only that it leaked the fact that the
  view was provably empty from the definition?
 
 Well, it might fail to report a permissions violation when the
 not-allowed-to-be-accessed relation could be proven to yield no rows.
 I agree that it's a bit hard to call that a security issue as long as
 you assume that the attacker has access to the system catalogs; and
 even if you don't assume that, being able to discern that there's a
 check constraint on some table doesn't seem like a big leakage.

Couldn't it also cause tables not to be locked that ought to be? That
seems to be the nastier part to me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GSOC Student Project Idea

2013-05-08 Thread Heikki Linnakangas

On 24.04.2013 22:10, Michael Schuh wrote:

Thank you both for the very helpful feedback. Perhaps the scope of this
project (the application's completeness criteria) is better framed as
feasibility prototyping of the global/distance-based index strategy, with
B+-tree and/or GiST extension possibilities.


For GSoC, we'd really like to see some code that can be committed as a 
result. Prototyping is important, but if that's all you do during the 
summer, the work is likely going to waste if no one works actively on 
the prototype afterwards.


At this point, I think we need a more concrete plan for how this would be 
implemented. The idea of using a regular B-tree for this, with some 
functions to do the partition mapping, might work. However, that would be 
a clunky interface - I don't think it would be accepted into 
PostgreSQL. So I don't think this makes a good GSoC project.
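
To make the clunkiness concrete, here is roughly what such a
function-based interface might look like (a sketch only: every idist_*
function is hypothetical, q stands for the query point and r for the
search radius):

CREATE INDEX ON points (idist_key(p));   -- map each point to a 1-D key

SELECT *
  FROM points
 WHERE idist_key(p) BETWEEN idist_lo(q, r) AND idist_hi(q, r)
   AND p <-> q < r;   -- recheck the real distance on the candidates

(In the real scheme there would be one such key range per partition,
not a single BETWEEN.)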


If you think this can be done with the existing GiST or SP-GiST APIs, 
I'd like to see a more concrete plan on how that would work. What 
datatype would this be for? How would the partitioning be done? If the 
APIs need to be extended, what would the extensions look like? The 
summer is short, so there isn't much time for exploration - we need to 
have a pretty good idea of what the result will look like, right now.


- Heikki




Re: [HACKERS] Cube extension improvement, GSoC

2013-05-08 Thread Alexander Korotkov
On Sat, May 4, 2013 at 11:19 PM, Stas Kelvich stanc...@gmail.com wrote:

  I think we have at least 3 data types more or less similar to cube.
  1) array of ranges
  2) range of arrays
  3) 2d arrays
  Semantically cube is closest to an array of ranges. However an array of
  ranges has huge storage overhead.
  Also we can declare cube as a domain on 2d arrays and declare the
  operations of that domain.

 But what should we do when arrays in different records have different
 numbers of elements?


We can face exactly the same situation with cube.

test=# create table cube_test (v cube);
CREATE TABLE

test=# insert into cube_test values (cube(array[1,2])),
(cube(array[1,2,3]));
INSERT 0 2

In order to force all cubes to have the same number of dimensions, an
explicit CHECK constraint on the table is required.
As I remember, cube treats absent dimensions as zeros.
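
For example, a sketch of such a constraint (table name invented), using
contrib/cube's cube_dim function:

test=# create table cube_test_chk (v cube check (cube_dim(v) = 3));
CREATE TABLE

test=# insert into cube_test_chk values (cube(array[1,2]));
ERROR:  new row for relation "cube_test_chk" violates check constraint "cube_test_chk_v_check"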

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Terminology issue: suffix tree

2013-05-08 Thread Heikki Linnakangas

On 06.05.2013 14:10, Alexander Korotkov wrote:

On Sat, May 4, 2013 at 10:27 PM, Alexander Korotkov aekorot...@gmail.com wrote:

In a suffix tree we insert every suffix of the source string into the tree.
http://en.wikipedia.org/wiki/Suffix_tree
Actually the opclass implements a radix tree, or patricia tree.
http://en.wikipedia.org/wiki/Radix_tree
Likely we need a patch to rename it in all the places it is mentioned.


Patch is attached.


Thanks, committed.


Apparently, we have the same issue in contrib/unaccent.


Yeah. The data structure in contrib/unaccent seems to be a plain old 
trie, rather than a radix trie, though. According to wikipedia at least, 
the difference is that in a radix tree, the edges are labeled with 
sequences of elements, rather than single elements. Want to patch that too?


- Heikki




Re: [HACKERS] Terminology issue: suffix tree

2013-05-08 Thread Alexander Korotkov
On Wed, May 8, 2013 at 3:50 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

 On 06.05.2013 14:10, Alexander Korotkov wrote:

 On Sat, May 4, 2013 at 10:27 PM, Alexander Korotkov aekorot...@gmail.com
 wrote:

 In a suffix tree we insert every suffix of the source string into the tree.
 http://en.wikipedia.org/wiki/Suffix_tree
 Actually the opclass implements a radix tree, or patricia tree.
 http://en.wikipedia.org/wiki/Radix_tree
 Likely we need a patch to rename it in all the places it is mentioned.


 Patch is attached.


 Thanks, committed.


Thanks!


  Apparently, we have same issue in contrib/unaccent.


 Yeah. The data structure in contrib/unaccent seems to be a plain old trie,
 rather than a radix trie, though. According to wikipedia at least, the
 difference is that in a radix tree, the edges are labeled with sequences of
 elements, rather than single elements. Want to patch that too?


Agreed, trie is the most conforming term here. Patch is attached.

--
With best regards,
Alexander Korotkov.


unaccent-suffix-fix.patch
Description: Binary data



Re: [HACKERS] [COMMITTERS] pgsql: Fix permission tests for views/tables proven empty by constraint

2013-05-08 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-05-07 21:45:02 -0400, Tom Lane wrote:
 Well, it might fail to report a permissions violation when the
 not-allowed-to-be-accessed relation could be proven to yield no rows.

 Couldn't it also cause tables not to be locked that ought to be? That
 seems to be the nastier part to me.

In ordinary immediate execution the parser or planner would have
obtained the relevant table lock.  I think what you say is possible if a
prepared plan is re-executed, but TBH it doesn't sound like much of an
issue to me.

regards, tom lane




[HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Karol Trzcionka
Hello,
as a continuation of my proposal expanding RETURNING syntax by
AFTER/BEFORE, there may be enough time to implement RETURNING for COPY
as well. I'd like to hear your opinion on that. My draft idea is:
COPY FROM ... RETURNING table_name.* - returns all values copied to the
table, after all triggers have fired
COPY FROM ... RETURNING BEFORE.* - returns all values copied to the table
before triggers (all raw values from the file)
COPY TO ... RETURNING table_name.* - returns all values from the table
COPY TO ... RETURNING AFTER.*/BEFORE.* - the same
What do you think about it?
Regards,
Karol Trzcionka




Re: [HACKERS] improving PL/Python builds on OS X

2013-05-08 Thread Dave Page
On Fri, Dec 21, 2012 at 5:45 PM, Peter Eisentraut pete...@gmx.net wrote:
 The PL/Python build on OS X is currently hardcoded to use the system
 Python install.  If you try to override this when running configure, you
 get a mysterious mix-and-match build.  If you want to build against your
 own Python build, or MacPorts or Homebrew, you can't.

 This is straightforward to fix.  In configure, besides checking Python
 include and library paths, we can also check whether it's a framework
 build and the right parameters for that.  The attached patch does that
 and does the job for me.  Please test it.

 One constraint, which is explained in the comment in
 src/pl/plpython/Makefile, is that in Python <2.5 there is no official
 way to detect either framework builds or shared libpython builds, so we
 can't support those versions on OS X, at least without more hardcoding
 of things.  I'd rather phase some of that out, but if someone needs to
 continue to use Python 2.4 or earlier on OS X, let me know.  (A more
 proper fix would be to split DLSUFFIX into two variables, but that seems
 like more work than it's worth right now.)

This patch seems to have broken the build for our installers for 9.3.
Because we need a consistent build of the PL interpreters on all the
platforms we support, we use the ActiveState distributions of Perl,
Python and Tcl (we can't rely on vendor-supplied packages, because
their versions vary between different Linux distros and different OS X
versions). However, ActivePython doesn't include a shared library,
which this change seems to require.

Can that requirement be reverted?

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] XLogFlush invoked about twice as much after 9.2 group commit enhancement

2013-05-08 Thread Jeff Janes
On Tue, May 7, 2013 at 9:23 PM, Peter Geoghegan p...@heroku.com wrote:

 On Tue, May 7, 2013 at 12:48 PM, Jeff Janes jeff.ja...@gmail.com wrote:
  Anyway, I don't see this behavior change when turning on wal_debug and
  looking in the logfiles for 'xlog flush request' messages.

 That could have everything to do with the hardware you're using. In
 general, the higher the cost of an fsync, the more useful it is to
 amortize that cost among concurrently committing transactions.


True, but that is going to happen with a proportional increase in
throughput, which he reported not seeing.  At least originally.  I'm not
sure what to think now.

Cheers,

Jeff


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix permission tests for views/tables proven empty by constraint

2013-05-08 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 That's kind of dismaying. ORMs have a tendency to create queries like
 this and people may have even written such queries by hand and tested
 them to determine that postgres was able to exclude the useless
 relation. To have them install a security update and discover that
 something they had previously tested no longer worked would be
 annoying.

Turns out to be more to this than I realized before.  In an example
such as the one I showed

select * from
  ((select f1 as x from t1 offset 0)
   union all
   (select f2 as x from t2 offset 0)) ss
where false;

where an appendrel subselect member can be proven empty on the basis
of outer-query clauses alone, *we don't even plan that subquery*.
The fix I had in mind for this fails to capture table references from
such a subquery.

We could extend setrefs.c to dig into unplanned subqueries and grab RTEs
out of them, but that would not be a complete fix.  In particular, RTEs
would not get made for inheritance children of parent tables mentioned
in the query, since inheritance expansion is done by the planner.  Now,
that wouldn't affect permissions checks because no extra permissions
checks are done on inheritance children, but it would affect the locking
behavior that Andres was worried about.

I think the only reliable way to make this optimization fully
transparent would be to go ahead and plan every subquery, even when we
know we'll discard the planning results immediately.  That seems like
quite a lot of overkill.  I'm not really sure I buy Greg's argument
that people might be depending on the performance benefits of not
planning such subqueries, but I guess it's not impossible either.

My inclination is to go ahead and write the extra code to dig RTEs out
of unplanned subqueries, and not worry about failing to lock inheritance
children in them.

I'm also now pretty firmly in the camp of let's not try at all to fix
this in the back branches.

Thoughts?

regards, tom lane




Re: [HACKERS] improving PL/Python builds on OS X

2013-05-08 Thread Peter Eisentraut
On 5/8/13 11:51 AM, Dave Page wrote:
 This patch seems to have broken the build for our installers for 9.3.
 Because we need a consistent build of the PL interpreters on all the
 platforms we support, we use the ActiveState distributions of Perl,
 Python and Tcl (we can't rely on vendor-supplied packages, because
 their versions vary between different Linux distros and different OS X
 versions). However, ActivePython doesn't include a shared library,
 which this change seems to require.
 
 Can that requirement be reverted?

There was no change in this regard.  A shared library was always
required on OS X.

ActivePython does include a shared library.  I just tried it and it
builds fine.





Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Pavel Stehule
Hello

Why? What is the motivation? What is the use case?

Regards

Pavel


2013/5/8 Karol Trzcionka karl...@gmail.com

 Hello,
 as a continuation of my proposal expanding RETURNING syntax by
 AFTER/BEFORE, there may be enough time to implement RETURNING for COPY
 as well. I'd like to hear your opinion on that. My draft idea is:
 COPY FROM ... RETURNING table_name.* - returns all values copied to the
 table, after all triggers have fired
 COPY FROM ... RETURNING BEFORE.* - returns all values copied to the table
 before triggers (all raw values from the file)
 COPY TO ... RETURNING table_name.* - returns all values from the table
 COPY TO ... RETURNING AFTER.*/BEFORE.* - the same
 What do you think about it?
 Regards,
 Karol Trzcionka





Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Tom Lane
Karol Trzcionka karl...@gmail.com writes:
 as a continuation of my proposal expanding RETURNING syntax by
 AFTER/BEFORE, there may be enough time to implement RETURNING for COPY as well.

No there isn't; what you suggest would require FE/BE protocol
extensions, making it several orders of magnitude more work than the
other thing.

regards, tom lane




Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Heikki Linnakangas

On 08.05.2013 19:44, Tom Lane wrote:

Karol Trzcionka karl...@gmail.com writes:

as a continuation of my proposal expanding RETURNING syntax by
AFTER/BEFORE, there may be enough time to implement RETURNING for COPY as well.


No there isn't; what you suggest would require FE/BE protocol
extensions, making it several orders of magnitude more work than the
other thing.


I'd imagine that the flow would go something like this:

BE                  FE

CopyInResponse
                    CopyData
                    CopyData
                    ...
                    CopyDone
RowDescription
DataRow
DataRow
CommandComplete

A well-written client should be able to handle that. But if one expects 
the backend to always send a CommandComplete after CopyDone, it will get 
confused. Implementing that doesn't seem too difficult to me.


I agree that this is much more work than the UPDATE RETURNING 
BEFORE/AFTER, though. Not sure if that's a good or a bad thing.


- Heikki




Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 08.05.2013 19:44, Tom Lane wrote:
 No there isn't; what you suggest would require FE/BE protocol
 extensions, making it several orders of magnitude more work than the
 other thing.

 I'd imagine that the flow would go something like this:

 BE                  FE

 CopyInResponse
                     CopyData
                     CopyData
                     ...
                     CopyDone
 RowDescription
 DataRow
 DataRow
 CommandComplete

That would require the backend to buffer the entire query response,
which isn't a great idea.  I would expect that such an operation would
need to interleave CopyData to the backend with DataRow responses.  Such
a thing could possibly be built on COPY_BOTH mode, but it would be a lot
of work (at both ends) for extremely debatable value.

The general idea of COPY is to load data as fast as possible, so weighing
it down with processing options seems like a pretty dubious idea even if
the implementation were easy.

regards, tom lane




Re: [HACKERS] improving PL/Python builds on OS X

2013-05-08 Thread Dave Page
On Wed, May 8, 2013 at 5:34 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 5/8/13 11:51 AM, Dave Page wrote:
 This patch seems to have broken the build for our installers for 9.3.
 Because we need a consistent build of the PL interpreters on all the
 platforms we support, we use the ActiveState distributions of Perl,
 Python and Tcl (we can't rely on vendor-supplied packages, because
 their versions vary between different Linux distros and different OS X
 versions). However, ActivePython doesn't include a shared library,
 which this change seems to require.

 Can that requirement be reverted?

 There was no change in this regard.  A shared library was always
 required on OS X.

It's failing on Linux. Even worse, it configures fine and then builds
without error. There is a message spewed out by configure, but it
doesn't contain the words "warning" or "error". Given that I explicitly
said I wanted Python support when I ran configure, it should certainly
fail with an error at configure time. We only noticed this was a
problem when the QA guys started diving into more detailed testing, as
we don't watch for every message in the 50+ MB of logs our automated
build systems generate.

 ActivePython does include a shared library.  I just tried it and it
 builds fine.



--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Dimitri Fontaine
Karol Trzcionka karl...@gmail.com writes:
 as a continuation of my proposal expanding RETURNING syntax by

What about implementing support for OLD/NEW in per-statement triggers? I
guess you would expose the data via a SRF.
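
For reference, the SQL standard's transition tables offer one shape for
this; a sketch (not implemented in PostgreSQL today, and the table and
function names are invented):

CREATE TRIGGER accounts_audit
    AFTER UPDATE ON accounts
    REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE log_account_changes();

Inside the trigger function, old_rows and new_rows would behave like
ordinary relations, which fits the SRF idea.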

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Andrew Dunstan


On 05/08/2013 01:16 PM, Tom Lane wrote:


That would require the backend to buffer the entire query response,
which isn't a great idea.  I would expect that such an operation would
need to interleave CopyData to the backend with DataRow responses.  Such
a thing could possibly be built on COPY_BOTH mode, but it would be a lot
of work (at both ends) for extremely debatable value.

The general idea of COPY is to load data as fast as possible, so weighing
it down with processing options seems like a pretty dubious idea even if
the implementation were easy.





That was my reaction. I would look at this very skeptically.

cheers

andrew




Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Jonathan S. Katz
On May 8, 2013, at 1:16 PM, Tom Lane wrote:

 Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 08.05.2013 19:44, Tom Lane wrote:
 No there isn't; what you suggest would require FE/BE protocol
 extensions, making it several orders of magnitude more work than the
 other thing.
 
 I'd imagine that the flow would go something like this:
 
 BE                  FE

 CopyInResponse
                     CopyData
                     CopyData
                     ...
                     CopyDone
 RowDescription
 DataRow
 DataRow
 CommandComplete
 
 That would require the backend to buffer the entire query response,
 which isn't a great idea.  I would expect that such an operation would
 need to interleave CopyData to the backend with DataRow responses.  Such
 a thing could possibly be built on COPY_BOTH mode, but it would be a lot
 of work (at both ends) for extremely debatable value.
 
 The general idea of COPY is to load data as fast as possible, so weighing
 it down with processing options seems like a pretty dubious idea even if
 the implementation were easy.

There are cases where I indeed want to load data very quickly, but then want
to perform an operation on it immediately afterwards, e.g. removing bad data
that was just added by that copy.  For instance, I have this scenario:

WITH new_data AS (
    COPY FROM ...
    RETURNING id, field_to_check
)
DELETE FROM table
USING new_data
WHERE
    table.id = new_data.id AND
    new_data.field_to_check ~* 'bad data';

Now I can take care of all that in one step, and I know I'm only removing 
rows I just added.  This comes up when I am importing external files from 
other sources where I may not necessarily want all of the rows, or some of 
the rows contain bad data.

This also presumes that COPY works in a CTE, which I'm not sure it does (and I 
will do the TIAS test after I hit send on this message).

Jonathan



Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread David Fetter
On Wed, May 08, 2013 at 01:16:14PM -0400, Tom Lane wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
  On 08.05.2013 19:44, Tom Lane wrote:
  No there isn't; what you suggest would require FE/BE protocol
  extensions, making it several orders of magnitude more work than the
  other thing.
 
  I'd imagine that the flow would go something like this:
 
  BE                  FE

  CopyInResponse
                      CopyData
                      CopyData
                      ...
                      CopyDone
  RowDescription
  DataRow
  DataRow
  CommandComplete
 
 That would require the backend to buffer the entire query response,
 which isn't a great idea.  I would expect that such an operation would
 need to interleave CopyData to the backend with DataRow responses.  Such
 a thing could possibly be built on COPY_BOTH mode, but it would be a lot
 of work (at both ends) for extremely debatable value.
 
 The general idea of COPY is to load data as fast as possible,

With utmost respect, that is one of several use cases, and any change
would need to keep that use case unburdened.  A sometimes overlapping
set of use cases move data in and out of the database in a simple
manner.  In some of these, people might wish to trade some performance
for the feature.

A particular example would be one where there are several tables to be
loaded, some with generated columns that the future ones would depend
on.  Yes, it's possible (kinda) to do this with the FDW machinery, but
the burden is much higher as it requires DDL permission in general
each time.

 so weighing it down with processing options seems like a pretty
 dubious idea even if the implementation were easy.

Totally agreed that the fast load/unload code path must not be
affected by any such changes.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix permission tests for views/tables proven empty by constraint

2013-05-08 Thread Andres Freund
On 2013-05-08 12:30:31 -0400, Tom Lane wrote:
 Greg Stark st...@mit.edu writes:
  That's kind of dismaying. ORMs have a tendency to create queries like
  this and people may have even written such queries by hand and tested
  them to determine that postgres was able to exclude the useless
  relation. To have them install a security update and discover that
  something they had previously tested no longer worked would be
  annoying.
 
 Turns out to be more to this than I realized before.  In an example
 such as the one I showed
 
 select * from
   ((select f1 as x from t1 offset 0)
    union all
    (select f2 as x from t2 offset 0)) ss
 where false;
 
 where an appendrel subselect member can be proven empty on the basis
 of outer-query clauses alone, *we don't even plan that subquery*.
 The fix I had in mind for this fails to capture table references from
 such a subquery.

 We could extend setrefs.c to dig into unplanned subqueries and grab RTEs
 out of them, but that would not be a complete fix.  In particular, RTEs
 would not get made for inheritance children of parent tables mentioned
 in the query, since inheritance expansion is done by the planner.  Now,
 that wouldn't affect permissions checks because no extra permissions
 checks are done on inheritance children, but it would affect the locking
 behavior that Andres was worried about.

I first thought that is fair enough, since I thought that in most if not
all cases where locking plays a user-visible role the parent relation
would get locked anyway when a child relation gets locked. Turns out,
we only do it the other way round, i.e. lock child relations when we
lock a parent relation, even for most DDL on child relations.

I am not sure if its really problematic, but it seems to allow scenarios
like:

S1: BEGIN;
S1: SELECT * FROM ((SELECT * FROM parent OFFSET 0) UNION ALL
                   (SELECT * FROM parent OFFSET 0)) f WHERE false;
-- parent is locked now, children are not
S2: BEGIN;
S2: ALTER TABLE child_1 DROP CONSTRAINT foo;
S1: SELECT * FROM parent WHERE ...
-- blocks, waiting for S2 since child_1 is locked.

This seems like somewhat confusing behaviour, although it has gone
unnoticed so far, since one normally expects that a previously acquired
lock allows you to continue working with a relation.

But I guess this is better fixed by making all DDL on child relations
also lock their parent relation? That seems like a good idea anyway.

I am not at all convinced that this must be fixed, but not of the
opposite either. I just wanted to point this out since I am not sure
there aren't any more problematic cases.

 I think the only reliable way to make this optimization fully
 transparent would be to go ahead and plan every subquery, even when we
 know we'll discard the planning results immediately.  That seems like
 quite a lot of overkill.  I'm not really sure I buy Greg's argument
 that people might be depending on the performance benefits of not
 planning such subqueries, but I guess it's not impossible either.

I didn't understand Greg's argument as being based on performance but as
being worried about the changed locking and such from a functional
perspective. Greg?

I don't really buy the performance argument either, but I agree that we
shouldn't do all this in the back branches as the bug isn't really bad
and it has some potential for introducing problems.

 I'm also now pretty firmly in the camp of let's not try at all to fix
 this in the back branches.

+1 independent of where this goes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Terminology issue: suffix tree

2013-05-08 Thread Heikki Linnakangas

On 08.05.2013 15:49, Alexander Korotkov wrote:

On Wed, May 8, 2013 at 3:50 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:


Yeah. The data structure in contrib/unaccent seems to be a plain old trie,
rather than a radix trie, though. According to wikipedia at least, the
difference is that in a radix tree, the edges are labeled with sequences of
elements, rather than single elements. Want to patch that too?


 Agreed, trie is the most conforming term here. Patch is attached.


Ok, applied.

- Heikki




Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Ryan Kelly
On Wed, May 08, 2013 at 10:55:40AM -0700, David Fetter wrote:
 On Wed, May 08, 2013 at 01:16:14PM -0400, Tom Lane wrote:
  Heikki Linnakangas hlinnakan...@vmware.com writes:
   On 08.05.2013 19:44, Tom Lane wrote:
   No there isn't; what you suggest would require FE/BE protocol
   extensions, making it several orders of magnitude more work than the
   other thing.
  
   I'd imagine that the flow would go something like this:
  
   BE                  FE

   CopyInResponse
                       CopyData
                       CopyData
                       ...
                       CopyDone
   RowDescription
   DataRow
   DataRow
   CommandComplete
  
  That would require the backend to buffer the entire query response,
  which isn't a great idea.  I would expect that such an operation would
  need to interleave CopyData to the backend with DataRow responses.  Such
  a thing could possibly be built on COPY_BOTH mode, but it would be a lot
  of work (at both ends) for extremely debatable value.
  
  The general idea of COPY is to load data as fast as possible,
 
 With utmost respect, that is one of several use cases, and any change
 would need to keep that use case unburdened.  A sometimes overlapping
 set of use cases move data in and out of the database in a simple
 manner.  In some of these, people might wish to trade some performance
 for the feature.

99% of my uses at work for COPY are as a general data import and export
facility. I often find myself loading CSV files into our database for
analysis and further cleanup, and then use COPY to output queries as CSV
files for consumption by other members of the business.
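
A typical day then looks like this (file paths and table invented):

COPY staging FROM '/tmp/input.csv' WITH (FORMAT csv, HEADER);

COPY (SELECT region, sum(amount) FROM staging GROUP BY region)
  TO '/tmp/report.csv' WITH (FORMAT csv, HEADER);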

The recent work for (PRE|POST)PROCESSOR options to COPY is indicative of
the fact that users are not merely using COPY to load data as fast as
possible.

Other discussions around a COMPRESSED option are more than just a
performance enhancement, in my view, as I often receive compressed
files, and decompressing the data is just another step standing in
the way of importing it into the database.

Additionally, once I have the data imported, I often take many steps to
clean up and format the data prior to applying actual typing to a table
(which invariably fails due to invalid dates and other nonsense).

COPY ... RETURNING would certainly be useful to apply additional
transformations to the data before finally sending it to its ultimate
destination.

 A particular example would be one where there are several tables to be
 loaded, some with generated columns that the future ones would depend
 on.  Yes, it's possible (kinda) to do this with the FDW machinery, but
 the burden is much higher as it requires DDL permission in general
 each time.

I find using the FDW machinery to perform many queries to be much slower
than importing the data once and then running my queries. There is also
no ability to use indexes.

  so weighing it down with processing options seems like a pretty
  dubious idea even if the implementation were easy.
 
 Totally agreed that the fast load/unload code path must not be
 affected by any such changes.

Agreed here as well.

-Ryan P. Kelly





Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Stephen Frost
* Ryan Kelly (rpkell...@gmail.com) wrote:
 COPY ... RETURNING would certainly be useful to apply additional
 transformations to the data before finally sending it to its ultimate
 destination.

If we really think that COPY ... RETURNING is only going to be used in a
CTE or similar, then we could always support only that and forgo any
changes to the FE/BE protocol to support it.  Or, at least, take the
simplest approach to supporting it, which would involve caching the data
entirely before sending it back to the client (isn't that what we do on
a big INSERT ... VALUES ... RETURNING anyway?  people can transfer in
blocks if they want to with INSERT .. VALUES or COPY .. RETURNING).

Thanks,

Stephen




[HACKERS] Taking the varattno in args (where part of a query)

2013-05-08 Thread carla celiberti
Hi everyone,
I'm working on a project and one of my tasks is to capture the WHERE part
of queries. I'm programming inside the function exec_simple_query in
postgres.c. From reading the source code, I understand that I need to take
the varattno in the args of qpqual -- is that right? If yes, how do I save
it so that it is available in that function? If no, what is the easiest
way to get it?
Example: for the query "select firstname from friend where age=33", the
attributes of the table friend are: firstname, lastname, city, state, age.
I need to get the number 5, because age is the fifth attribute of friend.
Thanks

Re: [HACKERS] Graph datatype addition

2013-05-08 Thread Atri Sharma
On Thu, May 2, 2013 at 7:58 AM, Atri Sharma atri.j...@gmail.com wrote:


 Sent from my iPad

 On 02-May-2013, at 4:33, Misa Simic misa.si...@gmail.com wrote:



 On Wednesday, May 1, 2013, Atri Sharma wrote:

 Hi all,

 Please find a probable prototype for the same:

 struct GraphNode
 {
 Oid NodeOid;// Oid of the row which is the node here. We will
 store an identifier to it here rather than the complete row(with data)
 itself.
 AdjacencyList *list;   // Pointer to the node's adjacency list.
 };

 struct AdjacencyList
 {
   Oid[] neighbours_list;
 };

 struct AdjacencyList is probably the 'hottest' data structure in our
 entire implementation. We can think of making a cache of recently
 accessed struct AdjacencyList instances, or the AdjacencyList(s) of
 the neighbours of the recently accessed nodes, because, they are most
 likely to be accessed in near future. Advice here, please?

 So.

 struct AdjacencyCache
 {
  Oid[] cache_values;
 };

 push and pop functions for AdjacencyCache follow.

 We need a replacement and invalidation algorithm for the cache. I feel
 a version of LRU should be good here.

 I have not given a prototype for operations and algorithm implementations.

 I feel, as suggested by Peter and Jaime, we can look at the pgRouting code
 for algorithm implementations.

 Florian's concerns are mitigated here to some extent, IMO. Since the
 nodes and linkings are loosely coupled, and not represented as a
 single representation, updating or changing any part or adding a
 new edge is no longer an expensive operation, as it only requires a
 lookup of the GraphNode and then its AdjacencyList. If we use the cache as
 well, it will further reduce the lookup costs.

 I have not yet thought of the user-visible layer as suggested by Jim.
 Probably, once we are OK with the internal layer, we can move to the
 user-visible layer.

 Advice/Comments/Feedback please?


 Honestly - I think I don't understand the proposal...

 Datatypes - are about values - what will be stored in that column in a
 table

 A datatype - can't have any clue about rows

 How I understand what you described - you can achieve the same with pure SQL
 - the structs are equivalent to graph tables... Instead of an Oid, the
 column will store PKs of the nodes table...


 Yes, I agree. I need to think more.

 Let me get back with a deeper proposal.

 Regards,

 Atri

Hi all,

In continuation of the above discussion, I have been thinking about the
design of the data type, along the lines of a multi-dimensional data
structure:

Specifically, I have been thinking about making multi-lists for
representing the data. After some research, I think that the following
data structure may help:

Each node will be represented as:

[Down Pointer][Atom][Right Pointer]

Suppose a graph is like (sorry for the bad drawing):

      B
     /
    A       D
     \     /
      C
       \
        E

can be represented as:

[multi-list diagram garbled in transit: it showed the top-level list
holding A's cell, with sublist cells and data cells for B, C, D and E
hanging off the down pointers]


Essentially, the Atom flag denotes whether the node has any out edges.
If it has no out edge, Atom is 0 and the Down Pointer points to an
auxiliary structure which holds the node's data (hence, the data can be
of arbitrary size).

If the node has some out degree, then those nodes are added to a new
sublist which starts from the node which spawns them. The node's
down pointer points to the head of the new sublist.

Essentially, a sublist has all the nodes directly spawning from the
head node of the sublist.

This approach has multiple advantages in terms of memory and
efficiency. Also, isolating subgraphs based on some criteria is
pretty efficient, which is good for many analytics-based operations.

Access time is restricted as well.

Thoughts/Comments/Feedback please?

Regards,

Atri

--
Regards,

Atri
l'apprenant




Re: [HACKERS] Graph datatype addition

2013-05-08 Thread Jim Nasby

On 5/8/13 1:40 PM, Atri Sharma wrote:

[Atri's proposal quoted in full -- snipped; see the message above]


Your second drawing didn't really make any sense to me. :(

I do think it would be most productive to focus on what the API for dealing 
with graph data would look like before trying to handle the storage aspect. The 
storage is potentially dirt-simple, as others have shown. The only challenge 
would be efficiency, but it's impossible to discuss efficiency without some 
clue of how the data will be accessed. Frankly, for the first round of this I 
think it would be best if the storage really was just some raw tables. Once 
something is available people will start figuring out how to use it, and where 
the API needs to be improved.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



Re: [HACKERS] Graph datatype addition

2013-05-08 Thread Atri Sharma

 Your second drawing didn't really make any sense to me. :(

 I do think it would be most productive to focus on what the API for dealing
 with graph data would look like before trying to handle the storage aspect.
 The storage is potentially dirt-simple, as others have shown. The only
 challenge would be efficiency, but it's impossible to discuss efficiency
 without some clue of how the data will be accessed. Frankly, for the first
 round of this I think it would be best if the storage really was just some
 raw tables. Once something is available people will start figuring out how
 to use it, and where the API needs to be improved.

 --

Thanks for your reply.

Yes, my drawing sucks, heh.

OK, I agree. I was pretty perked up about efficiency in storage, hence
started designing.

Sketching out an API in terms of functionality will require a
different viewpoint. I think make, insert, search and delete
functionality would be directly exposed to the user. Then,
functionality to isolate subgraphs based on some criterion/criteria
and implementations of standard graph algorithms (BFS, DFS, Dijkstra's
algorithm) can each be exposed through single functions, as sketched below.
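
Purely as a strawman of that user-visible layer (none of these functions
exist; all names are illustrative only):

SELECT graph_make('g');                            -- create an empty graph
SELECT graph_insert_edge('g', 'A', 'B');           -- add an edge
SELECT graph_delete_node('g', 'A');                -- drop a node and its edges
SELECT * FROM graph_bfs('g', 'A');                 -- BFS traversal as a row set
SELECT * FROM graph_shortest_path('g', 'A', 'E');  -- Dijkstra behind one call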

Regards,

Atri


--
Regards,

Atri
l'apprenant




Re: [HACKERS] about index inheritance

2013-05-08 Thread Martijn van Oosterhout
On Wed, May 08, 2013 at 10:19:08AM +0200, Vincenzo Melandri wrote:
 On Tue, May 7, 2013 at 11:55 PM, Robert Haas robertmh...@gmail.com wrote:
  This is a really hard problem.  If you pick this as your first project
  hacking on PostgreSQL, you will almost certainly fail.
 
  Thank you very much, I guessed that already -.-
  Still, I needed that at my office for a long time, struggled with it many
  times and had to come up with some exotic solutions...
 Now I have spare time between projects, so I can work on it full-time. At
 least it's worth a try, isn't it?

Well, you can work on it, but I think it will be less programming and
more coming up with a feasible solution.

 Anyway, I'm working to better understand the problem, trying to identify at
 least the main points involved.
 At the moment I'm figuring out how the inheritance mechanism works for
 relations (in tablecmds.c).. Then I'll figure out how indexes work..

While there are probably old threads in the archives, I find the
easiest way to look at the problem is in the locking.  In particular, I
think if you can get unique indexes to work then the rest will follow.

Consider the case of an inheritance hierarchy where you want a unique
index on a column.  Since you want to be able to create and drop
children easily, each child needs to have an index just for itself.  But
if you insert a row into one child you need to, somehow, prevent other
people also inserting the same value in a different child.  Efficiently
and deadlock-free.  This is hard, though we're up for crazy,
out-of-the-box ideas.

Note, there is one very special case, namely:

- The children are used for partitioning.

- The unique index you want is on the partition key.

Since each value can only possibly appear in one table your locking
problems vanish. The question is: how often does this happen?
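
A sketch of that special case (names invented):

CREATE TABLE measurements (day date, v int);

CREATE TABLE measurements_2013 (
    CHECK (day >= '2013-01-01' AND day < '2014-01-01'),
    UNIQUE (day)
) INHERITS (measurements);

CREATE TABLE measurements_2014 (
    CHECK (day >= '2014-01-01' AND day < '2015-01-01'),
    UNIQUE (day)
) INHERITS (measurements);

A given day can live in only one child, so the per-child unique indexes
add up to a table-wide guarantee without any cross-table locking.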

Hope this helps,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Jim Nasby

On 5/8/13 12:54 PM, Jonathan S. Katz wrote:

[Jonathan's message quoted in full -- snipped; see above]


What you're really asking for here is some kind of stream processing 
capability. There are spin-offs of Postgres that provide that capability (I 
know Neil Conway worked on some). Those are geared completely around stream 
processing, but I think it would be extremely interesting to provide some 
minimal support for that in community Postgres.

Using your use case as an example, something like this would be very 
interesting:

COPY table FROM ...
  WHERE field_to_check !~* 'bad data'
;

In this case we're just applying a simple WHERE clause against each incoming 
row.

Perhaps what I'm suggesting could be implemented with a CTE, but I'm not sure 
it makes sense to do it the way you propose, at least not initially. A CTE 
would provide so much flexibility that it'd be difficult for the optimizer to 
be efficient about it. Something like a WHERE clause directly on COPY would be 
a lot easier to handle. As someone mentioned, FDW might be another option there.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Jim Nasby

On 5/8/13 12:33 PM, Dimitri Fontaine wrote:

Karol Trzcionka karl...@gmail.com writes:

as a continuation of my proposal expanding RETURNING syntax by


What about implementing support for OLD/NEW in per-statement triggers? I
guess you would expose the data via a SRF.


Per statement NEW/OLD is an interesting case, in that it shares some of the 
same challenges; namely how to store the NEW and OLD recordsets efficiently. 
I've wondered if there'd be some way to do that by just storing a list of CTIDs 
(not sure if that'd work with HOT for OLD though).
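
For concreteness, the SQL-standard way to expose those recordsets is transition 
tables, which PostgreSQL does not implement at this point; a sketch of what that 
syntax would look like, with hypothetical trigger, table, and function names:

CREATE TRIGGER balance_audit
    AFTER UPDATE ON accounts
    REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE log_balance_changes();

Inside log_balance_changes(), old_rows and new_rows would then be queryable 
like ordinary relations.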

I still like the idea of being able to exclude certain records during COPY 
though; not writing a tuple will always be more efficient than creating one and 
then nuking it after the fact. There's a similar argument to be made about 
in-line transforms too.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] about index inheritance

2013-05-08 Thread Jim Nasby

On 5/8/13 2:17 PM, Martijn van Oosterhout wrote:

On Wed, May 08, 2013 at 10:19:08AM +0200, Vincenzo Melandri wrote:

On Tue, May 7, 2013 at 11:55 PM, Robert Haas robertmh...@gmail.com wrote:

This is a really hard problem.  If you pick this as your first project
hacking on PostgreSQL, you will almost certainly fail.


Thank you very much, i guessed that already -.-
Still, I needed that at my office for a long time, struggled with it many
times and had to come out with some exotic solutions...
Now I have spare time between projects, so I can work on it full-time. At
least it's worth a try, isn't it?


Well, you can work on it, but I think it will be less programming and
more coming up with a feasible solution.


Anyway, I'm working to better understand the problem, trying to identify at
least the main involved points.
At the moment I'm figuring out how the inherit mechanism works for
relations (in tablecmds.c).. Then I'll figure out about how indexes work..


While there are probably old threads in the archives, I find the
easiest way to look at the problem is in the locking.  In particular, I
think if you can get unique indexes to work then the rest will follow.

Consider the case of an inheritance hierarchy where you want a unique
index on a column.  Since you want to be able to create and drop
children easily, each child needs to have an index just for itself.  But
if you insert a row into one child you need to, somehow, prevent other
people also inserting the same value into a different child.  Efficiently
and deadlock-free.  This is hard, though we're up for crazy,
out-of-the-box ideas.

Note, there is one very special case, namely:

- The children are used for partitioning.

- The unique index you want is on the partition key.

Since each value can only possibly appear in one table your locking
problems vanish. The question is: how often does this happen?
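
A minimal sketch of that special case, with hypothetical names; because the
CHECK constraints make the children non-overlapping, per-child unique indexes
add up to a globally unique key:

CREATE TABLE measurements (logdate date NOT NULL, reading int);

CREATE TABLE measurements_2013 (
    CHECK (logdate >= '2013-01-01' AND logdate < '2014-01-01')
) INHERITS (measurements);
CREATE UNIQUE INDEX measurements_2013_logdate_key
    ON measurements_2013 (logdate);

CREATE TABLE measurements_2014 (
    CHECK (logdate >= '2014-01-01' AND logdate < '2015-01-01')
) INHERITS (measurements);
CREATE UNIQUE INDEX measurements_2014_logdate_key
    ON measurements_2014 (logdate);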


I would also consider indexes that span multiple tables that do NOT involve 
inheritance. That's the most generic case, so if you can make that work 
everything else should fall into place. The only caveat is that UPDATE and 
DELETE in an inheritance tree could produce unique challenges since they would 
start off by reading from more than one table.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Andrew Dunstan


On 05/08/2013 03:23 PM, Jim Nasby wrote:

WITH new_data AS (
COPY FROM ...
RETURNING id, field_to_check
)




Why is this better than this, which you can do today?

   WITH new_data AS (
INSERT into ... FROM foreign_table_with_file_fdw RETURNING ...
   )


The whole reason I abandoned trying to do this sort of thing with COPY 
was that I realized the FDW would provide what I wanted.


cheers

andrew




Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Ryan Kelly
On Wed, May 08, 2013 at 03:38:10PM -0400, Andrew Dunstan wrote:
 
 On 05/08/2013 03:23 PM, Jim Nasby wrote:
 WITH new_data AS (
 COPY FROM ...
 RETURNING id, field_to_check
 )
 
 
 Why is this better than this, which you can do today?
 
WITH new_data AS (
 INSERT into ... FROM foreign_table_with_file_fdw RETURNING ...
)
 
 
 The whole reason I abandoned trying to do this sort of thing with
 COPY was that I realized the FDW would provide what I wanted.

You need to first CREATE EXTENSION file_fdw. Then you need to CREATE
SERVER. Then CREATE FOREIGN TABLE. Which requires appropriate permission
to do those things, and certainly has no hope of working on the client
side.
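
For reference, those steps look roughly like this (names and file path are
hypothetical; the file must live on the server, and CREATE EXTENSION /
CREATE SERVER need suitable privileges):

CREATE EXTENSION file_fdw;
CREATE SERVER import_server FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE import_file (
    id             integer,
    field_to_check text
) SERVER import_server
  OPTIONS (filename '/tmp/data.csv', format 'csv');

-- Filter up front rather than deleting afterwards:
INSERT INTO target_table
SELECT id, field_to_check
FROM import_file
WHERE field_to_check !~* 'bad data';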

-Ryan P. Kelly





Re: [HACKERS] GSOC Student Project Idea

2013-05-08 Thread Jim Nasby

On 5/8/13 3:54 AM, Heikki Linnakangas wrote:

On 24.04.2013 14:31, Florian Pflug wrote:

On Apr 23, 2013, at 23:25, Alexander Korotkov aekorot...@gmail.com
wrote:

I've taken a brief look on the paper and implementation. As I can
see iDistance implements some global building strategy. I mean, for
example, it selects some point, calculates distances from selected
point to all points in dataset etc. So, it uses the whole dataset
at the same time.

However you can try to implement global index building in GiST or
SP-GiST. In this case I think you should carefully estimate your
capabilities during single GSoC project. You would need to extend
GiST or SP-GiST interface and write completely new implementation
of tree building algorithm. Question of how to exactly extend GiST
or SP-GiST interface for this case could appear to be very hard
even theoretically.


+1. That seemed to be a major roadblock to me too when I read the
paper.

You could work around that by making partition identification a
separate step. You'd have a function

idist_analyze(cfg name, table name, field name)

which'd identify suitable partitions for the data distribution in
table.field and store them somewhere. Such a set of pre-identified
partitions would be akin to a tsearch configuration, i.e. all other
parts of the iDistance machinery would use it to map points to index
keys and queries to ranges of those keys. You'll want to look at how
tsearch handles that, and check if the method can indeed be applied
to iDistance.
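
(For readers unfamiliar with the paper: given reference points O_1 .. O_m and a
constant c larger than any within-partition radius, iDistance assigns a point p
in partition i the scalar B-tree key

    key(p) = i * c + dist(p, O_i)

and a range query around q with radius r maps, for each partition i whose
sphere intersects the query ball, to the key interval

    [i * c + max(0, dist(q, O_i) - r),  i * c + min(maxdist_i, dist(q, O_i) + r)]

where maxdist_i is that partition's radius.)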


You could perform that step as part of the index build. Before the index build 
starts to add tuples to the index, it could scan a random sample of the heap 
and identify the partitions based on that.

If you need to store the metadata, like a map of partitions, it becomes 
difficult to cajole this into a normal GiST or SP-GiST opclass. The API doesn't 
have any support for storing such metadata.


In a first cut, you'd probably only allow inserts into index which
don't change the maximum distances from the partition centers that
idist_analyze() found.


That seems like a pretty serious restriction. I'd try to write it so that you 
can insert any value, but if the new values are very different from any 
existing values, it would be OK for the index quality to degrade. For example, 
you could simply add any out-of-bounds values to a separate branch in the 
index, which would have no particular structure and would just have to be 
scanned on every query. You can probably do better than that, but that would be 
a trivial way to deal with it.


Or you could use the new insert to start a new partition.

Heck, maybe the focus should actually be on partitions and not individual 
records/points. ISTM the entire challenge here is figuring out a way to 
maintain a set of partitions that:

- Are limited enough in number that you can quickly perform operations/searches 
across all partitions
- Yet small enough that once you've narrowed down a set of partitions you don't 
have a ton of raw records to still look at

Before we had range types I experimented with representing time ranges as rectangles 
of varying size (ie: for (start, end), create rectangle(point(start,start), 
point(end,end))). The problem with that is you had to convert the timestamp into a float, 
which was not exact. So when querying you could use a GiST index on all the 
rectangles to narrow your scope, but you still needed a set of exact clauses (ie: 
start >= now() - '1 year' AND end <= now()). Partitions would be similar in 
that they wouldn't be exact but could greatly narrow the search space (of course we'd 
want to handle the secondary exact checking internally instead of exposing the user 
to that).
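
A sketch of that trick, with hypothetical names (date_part('epoch', ...) on
timestamp without time zone is immutable, which the expression index requires):

CREATE TABLE events (start_ts timestamp, end_ts timestamp);

-- One rectangle per row, with corners (start, start) and (end, end):
CREATE INDEX events_box_idx ON events USING gist (
    box(point(date_part('epoch', start_ts), date_part('epoch', start_ts)),
        point(date_part('epoch', end_ts),   date_part('epoch', end_ts)))
);

-- The lossy box overlap (&&) narrows the candidates; the exact timestamp
-- clauses re-check, since the float conversion is inexact:
SELECT *
FROM events
WHERE box(point(date_part('epoch', start_ts), date_part('epoch', start_ts)),
          point(date_part('epoch', end_ts),   date_part('epoch', end_ts)))
      && box(point(date_part('epoch', timestamp '2013-01-01'),
                   date_part('epoch', timestamp '2013-01-01')),
             point(date_part('epoch', timestamp '2014-01-01'),
                   date_part('epoch', timestamp '2014-01-01')))
  AND start_ts >= '2013-01-01' AND end_ts <= '2014-01-01';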
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




[HACKERS] Re: [GENERAL] pg_upgrade fails, mismatch of relation OID - 9.1.9 to 9.2.4

2013-05-08 Thread Bruce Momjian
On Wed, May  8, 2013 at 02:27:18PM -0400, Evan D. Hoffman wrote:
 If you want to start the old cluster, you will need to remove
 the .old suffix from /var/lib/pgsql/9.1/data/global/pg_control.old.
 Because link mode was used, the old cluster cannot be safely
 started once the new cluster has been started.

 Linking user relation files
   /var/lib/pgsql/9.1/data/base/16406/3016054
 Mismatch of relation OID in database dbname: old OID 2938685, new OID 299721
 Failure, exiting

[ Moved to hackers ]

OK, that is odd.  We preserve old/new OIDs (not relfilenodes, as someone
suggested in this thread);  FYI:

 *  FYI, pg_class.oid and pg_class.relfilenode are initially the same
 *  in a cluster, but they can diverge due to CLUSTER, REINDEX, or VACUUM
 *  FULL.  The new cluster will have matching pg_class.oid and
 *  pg_class.relfilenode values and be based on the old oid value.  This can
 *  cause the old and new pg_class.relfilenode values to differ.  In summary,
 *  old and new pg_class.oid and new pg_class.relfilenode will have the
 *  same value, and old pg_class.relfilenode might differ.

The problem reported is that pg_dump was not able to preserve the
old/new oids between clusters.  Can you get the answer for this query on
the old cluster:

SELECT relname from pg_class where oid = 2938685;

and on the new cluster, assuming you used 'copy' mode so you can start
the old/new clusters independently:

SELECT relname from pg_class where oid = 299721;

I think we will find that there is something in pg_dump related to this
table that isn't preserving the oids.
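
For anyone following along: the OID preservation normally happens because
pg_upgrade runs pg_dump in --binary-upgrade mode, which emits support-function
calls of roughly this shape ahead of each CREATE TABLE:

    SELECT binary_upgrade.set_next_heap_pg_class_oid('2938685'::pg_catalog.oid);

so if that call is missing or wrong for this table in the dump output, that
would explain the mismatch.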

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] corrupt pages detected by enabling checksums

2013-05-08 Thread Jim Nasby

On 4/5/13 6:39 PM, Jeff Davis wrote:

On Fri, 2013-04-05 at 10:34 +0200, Florian Pflug wrote:

Maybe we could scan forward to check whether a corrupted WAL record is
followed by one or more valid ones with sensible LSNs. If it is,
chances are high that we haven't actually hit the end of the WAL. In
that case, we could either log a warning, or (better, probably) abort
crash recovery.


+1.


Corruption of fields which we require to scan past the record would
cause false negatives, i.e. not trigger an error even though we do
abort recovery mid-way through. There's a risk of false positives too,
but they require quite specific orderings of writes and thus seem
rather unlikely. (AFAICS, the OS would have to write some parts of
record N followed by the whole of record N+1 and then crash to cause a
false positive).


Does the xlp_pageaddr help solve this?

Also, we'd need to be a little careful when written-but-not-flushed WAL
data makes it to disk, which could cause a false positive and may be a
fairly common case.


Apologies if this is a stupid question, but is this mostly an issue due to torn 
pages? IOW, if we had a way to ensure we never see torn pages, would that mean 
an invalid CRC on a WAL page indicated there really was corruption on that page?

Maybe it's worth putting (yet more) thought into the torn page issue... :/
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Proposal to add --single-row to psql

2013-05-08 Thread Jim Nasby

On 5/1/13 7:36 PM, Robert Haas wrote:

On Mon, Apr 29, 2013 at 4:33 PM, Jim Nasby j...@nasby.net wrote:

On 4/28/13 7:50 AM, Craig Ringer wrote:


I find it frustrating that I've never seen an @paraccel email address here
and that few of the other vendors of highly customised Pg offshoots are
contributing back. It's almost enough to make me like the GPL.


FWIW, I think there's a pretty large barrier to these folks contributing
back. Would the community really want to add a bunch of hooks to support
something like Redshift? Or Greenplum? Or etc, etc.? Most of these guys have
to change significant amounts of PG code, so much so that it's actually hard
for them to stay current (which is why most of them just fork).

I do think this is a shame, but I'm not sure of any good way to fix it.

Yep.  There are plenty of things that we do at EDB for good and valid
business reasons that I can't imagine the community accepting under
any circumstances.  For example, Oracle compatibility is not something
the community values as highly as EnterpriseDB (and our customers) do.
  I'm sure that many of those vendors are in similar situations - they
write code that only runs on specialized hardware, or (rather
commonly, I suspect) they remove parts of the functionality in order
to make certain things very fast.  Those are not trade-offs that make
sense for PostgreSQL, but I find it hard to understand what we'd gain
from preventing other people from making them.  There are in fact a
pretty large number of companies - EnterpriseDB, obviously, but there
are many, many others - that are choosing to build businesses around
PostgreSQL precisely because it *isn't* GPL.  Personally, I think
that's a good thing for our community in terms of mindshare even when
companies choose not to contribute back - and it's even better when
they do.


FWIW, one point I was trying to make that was overlooked is that it seems to be 
exceptionally difficult for companies to fork Postgres and then stay current 
(AFAIK EnterpriseDB and Mammoth are the only products that have pulled that 
feat off). I believe that makes it significantly harder for them to actually 
contribute code back that doesn't give them a business advantage, as well as 
making it harder to justify hacking on the community codebase because they'll 
just face a very large hurdle when it comes to pulling that code back into 
their proprietary product.

I don't know of any good way to solve that problem. Maybe it's not worth 
solving... but I do suspect there's some useful stuff that the community has 
lost out on because of this.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] Proposal to add --single-row to psql

2013-05-08 Thread David Fetter
On Wed, May 08, 2013 at 06:08:28PM -0500, Jim Nasby wrote:
 On 5/1/13 7:36 PM, Robert Haas wrote:
 On Mon, Apr 29, 2013 at 4:33 PM, Jim Nasby j...@nasby.net wrote:
 On 4/28/13 7:50 AM, Craig Ringer wrote:
 
 I find it frustrating that I've never seen an @paraccel email address 
 here
 and that few of the other vendors of highly customised Pg offshoots are
 contributing back. It's almost enough to make me like the GPL.
 
 FWIW, I think there's a pretty large barrier to these folks contributing
 back. Would the community really want to add a bunch of hooks to support
 something like Redshift? Or Greenplum? Or etc, etc.? Most of these guys 
 have
 to change significant amounts of PG code, so much so that it's actually 
 hard
 for them to stay current (which is why most of them just fork).
 
 I do think this is a shame, but I'm not sure of any good way to fix it.
 Yep.  There are plenty of things that we do at EDB for good and valid
 business reasons that I can't imagine the community accepting under
 any circumstances.  For example, Oracle compatibility is not something
 the community values as highly as EnterpriseDB (and our customers) do.
   I'm sure that many of those vendors are in similar situations - they
 write code that only runs on specialized hardware, or (rather
 commonly, I suspect) they remove parts of the functionality in order
 to make certain things very fast.  Those are not trade-offs that make
 sense for PostgreSQL, but I find it hard to understand what we'd gain
 from preventing other people from making them.  There are in fact a
 pretty large number of companies - EnterpriseDB, obviously, but there
 are many, many others - that are choosing to build businesses around
 PostgreSQL precisely because it *isn't* GPL.  Personally, I think
 that's a good thing for our community in terms of mindshare even when
 companies choose not to contribute back - and it's even better when
 they do.
 
 FWIW, one point I was trying to make that was overlooked is that it
 seems to be exceptionally difficult for companies to fork Postgres
 and then stay current (AFAIK EnterpriseDB and Mammoth are the only
 products that have pulled that feat off).

VMware and CitusDB are doing pretty well so far, but it's early days.

 I believe that makes it significantly harder for them to actually
 contribute code back that doesn't give them a business advantage, as
 well as making it harder to justify hacking on the community
 codebase because they'll just face a very large hurdle when it comes
 to pulling that code back into their proprietary product.
 
 I don't know of any good way to solve that problem. Maybe it's not
 worth solving... but I do suspect there's some useful stuff that the
 community has lost out on because of this.

Some of this is getting solved by making PostgreSQL more pluggable in
ways that isolate the proprietary stuff, i.e. make people not have to
touch the PostgreSQL core code much, if at all, in order to provide
whatever special features they provide.  Hooks and FDWs are two such
pluggable components.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] corrupt pages detected by enabling checksums

2013-05-08 Thread Jeff Davis
On Wed, 2013-05-08 at 17:56 -0500, Jim Nasby wrote:
 Apologies if this is a stupid question, but is this mostly an issue
 due to torn pages? IOW, if we had a way to ensure we never see torn
 pages, would that mean an invalid CRC on a WAL page indicated there
 really was corruption on that page?
 
 Maybe it's worth putting (yet more) thought into the torn page
 issue... :/

Sort of. For data, a page is the logically-atomic unit that is expected
to be intact. For WAL, a record is the logically-atomic unit that is
expected to be intact.

So it might be better to say that the issue for the WAL is torn
records. A record might be larger than a page (it can hold up to three
full-page images in one record), but is often much smaller.

We use a CRC to validate that the WAL record is fully intact. The
concern is that, if it fails the CRC check, we *assume* that it's
because it wasn't completely flushed yet (i.e. a torn record). Based
on that assumption, neither that record nor any later record contains
committed transactions, so we can safely consider that the end of the
WAL (as of the crash) and bring the system up.

The problem is that the assumption is not always true: a CRC failure
could also indicate real corruption of WAL records that have been
previously flushed successfully, and may contain committed transactions.
That can mean we bring the system up way too early, corrupting the
database.

Unfortunately, it seems that doing any kind of validation to determine
that we have a valid end-of-the-WAL inherently requires some kind of
separate durable write somewhere. It would be a tiny amount of data (an
LSN and maybe some extra crosscheck information), so I could imagine
that would be just fine given the right hardware; but if we just write
to disk that would be pretty bad. Ideas welcome.

Regards,
Jeff Davis






Re: [HACKERS] Fast promotion failure

2013-05-08 Thread Fujii Masao
On Tue, May 7, 2013 at 6:57 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 While testing the bug from the "Assertion failure at standby promotion", I
 bumped into a different bug in fast promotion. When the first checkpoint
 after fast promotion is performed, there is no guarantee that the
 checkpointer process is running with the correct, new, ThisTimeLineID. In
 CreateCheckPoint(), we have this:

 /*
  * An end-of-recovery checkpoint is created before anyone is allowed to
  * write WAL. To allow us to write the checkpoint record, temporarily
  * enable XLogInsertAllowed.  (This also ensures ThisTimeLineID is
  * initialized, which we need here and in AdvanceXLInsertBuffer.)
  */
 if (flags & CHECKPOINT_END_OF_RECOVERY)
     LocalSetXLogInsertAllowed();


 That ensures that ThisTimeLineID is updated when performing an
 end-of-recovery checkpoint, but it doesn't get executed with fast promotion.
 The consequence is that the checkpoint is created with the old timeline, and
 subsequent recovery from it will fail.

 I ran into this with the attached script. It sets up a master (M), a standby
 (B), and a cascading standby (C). I'm not sure why, but when I tried to
 simplify the script by removing the cascading standby, it started to work.
 The bug occurs in standby B, so I'm not sure why the presence of the
 cascading standby makes any difference. Maybe it just affects the timing.

Can this really happen? ISTM that the checkpointer should detect that
recovery has ended and call RecoveryInProgress()->InitXLOGAccess()
before calling CreateCheckPoint().

Regards,

-- 
Fujii Masao




Re: [HACKERS] Add regression tests for COLLATE

2013-05-08 Thread Robins Tharakan
Hi,

Please find attached the updated patch.

Fabien pointed out that it currently does not check for non-trivial locales. I
am still on the learning curve about LOCALEs, so let me know if this is
a show-stopper. I guess I could look at it and get back in some time with
more tests, as Fabien points out.

(Apologies for the delay though. An update to the patch was mostly done
back in April, but since most of the other Code-Coverage patches
(SCHEMA/ROLE/etc.) had no other feedback, I worked on all of them together
just this week).

--
Robins Tharakan


On 12 April 2013 09:28, Michael Paquier michael.paqu...@gmail.com wrote:




 On Thu, Apr 11, 2013 at 4:14 PM, Robins Tharakan thara...@gmail.com wrote:

  Hi,

 Please find attached a patch to take 'make check' code-coverage of
 COLLATE (/src/backend/commands/collationcmds) from 0% to 96%.

 Any feedback is more than welcome. Also posting this to Commitfest-next.

 Just by having a quick look at the patch, using object names of the type
 cX is too generic even if the tests are done in a private schema. Why not
 use a name like collate_obj_X or similar?
 --
 Michael



regress_collate_v2.patch
Description: Binary data



Re: [HACKERS] Patch to add regression tests for SCHEMA

2013-05-08 Thread Robins Tharakan
Hi,

Please find attached an updated patch with the said changes.
I'll try to update the other patches (if they pertain to this feedback) and
update on their respective threads (as well as on Commitfest).

--
Robins Tharakan


On 8 May 2013 13:01, Fabien COELHO coe...@cri.ensmp.fr wrote:


 Dear Robins,


  Here is an updated patch that uses different schema / role names for
 different tests (as per commitfest site feedback).


 Short review about this version of the patch:

 This patch work for me.

 This test is a good thing and allows schema to be thoroughly tested,
 including corner cases which must fail because of errors or permissions.

 Two remarks:

  - test 2 bis: why name 'pg_asdf'? why not 'pg_schema_schsome number'
to be homogeneous with other tests?

  - test 3: why not WHERE schema_name='schema_sch3' instead of two
negative comparisons? ISTM that if for some reason in the future a new
schema name is added, the test will fail.

 --
 Fabien.



regress_schema_v4.patch
Description: Binary data



Re: [HACKERS] improving PL/Python builds on OS X

2013-05-08 Thread Peter Eisentraut
On Wed, 2013-05-08 at 18:24 +0100, Dave Page wrote:
 It's failing on Linux. Even worse, it configures fine and then builds
 without error. There is a message spewed out by configure, but it
 doesn't contain the words warning or error. Given that I explicitly
 said I wanted Python support when I ran configure, it should certainly
 fail with an error at configure time. We only noticed this was a
 problem when the QA guys started diving into more detailed testing, as
 we don't watch for every message in the 50+ MB of logs our automated
 build systems generate.

It worked before because we used to allow linking shared libraries
against static libraries on some platforms.  But that was more or less a
lie because it doesn't work on 64-bit platforms.

ActiveState Python contains a static library with PIC files.  There is
no obvious way to detect that, which is why we don't support it
directly.  You can sort it out yourself by building with

make shared_libpython=yes

In the long term, find a way to detect whether the library is usable.






Re: [HACKERS] Add regression tests for COLLATE

2013-05-08 Thread Tom Lane
Robins Tharakan thara...@gmail.com writes:
 Fabien pointed out that it currently does not check for non-trivial locales. I
 am still on the learning curve about LOCALEs, so let me know if this is
 a show-stopper. I guess I could look at it and get back in some time with
 more tests, as Fabien points out.

You really can't, because there is no guarantee that any given machine
will have anything except C and POSIX.  But there's another problem:
I believe this test will fail on any machine where the database is
created with an encoding different from UTF8, because that encoding is
named in some of the error messages in the expected output.

This stuff is not easy to test in a portable way.

regards, tom lane




[HACKERS] Logging of PAM Authentication Failure

2013-05-08 Thread Amit Langote
Hello,

When the client authentication method is set to pam in pg_hba.conf,
connecting using psql results in an authentication failure being logged
even before a password prompt is presented; nonetheless, the user is
subsequently able to connect by providing a password. Following is
what is logged:

Password: LOG:  pam_authenticate failed: Conversation error
FATAL:  PAM authentication failed for user amit
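
For reference, the pg_hba.conf entry in question is of this form (address
range hypothetical):

# TYPE  DATABASE  USER  ADDRESS       METHOD
host    all       all   127.0.0.1/32  pam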

To see what's going on I debugged psql and found that without a -W
option this is bound to happen: psql first attempts to connect without
a password (it doesn't yet know that one is required), that attempt
fails, and psql then prompts for a password. The correct password then
leads to a successful connection.

I tried to observe the behavior with the md5 method (without -W) and
saw that no authentication failure is logged; the server presumably
responds differently to psql's initial connection request in that case.
The pam method, however, leads to the failure being logged.

Is this a problem?

--

Amit Langote




Re: [HACKERS] Add regression tests for ROLE (USER)

2013-05-08 Thread Robins Tharakan
Hi,

Please find an updated patch as per comments on Commitfest (comments
replicated below for ease of understanding).

Feedback 1:
fc: role_ro2/3 used twice?
rt: Corrected in this update.

Feedback 2:
fc: I do not understand why asdf conveys anything about an expected
failure. Association of Scientists, Developers and Faculties? :-)
rt: ASDF is a pattern that I learnt in one of the tests (SEQUENCE?) that
pre-existed when I started working. Its a slang for arbit text that I just
reused thinking that it is normal practice here. Anyway, have corrected
that in this update.

Feedback 3:
fc: 2030/1/1 -> 2030-01-01? maybe use a larger date?
rt: The 2030/1/1 date is not a failure point of the test. It needs to be a
valid date (but sufficiently distant so that tests don't fail). I
tried setting this to 2200/1/1 and I get the same error message. Let me
know if this still needs to be a large date.
fb: VALID UNTIL '9999-12-31' works for me...
rt: I thought 20 years was a date sufficiently far ahead to ensure that this
test doesn't fail. Sure, have updated the test to use 9999/1/1. Also, have
added more tests at the end to ensure date-checks are also being validated
in ALTER ROLE VALID UNTIL.
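
For illustration, those added checks exercise statements of this shape (role
name hypothetical):

CREATE ROLE role_until_test LOGIN VALID UNTIL '9999-01-01';
ALTER ROLE role_until_test VALID UNTIL '2010-01-01'; -- past date: password auth now fails
ALTER ROLE role_until_test VALID UNTIL 'infinity';
DROP ROLE role_until_test;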

Let me know if you need anything else changed in this.
--
Robins Tharakan


On 20 March 2013 03:41, Robins Tharakan thara...@gmail.com wrote:

 Hi,

 Please find attached a patch to take 'make check' code-coverage of ROLE
 (USER) from 59% to 91%.

 Any feedback is more than welcome.
 --
 Robins Tharakan



regress_user_v2.patch
Description: Binary data



Re: [HACKERS] Fast promotion failure

2013-05-08 Thread Amit Kapila
On Thursday, May 09, 2013 6:29 AM Fujii Masao wrote:
 On Tue, May 7, 2013 at 6:57 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  While testing the bug from the Assertion failure at standby
 promotion, I
  bumped into a different bug in fast promotion. When the first
 checkpoint
  after fast promotion is performed, there is no guarantee that the
  checkpointer process is running with the correct, new,
 ThisTimeLineID. In
  CreateCheckPoint(), we have this:
 
   /*
    * An end-of-recovery checkpoint is created before anyone is allowed to
    * write WAL. To allow us to write the checkpoint record, temporarily
    * enable XLogInsertAllowed.  (This also ensures ThisTimeLineID is
    * initialized, which we need here and in AdvanceXLInsertBuffer.)
    */
   if (flags & CHECKPOINT_END_OF_RECOVERY)
       LocalSetXLogInsertAllowed();
 
 
  That ensures that ThisTimeLineID is updated when performing an
  end-of-recovery checkpoint, but it doesn't get executed with fast
 promotion.
  The consequence is that the checkpoint is created with the old
 timeline, and
  subsequent recovery from it will fail.
 
  I ran into this with the attached script. It sets up a master (M), a
 standby
  (B), and a cascading standby (C). I'm not sure why, but when I tried
 to
  simplify the script by removing the cascading standby, it started to
 work.
  The bug occurs in standby B, so I'm not sure why the presence of the
  cascading standby makes any difference. Maybe it just affects the
 timing.
 
 Can this really happen? ISTM that the checkpointer should detect that
 recovery has ended and call RecoveryInProgress()->InitXLOGAccess()
 before calling CreateCheckPoint().

Without fast promotion, the end-of-recovery checkpoint is requested and
performed while still in recovery (before setting
xlogctl->SharedRecoveryInProgress), so it can make sure that a checkpoint
with the new timeline is performed before any new operation can start.

However, with fast promotion the checkpoint request is made after recovery,
so some operations can happen before a checkpoint with the new timeline
exists. It can thus happen that the last checkpoint is on the old timeline
while there are already operations on the new timeline, which might be what
caused the problem Heikki has seen.

With Regards,
Amit Kapila.



