Re: [HACKERS] Add regression tests for COLLATE

2013-05-08 Thread Fabien COELHO



Fabien pointed out that it currently does not check for non-trivial locales.


Indeed. Although it was my wish as well, and as pointed out by Tom, it is 
not really possible to test non-trivial locales, because you cannot assume 
that any given locale is available on every test machine.


Maybe some tests could be run conditionally, say only when a given locale 
is indeed available, but ISTM that adding such a feature would require 
changing the test infrastructure in a portable way.
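
(As a rough illustration, a DO block can skip the locale-dependent part
when the collation is absent; the collation name below is only an example,
and pg_regress would still need stable expected output, which is the
infrastructure problem:)

    DO $$
    BEGIN
        -- run a collation-dependent check only if this cluster has "de_DE"
        IF EXISTS (SELECT 1 FROM pg_collation WHERE collname = 'de_DE') THEN
            PERFORM 1 WHERE 'a' < 'b' COLLATE "de_DE";
            RAISE NOTICE 'de_DE collation test executed';
        ELSE
            RAISE NOTICE 'de_DE collation not available, test skipped';
        END IF;
    END
    $$;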


--
Fabien.




Re: [HACKERS] Fast promotion failure

2013-05-08 Thread Amit Kapila
On Thursday, May 09, 2013 6:29 AM Fujii Masao wrote:
> On Tue, May 7, 2013 at 6:57 PM, Heikki Linnakangas
>  wrote:
> > While testing the bug from the "Assertion failure at standby promotion", I
> > bumped into a different bug in fast promotion. When the first checkpoint
> > after fast promotion is performed, there is no guarantee that the
> > checkpointer process is running with the correct, new, ThisTimeLineID. In
> > CreateCheckPoint(), we have this:
> >
> >> /*
> >>  * An end-of-recovery checkpoint is created before anyone is allowed to
> >>  * write WAL. To allow us to write the checkpoint record, temporarily
> >>  * enable XLogInsertAllowed.  (This also ensures ThisTimeLineID is
> >>  * initialized, which we need here and in AdvanceXLInsertBuffer.)
> >>  */
> >> if (flags & CHECKPOINT_END_OF_RECOVERY)
> >> LocalSetXLogInsertAllowed();
> >
> > That ensures that ThisTimeLineID is updated when performing an
> > end-of-recovery checkpoint, but it doesn't get executed with fast promotion.
> > The consequence is that the checkpoint is created with the old timeline, and
> > subsequent recovery from it will fail.
> >
> > I ran into this with the attached script. It sets up a master (M), a standby
> > (B), and a cascading standby (C). I'm not sure why, but when I tried to
> > simplify the script by removing the cascading standby, it started to work.
> > The bug occurs in standby B, so I'm not sure why the presence of the
> > cascading standby makes any difference. Maybe it just affects the timing.
> 
> Can this really happen? ISTM that the checkpointer should detect that
> the recovery mode ends and call RecoveryInProgress()->InitXLOGAccess()
> before calling CreateCheckPoint().

Without fast promotion, the end-of-recovery checkpoint is requested and
performed while still in recovery (before xlogctl->SharedRecoveryInProgress
is cleared), so before any new operation can start, a checkpoint with the
new timeline is guaranteed to have been performed.

With fast promotion, however, the checkpoint is requested after recovery
ends, so some operations can happen before a checkpoint with the new
timeline exists. It can therefore happen that the last checkpoint is on the
old timeline while there are already operations on the new timeline, which
might be what caused the problem Heikki has seen.

With Regards,
Amit Kapila.






Re: [HACKERS] Add regression tests for ROLE (USER)

2013-05-08 Thread Robins Tharakan
Hi,

Please find an updated patch as per comments on Commitfest (comments
replicated below for ease of understanding).

Feedback 1:
fc: role_ro2/3 used twice?
rt: Corrected in this update.

Feedback 2:
fc: I do not understand why "asdf" conveys anything about an expected
failure. Association of Scientists, Developers and Faculties? :-)
rt: ASDF is a pattern that I learnt from one of the pre-existing tests
(SEQUENCE?) when I started working. It's slang for arbitrary text that I
reused thinking it was normal practice here. Anyway, I have corrected that
in this update.

Feedback 3:
fc: 2030/1/1 -> 2030-01-01? maybe use a larger date?
rt: The 2030/1/1 date is not a failure point of the test. It needs to be a
valid date (but sufficiently distant so that the tests don't fail). I
tried setting this to 2200/1/1 and I get the same error message. Let me
know if this still needs to be a larger date.
fb: VALID UNTIL '-12-31' works for me...
rt: I thought 20 years was sufficiently far ahead to ensure that this
test doesn't fail. Sure, I have updated the test to use /1/1. Also, I have
added more tests at the end to ensure date checks are also validated
in ALTER ROLE VALID UNTIL.
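
(For reference, the shape of those date checks is roughly the following;
the role name is made up, not the one in the patch:)

    -- a far-future expiry should be accepted
    CREATE ROLE role_sample LOGIN VALID UNTIL '2030-01-01';
    -- a malformed timestamp should fail with an input-syntax error
    ALTER ROLE role_sample VALID UNTIL 'not-a-date';
    DROP ROLE role_sample;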

Let me know if you need anything else changed in this.
--
Robins Tharakan


On 20 March 2013 03:41, Robins Tharakan  wrote:

> Hi,
>
> Please find attached a patch to take 'make check' code-coverage of ROLE
> (USER) from 59% to 91%.
>
> Any feedback is more than welcome.
> --
> Robins Tharakan
>


regress_user_v2.patch
Description: Binary data



[HACKERS] Logging of PAM Authentication Failure

2013-05-08 Thread Amit Langote
Hello,

When the client authentication method is set to "pam" in pg_hba.conf,
connecting using psql results in an authentication failure being logged
even before a password prompt is presented; nonetheless, the user is
subsequently able to connect by providing a password. The following is
what is logged:

Password: LOG:  pam_authenticate failed: Conversation error
FATAL:  PAM authentication failed for user "amit"

To see what's going on, I debugged psql and found that without the -W
option this is bound to happen: psql first attempts to connect without a
password (it doesn't yet know one is required), that attempt fails, and
psql then prompts for a password. The correct password then leads to a
successful connection.

I observed the behavior with the md5 method (without -W) and found that
no authentication failure is logged, since the server presumably responds
differently to psql's first connection attempt in that case. The pam
method, however, leads to the failure being logged.

Is this a problem?

--

Amit Langote




Re: [HACKERS] Add regression tests for COLLATE

2013-05-08 Thread Tom Lane
Robins Tharakan  writes:
> Fabien pointed out that it currently does not check for non-trivial locales. I
> am still on the learning curve about LOCALEs and so, let me know if this is
> a show-stopper. I guess I could look at it and get back in some time with
> more tests as Fabien points out.

You really can't, because there is no guarantee that any given machine
will have anything except "C" and "POSIX".  But there's another problem:
I believe this test will fail on any machine where the database is
created with an encoding different from UTF8, because that encoding is
named in some of the error messages in the expected output.

This stuff is not easy to test in a portable way.
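
(To illustrate the encoding dependence: the error message for a missing
collation embeds the database encoding, so expected output written on a
UTF8 database cannot match elsewhere. The collation name here is arbitrary:)

    SELECT 'x' COLLATE "de_DE";
    -- on a UTF8 database a typical failure is:
    --   ERROR:  collation "de_DE" for encoding "UTF8" does not exist
    -- on a LATIN1 database the message names "LATIN1" instead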

regards, tom lane




Re: [HACKERS] improving PL/Python builds on OS X

2013-05-08 Thread Peter Eisentraut
On Wed, 2013-05-08 at 18:24 +0100, Dave Page wrote:
> It's failing on Linux. Even worse, it configures fine and then builds
> without error. There is a message spewed out by configure, but it
> doesn't contain the words warning or error. Given that I explicitly
> said I wanted Python support when I ran configure, it should certainly
> fail with an error at configure time. We only noticed this was a
> problem when the QA guys started diving into more detailed testing, as
> we don't watch for every message in the 50+ MB of logs our automated
> build systems generate.

It worked before because we used to allow linking shared libraries
against static libraries on some platforms.  But that was more or less a
lie because it doesn't work on 64-bit platforms.

ActiveState Python contains a static library with PIC files.  There is
no obvious way to detect that, which is why we don't support it
directly.  You can sort it out yourself by building with

make shared_libpython=yes

In the long term, we need to find a way to detect whether the library is usable.






Re: [HACKERS] Patch to add regression tests for SCHEMA

2013-05-08 Thread Robins Tharakan
Hi,

Please find attached an updated patch with the said changes.
I'll try to update the other patches (if they pertain to this feedback) and
update on their respective threads (as well as on Commitfest).

--
Robins Tharakan


On 8 May 2013 13:01, Fabien COELHO  wrote:

>
> Dear Robins,
>
>
>  Here is an updated patch that uses different schema / role names for
>> different tests (as per commitfest site feedback).
>>
>
> Short review about this version of the patch:
>
> This patch works for me.
>
> This test is a good thing and allows schema to be thoroughly tested,
> including corner cases which must fail because of errors or permissions.
>
> Two remarks:
>
>  - test 2 bis: why name 'pg_asdf'? why not 'pg_schema_sch'
>to be homogeneous with other tests?
>
>  - test 3: why not WHERE schema_name='schema_sch3' instead of two
>negative comparisons? ISTM that if for some reason in the future a new
>schema name is added, the test will fail.
>
> --
> Fabien.
>
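
(The positive form suggested for test 3 would look roughly like this; the
check itself is illustrative:)

    -- check for exactly the schema under test, so schemas added by
    -- future tests cannot break the assertion
    SELECT nspname FROM pg_namespace WHERE nspname = 'schema_sch3';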


regress_schema_v4.patch
Description: Binary data



Re: [HACKERS] Add regression tests for COLLATE

2013-05-08 Thread Robins Tharakan
Hi,

Please find attached the updated patch.

Fabien pointed out that it currently does not check for non-trivial locales. I
am still on the learning curve about LOCALEs and so, let me know if this is
a show-stopper. I guess I could look at it and get back in some time with
more tests as Fabien points out.

(Apologies for the delay though. An update to the patch was mostly done
back in April, but since most of the other Code-Coverage patches
(SCHEMA/ROLE/etc.) had no other feedback, I worked on all of them together
just this week).

--
Robins Tharakan


On 12 April 2013 09:28, Michael Paquier  wrote:

>
>
>
> On Thu, Apr 11, 2013 at 4:14 PM, Robins Tharakan wrote:
>
>>  Hi,
>>
>> Please find attached a patch to take 'make check' code-coverage of
>> COLLATE (/src/backend/commands/collationcmds) from 0% to 96%.
>>
>> Any feedback is more than welcome. Also posting this to Commitfest-next.
>>
> Just by having a quick look at the patch, using object names of the type
> cX is too generic even if the tests are done in a private schema. Why not
> using a name like collate_obj_X or similar?
> --
> Michael
>


regress_collate_v2.patch
Description: Binary data



Re: [HACKERS] Fast promotion failure

2013-05-08 Thread Fujii Masao
On Tue, May 7, 2013 at 6:57 PM, Heikki Linnakangas
 wrote:
> While testing the bug from the "Assertion failure at standby promotion", I
> bumped into a different bug in fast promotion. When the first checkpoint
> after fast promotion is performed, there is no guarantee that the
> checkpointer process is running with the correct, new, ThisTimeLineID. In
> CreateCheckPoint(), we have this:
>
>> /*
>>  * An end-of-recovery checkpoint is created before anyone is
>> allowed to
>>  * write WAL. To allow us to write the checkpoint record,
>> temporarily
>>  * enable XLogInsertAllowed.  (This also ensures ThisTimeLineID is
>>  * initialized, which we need here and in AdvanceXLInsertBuffer.)
>>  */
>> if (flags & CHECKPOINT_END_OF_RECOVERY)
>> LocalSetXLogInsertAllowed();
>
>
> That ensures that ThisTimeLineID is updated when performing an
> end-of-recovery checkpoint, but it doesn't get executed with fast promotion.
> The consequence is that the checkpoint is created with the old timeline, and
> subsequent recovery from it will fail.
>
> I ran into this with the attached script. It sets up a master (M), a standby
> (B), and a cascading standby (C). I'm not sure why, but when I tried to
> simplify the script by removing the cascading standby, it started to work.
> The bug occurs in standby B, so I'm not sure why the presence of the
> cascading standby makes any difference. Maybe it just affects the timing.

Can this really happen? ISTM that the checkpointer should detect that
the recovery mode ends and call RecoveryInProgress()->InitXLOGAccess()
before calling CreateCheckPoint().

Regards,

-- 
Fujii Masao




Re: [HACKERS] corrupt pages detected by enabling checksums

2013-05-08 Thread Jeff Davis
On Wed, 2013-05-08 at 17:56 -0500, Jim Nasby wrote:
> Apologies if this is a stupid question, but is this mostly an issue
> due to torn pages? IOW, if we had a way to ensure we never see torn
> pages, would that mean an invalid CRC on a WAL page indicated there
> really was corruption on that page?
> 
> Maybe it's worth putting (yet more) thought into the torn page
> issue... :/

Sort of. For data, a page is the logically-atomic unit that is expected
to be intact. For WAL, a record is the logically-atomic unit that is
expected to be intact.

So it might be better to say that the issue for the WAL is "torn
records". A record might be larger than a page (it can hold up to three
full-page images in one record), but is often much smaller.

We use a CRC to validate that the WAL record is fully intact. The
concern is that, if it fails the CRC check, we *assume* that it's
because it wasn't completely flushed yet (i.e. a "torn record"). Based
on that assumption, neither that record nor any later record contains
committed transactions, so we can safely consider that the end of the
WAL (as of the crash) and bring the system up.

The problem is that the assumption is not always true: a CRC failure
could also indicate real corruption of WAL records that have been
previously flushed successfully, and may contain committed transactions.
That can mean we bring the system up way too early, corrupting the
database.

Unfortunately, it seems that doing any kind of validation to determine
that we have a valid end-of-the-WAL inherently requires some kind of
separate durable write somewhere. It would be a tiny amount of data (an
LSN and maybe some extra crosscheck information), so I could imagine
that would be just fine given the right hardware; but if we just write it
to ordinary disk, that would be pretty bad. Ideas welcome.

Regards,
Jeff Davis






Re: [HACKERS] Proposal to add --single-row to psql

2013-05-08 Thread David Fetter
On Wed, May 08, 2013 at 06:08:28PM -0500, Jim Nasby wrote:
> On 5/1/13 7:36 PM, Robert Haas wrote:
> >On Mon, Apr 29, 2013 at 4:33 PM, Jim Nasby  wrote:
> >>>On 4/28/13 7:50 AM, Craig Ringer wrote:
> >
> >I find it frustrating that I've never seen an @paraccel email address 
> >here
> >and that few of the other vendors of highly customised Pg offshoots are
> >contributing back. It's almost enough to make me like the GPL.
> >>>
> >>>FWIW, I think there's a pretty large barrier to these folks contributing
> >>>back. Would the community really want to add a bunch of hooks to support
> >>>something like Redshift? Or Greenplum? Or etc, etc.? Most of these guys 
> >>>have
> >>>to change significant amounts of PG code, so much so that it's actually 
> >>>hard
> >>>for them to stay current (which is why most of them just fork).
> >>>
> >>>I do think this is a shame, but I'm not sure of any good way to fix it.
> >Yep.  There are plenty of things that we do at EDB for good and valid
> >business reasons that I can't imagine the community accepting under
> >any circumstances.  For example, Oracle compatibility is not something
> >the community values as highly as EnterpriseDB (and our customers) do.
> >  I'm sure that many of those vendors are in similar situations - they
> >write code that only runs on specialized hardware, or (rather
> >commonly, I suspect) they remove parts of the functionality in order
> >to make certain things very fast.  Those are not trade-offs that make
> >sense for PostgreSQL, but I find it hard to understand what we'd gain
> >from preventing other people from making them.  There are in fact a
> >pretty large number of companies - EnterpriseDB, obviously, but there
> >are many, many others - that are choosing to build businesses around
> >PostgreSQL precisely because it *isn't* GPL.  Personally, I think
> >that's a good thing for our community in terms of mindshare even when
> >companies choose not to contribute back - and it's even better when
> >they do.
> 
> FWIW, one point I was trying to make that was overlooked is that it
> seems to be exceptionally difficult for companies to fork Postgres
> and then stay current (AFAIK EnterpriseDB and Mammoth are the only
> products that have pulled that feat off).

VMware and CitusDB are doing pretty well so far, but it's early days.

> I believe that makes it significantly harder for them to actually
> contribute code back that doesn't give them a business advantage, as
> well as making it harder to justify hacking on the community
> codebase because they'll just face a very large hurdle when it comes
> to pulling that code back into their proprietary product.
> 
> I don't know of any good way to solve that problem. Maybe it's not
> worth solving... but I do suspect there's some useful stuff that the
> community has lost out on because of this.

Some of this is getting solved by making PostgreSQL more pluggable in
ways that isolate the proprietary stuff, i.e. make people not have to
touch the PostgreSQL core code much, if at all, in order to provide
whatever special features they provide.  Hooks and FDWs are two such
pluggable components.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] Proposal to add --single-row to psql

2013-05-08 Thread Jim Nasby

On 5/1/13 7:36 PM, Robert Haas wrote:

On Mon, Apr 29, 2013 at 4:33 PM, Jim Nasby  wrote:

>On 4/28/13 7:50 AM, Craig Ringer wrote:

>>
>>I find it frustrating that I've never seen an @paraccel email address here
>>and that few of the other vendors of highly customised Pg offshoots are
>>contributing back. It's almost enough to make me like the GPL.

>
>FWIW, I think there's a pretty large barrier to these folks contributing
>back. Would the community really want to add a bunch of hooks to support
>something like Redshift? Or Greenplum? Or etc, etc.? Most of these guys have
>to change significant amounts of PG code, so much so that it's actually hard
>for them to stay current (which is why most of them just fork).
>
>I do think this is a shame, but I'm not sure of any good way to fix it.

Yep.  There are plenty of things that we do at EDB for good and valid
business reasons that I can't imagine the community accepting under
any circumstances.  For example, Oracle compatibility is not something
the community values as highly as EnterpriseDB (and our customers) do.
  I'm sure that many of those vendors are in similar situations - they
write code that only runs on specialized hardware, or (rather
commonly, I suspect) they remove parts of the functionality in order
to make certain things very fast.  Those are not trade-offs that make
sense for PostgreSQL, but I find it hard to understand what we'd gain
from preventing other people from making them.  There are in fact a
pretty large number of companies - EnterpriseDB, obviously, but there
are many, many others - that are choosing to build businesses around
PostgreSQL precisely because it *isn't* GPL.  Personally, I think
that's a good thing for our community in terms of mindshare even when
companies choose not to contribute back - and it's even better when
they do.


FWIW, one point I was trying to make that was overlooked is that it seems to be 
exceptionally difficult for companies to fork Postgres and then stay current 
(AFAIK EnterpriseDB and Mammoth are the only products that have pulled that 
feat off). I believe that makes it significantly harder for them to actually 
contribute code back that doesn't give them a business advantage, as well as 
making it harder to justify hacking on the community codebase because they'll 
just face a very large hurdle when it comes to pulling that code back into 
their proprietary product.

I don't know of any good way to solve that problem. Maybe it's not worth 
solving... but I do suspect there's some useful stuff that the community has 
lost out on because of this.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] corrupt pages detected by enabling checksums

2013-05-08 Thread Jim Nasby

On 4/5/13 6:39 PM, Jeff Davis wrote:

On Fri, 2013-04-05 at 10:34 +0200, Florian Pflug wrote:

Maybe we could scan forward to check whether a corrupted WAL record is
followed by one or more valid ones with sensible LSNs. If it is,
chances are high that we haven't actually hit the end of the WAL. In
that case, we could either log a warning, or (better, probably) abort
crash recovery.


+1.


Corruption of fields which we require to scan past the record would
cause false negatives, i.e. no trigger an error even though we do
abort recovery mid-way through. There's a risk of false positives too,
but they require quite specific orderings of writes and thus seem
rather unlikely. (AFAICS, the OS would have to write some parts of
record N followed by the whole of record N+1 and then crash to cause a
false positive).


Does the xlp_pageaddr help solve this?

Also, we'd need to be a little careful when written-but-not-flushed WAL
data makes it to disk, which could cause a false positive and may be a
fairly common case.


Apologies if this is a stupid question, but is this mostly an issue due to torn 
pages? IOW, if we had a way to ensure we never see torn pages, would that mean 
an invalid CRC on a WAL page indicated there really was corruption on that page?

Maybe it's worth putting (yet more) thought into the torn page issue... :/
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




[HACKERS] Re: [GENERAL] pg_upgrade fails, "mismatch of relation OID" - 9.1.9 to 9.2.4

2013-05-08 Thread Bruce Momjian
On Wed, May  8, 2013 at 02:27:18PM -0400, Evan D. Hoffman wrote:
> If you want to start the old cluster, you will need to remove
> the ".old" suffix from /var/lib/pgsql/9.1/data/global/pg_control.old.
> Because "link" mode was used, the old cluster cannot be safely
> started once the new cluster has been started.
>
> Linking user relation files
>   /var/lib/pgsql/9.1/data/base/16406/3016054
> Mismatch of relation OID in database "dbname": old OID 2938685, new OID 299721
> Failure, exiting

[ Moved to hackers ]

OK, that is odd.  We preserve old/new OIDs, (not relfilenode, as someone
suggested in this thread);  FYI:

 *  FYI, while pg_class.oid and pg_class.relfilenode are initially the same
 *  in a cluster, they can diverge due to CLUSTER, REINDEX, or VACUUM
 *  FULL.  The new cluster will have matching pg_class.oid and
 *  pg_class.relfilenode values and be based on the old oid value.  This can
 *  cause the old and new pg_class.relfilenode values to differ.  In summary,
 *  old and new pg_class.oid and new pg_class.relfilenode will have the
 *  same value, and old pg_class.relfilenode might differ.

The problem reported is that pg_dump was not able to preserve the
old/new oids between clusters.  Can you get the answer for this query on
the old cluster:

SELECT relname from pg_class where oid = 2938685;

and on the new cluster, assuming you used 'copy' mode so you can start
the old/new clusters independently:

SELECT relname from pg_class where oid = 299721;

I think we will find that there is something in pg_dump related to this
table that isn't preserving the oids.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] GSOC Student Project Idea

2013-05-08 Thread Jim Nasby

On 5/8/13 3:54 AM, Heikki Linnakangas wrote:

On 24.04.2013 14:31, Florian Pflug wrote:

On Apr23, 2013, at 23:25 , Alexander Korotkov
wrote:

I've taken a brief look on the paper and implementation. As I can
see iDistance implements some global building strategy. I mean, for
example, it selects some point, calculates distances from selected
point to all points in dataset etc. So, it uses the whole dataset
at the same time.

However you can try to implement global index building in GiST or
SP-GiST. In this case I think you should carefully estimate your
capabilities during single GSoC project. You would need to extend
GiST or SP-GiST interface and write completely new implementation
of tree building algorithm. Question of how to exactly extend GiST
or SP-GiST interface for this case could appear to be very hard
even theoretically.


+1. That seemed to be a major roadblock to me too when I read the
paper.

You could work around that by making partition identification a
separate step. You'd have a function

idist_analyze(cfg name, table name, field name)

which'd identify suitable partitions for the data distribution in
table.field and store them somewhere. Such a set of pre-identified
partitions would be akin to a tsearch configuration, i.e. all other
parts of the iDistance machinery would use it to map points to index
keys and queries to ranges of those keys. You'll want to look at how
tsearch handles that, and check if the method can indeed be applied
to iDistance.


You could perform that step as part of the index build. Before the index build 
starts to add tuples to the index, it could scan a random sample of the heap 
and identify the partitions based on that.

If you need to store the metadata, like a map of partitions, it becomes 
difficult to cajole this into a normal GiST or SP-GiST opclass. The API doesn't 
have any support for storing such metadata.


In a first cut, you'd probably only allow inserts into index which
don't change the maximum distances from the partition centers that
idist_analyze() found.


That seems like a pretty serious restriction. I'd try to write it so that you 
can insert any value, but if the new values are very different from any 
existing values, it would be OK for the index quality to degrade. For example, 
you could simply add any out-of-bounds values to a separate branch in the 
index, which would have no particular structure and would just have to be 
scanned on every query. You can probably do better than that, but that would be 
a trivial way to deal with it.


Or you could use the new insert to start a new partition.

Heck, maybe the focus should actually be on partitions and not individual 
records/points. ISTM the entire challenge here is figuring out a way to 
maintain a set of partitions that:

- Are limited enough in number that you can quickly perform operations/searches 
across all partitions
- Yet small enough that once you've narrowed down a set of partitions you don't 
have a ton of raw records to still look at

Before we had range types I experimented with representing time ranges as rectangles 
of varying size (ie: for (start, end), create rectangle(point(start,start), 
point(end,end)). The problem with that is you had to convert timestamp into a float, 
which was not exact. So when querying you could use a GiST index on all the 
rectangles to narrow your scope, but you still needed a set of exact clauses (ie: 
start >= now() - '1 year' AND end <= now()). Partitions would be similar in 
that they wouldn't be exact but could greatly narrow the search space (of course we'd 
want to handle the secondary exact checking internally instead of exposing the user 
to that).
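
(A sketch of that rectangle trick, assuming an events(start_ts, end_ts)
table with timestamp rather than timestamptz columns, so that
extract(epoch ...) is immutable and indexable:)

    CREATE TABLE events (id serial PRIMARY KEY,
                         start_ts timestamp, end_ts timestamp);
    -- index each interval as a degenerate rectangle on the diagonal;
    -- the epoch conversion is lossy, hence the exact re-check below
    CREATE INDEX events_box_idx ON events USING gist (
        box(point(extract(epoch from start_ts), extract(epoch from start_ts)),
            point(extract(epoch from end_ts),   extract(epoch from end_ts))));
    -- queries narrow via the box index, then re-check exactly, e.g.:
    --   ... AND start_ts >= now() - interval '1 year' AND end_ts <= now()
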
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Ryan Kelly
On Wed, May 05/08/13, 2013 at 03:38:10PM -0400, Andrew Dunstan wrote:
> 
> On 05/08/2013 03:23 PM, Jim Nasby wrote:
> >>WITH new_data AS (
> >>COPY FROM ...
> >>RETURNING id, field_to_check
> >>)
> >
> 
> Why is this better than this, which you can do today?
> 
>WITH new_data AS (
> INSERT into ... FROM foreign_table_with_file_fdw RETURNING ...
>)
> 
> 
> The whole reason I abandoned trying to do this sort of thing with
> COPY was that I realized the FDW would provide what I wanted.

You need to first CREATE EXTENSION file_fdw. Then you need to CREATE
SERVER. Then CREATE FOREIGN TABLE. Which requires appropriate permission
to do those things, and certainly has no hope of working on the client
side.
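
(Roughly the ceremony involved, with made-up names; this requires
appropriate privileges and a server-side file path:)

    CREATE EXTENSION file_fdw;
    CREATE SERVER import_files FOREIGN DATA WRAPPER file_fdw;
    CREATE FOREIGN TABLE staging_data (id int, payload text)
        SERVER import_files
        OPTIONS (filename '/tmp/data.csv', format 'csv');
    -- only then is the INSERT ... RETURNING form available:
    WITH new_data AS (
        INSERT INTO target_table SELECT * FROM staging_data RETURNING id
    )
    SELECT count(*) FROM new_data;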

-Ryan P. Kelly



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Andrew Dunstan


On 05/08/2013 03:23 PM, Jim Nasby wrote:

WITH new_data AS (
COPY FROM ...
RETURNING id, field_to_check
)




Why is this better than this, which you can do today?

   WITH new_data AS (
        INSERT into ... FROM foreign_table_with_file_fdw RETURNING ...
   )


The whole reason I abandoned trying to do this sort of thing with COPY 
was that I realized the FDW would provide what I wanted.


cheers

andrew




Re: [HACKERS] about index inheritance

2013-05-08 Thread Jim Nasby

On 5/8/13 2:17 PM, Martijn van Oosterhout wrote:

On Wed, May 08, 2013 at 10:19:08AM +0200, Vincenzo Melandri wrote:

On Tue, May 7, 2013 at 11:55 PM, Robert Haas  wrote:

This is a really hard problem.  If you pick this as your first project
hacking on PostgreSQL, you will almost certainly fail.


Thank you very much, i guessed that already -.-
Still, I needed that at my office for a long time, struggled with it many
times and had to come out with some "exotic" solutions...
Now I have spare time between projects, so I can work on it full-time. At
least it's worth a try, isn't it?


Well, you can work on it but I think it will be less programming and
more coming up with a feasible solution.


Anyway, I'm working to better understand the problem, trying to identify at
least the main involved points.
At the moment I'm figuring out how the inherit mechanism works for
relations (in tablecmds.c).. Then I'll figure out about how indexes work..


While there are probably old threads in the archives, I find the
easiest way to look at the problem is in the locking.  In particular, I
think if you can get unique indexes to work then the rest will follow.

Consider the case of an inheritence hierarchy and you want a unique
index on a column.  Since you want to be able to create and drop
children easily, each childs need to have an index just for them.  But
if you insert a row into one child you need to, somehow, prevent other
people also inserting the same value in a different child.  Efficiently
and deadlock-free.  This is hard, though we're up for crazy,
out-of-the-box ideas.

Note, there is one very special case, namely:

- The children are used for partitioning.

- The unique index you want is on the partition key.

Since each value can only possibly appear in one table your locking
problems vanish. The question is: how often does this happen?


I would also consider indexes that span multiple tables that do NOT involve 
inheritance. That's the most generic case, so if you can make that work 
everything else should fall into place. The only caveat is that UPDATE and 
DELETE in an inheritance tree could produce unique challenges since they would 
start off by reading from more than one table.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Jim Nasby

On 5/8/13 12:33 PM, Dimitri Fontaine wrote:

Karol Trzcionka  writes:

as a continuation of my proposal expanding RETURNING syntax by


What about implementing support for OLD/NEW in per-statement triggers? I
guess you would expose the data via a SRF.


Per statement NEW/OLD is an interesting case, in that it shares some of the 
same challenges; namely how to store the NEW and OLD recordsets efficiently. 
I've wondered if there'd be some way to do that by just storing a list of CTIDs 
(not sure if that'd work with HOT for OLD though).

I still like the idea of being able to exclude certain records during COPY 
though; not writing a tuple will always be more efficient than creating one and 
then nuking it after the fact. There's a similar argument to be made about 
in-line transforms too.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Jim Nasby

On 5/8/13 12:54 PM, Jonathan S. Katz wrote:

On May 8, 2013, at 1:16 PM, Tom Lane wrote:


Heikki Linnakangas  writes:

On 08.05.2013 19:44, Tom Lane wrote:

No there isn't; what you suggest would require FE/BE protocol
extensions, making it several orders of magnitude more work than the
other thing.



I'd imagine that the flow would go something like this:

BE                          FE

CopyInResponse
                            CopyData
                            CopyData
                            ...
                            CopyDone
RowDescription
DataRow
DataRow
CommandComplete


That would require the backend to buffer the entire query response,
which isn't a great idea.  I would expect that such an operation would
need to interleave CopyData to the backend with DataRow responses.  Such
a thing could possibly be built on COPY_BOTH mode, but it would be a lot
of work (at both ends) for extremely debatable value.

The general idea of COPY is to load data as fast as possible, so weighing
it down with processing options seems like a pretty dubious idea even if
the implementation were easy.


There are cases where I indeed want to load data very quickly, but then want 
to perform an operation on it immediately afterwards, e.g. removing bad data 
that the copy just added.  For instance, I do have this scenario:

WITH new_data AS (
COPY FROM ...
RETURNING id, field_to_check
)
DELETE FROM table
USING new_data
WHERE
table.id = new_data.id AND
new_data.field_to_check ~* 'bad data';

Now I can take care of that all in one step, and I know I'm only removing 
rows I just added.  This comes up when I am importing external files from 
other sources, where I may not want all of the rows, or where some of the 
rows contain bad data.

This also presumes that COPY works in a CTE, which I'm not sure it does (and I 
will do the TIAS test after I hit send on this message).


What you're really asking for here is some kind of stream processing 
capability. There are spin-offs of Postgres that provide that capability (I 
know Neil Conway worked on some). Those are geared completely around stream 
processing, but I think it would be extremely interesting to provide some 
minimal support for that in community Postgres.

Using your use case as an example, something like this would be very 
interesting:

COPY table FROM ...
  WHERE field_to_check !~* 'bad data'
;

In this case we're just applying a simple WHERE clause against each incoming 
row.

Perhaps what I'm suggesting could be implemented with a CTE, but I'm not sure 
it makes sense to do it the way you propose, at least not initially. A CTE 
would provide so much flexibility that it'd be difficult for the optimizer to 
be efficient about it. Something like a WHERE clause directly on COPY would be 
a lot easier to handle. As someone mentioned, FDW might be another option there.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] about index inheritance

2013-05-08 Thread Martijn van Oosterhout
On Wed, May 08, 2013 at 10:19:08AM +0200, Vincenzo Melandri wrote:
> On Tue, May 7, 2013 at 11:55 PM, Robert Haas  wrote:
> > This is a really hard problem.  If you pick this as your first project
> > hacking on PostgreSQL, you will almost certainly fail.
> >
> Thank you very much, i guessed that already -.-
> Still, I needed that at my office for a long time, struggled with it many
> times and had to come out with some "exotic" solutions...
> Now I have spare time between projects, so I can work on it full-time. At
> least it's worth a try, isn't it?

Well, you can work on it but I think it will be less programming and
more coming up with a feasible solution.

> Anyway, I'm working to better understand the problem, trying to identify at
> least the main involved points.
> At the moment I'm figuring out how the inherit mechanism works for
> relations (in tablecmds.c).. Then I'll figure out about how indexes work..

While there are probably old threads in the archives, I find the
easiest way to look at the problem is in the locking.  In particular, I
think if you can get unique indexes to work then the rest will follow.

Consider the case of an inheritance hierarchy and you want a unique
index on a column.  Since you want to be able to create and drop
children easily, each child needs to have an index just for it.  But
if you insert a row into one child you need to, somehow, prevent other
people also inserting the same value in a different child.  Efficiently
and deadlock-free.  This is hard, though we're up for crazy,
out-of-the-box ideas.

Note, there is one very special case, namely:

- The children are used for partitioning.

- The unique index you want is on the partition key.

Since each value can only possibly appear in one table your locking
problems vanish. The question is: how often does this happen?
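
(A sketch of that special case with illustrative names; the CHECK
constraints guarantee a given key can live in only one child, so per-child
unique indexes suffice:)

    CREATE TABLE measurement (id int, val text);
    CREATE TABLE measurement_low  (CHECK (id <  1000)) INHERITS (measurement);
    CREATE TABLE measurement_high (CHECK (id >= 1000)) INHERITS (measurement);
    CREATE UNIQUE INDEX ON measurement_low  (id);
    CREATE UNIQUE INDEX ON measurement_high (id);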

Hope this helps,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] Graph datatype addition

2013-05-08 Thread Atri Sharma
>
> Your second drawing didn't really make any sense to me. :(
>
> I do think it would be most productive to focus on what the API for dealing
> with graph data would look like before trying to handle the storage aspect.
> The storage is potentially dirt-simple, as others have shown. The only
> challenge would be efficiency, but it's impossible to discuss efficiency
> without some clue of how the data will be accessed. Frankly, for the first
> round of this I think it would be best if the storage really was just some
> raw tables. Once something is available people will start figuring out how
> to use it, and where the API needs to be improved.
>
> --

Thanks for your reply.

Yes, my drawing sucks, heh.

OK, I agree. I was pretty perked up about efficiency in storage, hence I
started designing.

Sketching out an API in terms of functionality will require a
different viewpoint. I think make, insert, search, and delete
functionality would be directly exposed to the user. Then, functionality
to isolate subgraphs based on some criterion/criteria, and
implementations of standard graph algorithms (BFS, DFS, Dijkstra's
algorithm), can be exposed through single functions.

Regards,

Atri


--
Regards,

Atri
l'apprenant




Re: [HACKERS] Graph datatype addition

2013-05-08 Thread Jim Nasby

On 5/8/13 1:40 PM, Atri Sharma wrote:

On Thu, May 2, 2013 at 7:58 AM, Atri Sharma  wrote:



Sent from my iPad

On 02-May-2013, at 4:33, Misa Simic  wrote:



On Wednesday, May 1, 2013, Atri Sharma wrote:


Hi all,

Please find a probable prototype for the same:

struct GraphNode
{
 Oid NodeOid;// Oid of the row which is the node here. We will
store an identifier to it here rather than the complete row(with data)
itself.
 AdjacencyList *list;   // Pointer to the node's adjacency list.
};

struct AdjacencyList
{
   Oid[] neighbours_list;
};

struct AdjacencyList is probably the 'hottest' data structure in our
entire implementation. We can think of making a cache of recently
accessed struct AdjacencyList instances, or the AdjacencyList(s) of
the neighbours of the recently accessed nodes, because, they are most
likely to be accessed in near future. Advice here, please?

So.

struct AdjacencyCache
{
  Oid[] cache_values;
};

push and pop functions for AdjacencyCache follow.

We need a replacement and invalidation algorithm for the cache. I feel
a version of LRU should be good here.

I have not given a prototype for operations and algorithm implementations.

I feel,as suggested by Peter and Jaime, we can look at pgRouting code
for algorithm implementations.

Florian's concerns are mitigated here to some extent,IMO. Since the
nodes and linkings are loosely coupled, and not represented as a
single representation, updating or changing of any part or adding a
new edge is no longer an expensive operation, as it only requires a
lookup of GraphNode and then its AdjacencyList. If we use the cache as
well, it will further reduce the lookup costs.

I have not yet thought of the user visible layer as suggested by Jim.
Probably. once we are ok with the internal layer, we can move to the
user visible layer.

Advice/Comments/Feedback please?



Honestly - I think I don't understand the proposal...

Datatypes - are about values - what will be stored in that column in a
table

Datatype - can't have any clue about "rows"

How I understand what you described - you can achieve the same with pure SQL
- the structs are equivalent to graph tables... Instead of Oid, the column will
store PKs of the nodes table...


Yes, I agree.I need to think more.

Let me get back with a deeper proposal.

Regards,

Atri


Hi all,

In continuation of the above discussion, I have been thinking about the
design of the data type. I have been thinking on the lines of making a
multi dimensional data structure for the same:

Specifically, I have been thinking about making multi lists for
representing data. After some research, I think that the following
data structure may help:

Each node will be represented as:

[Down Pointer][Atom][Right Pointer]

Suppose a graph is like (sorry for the bad drawing):

[ASCII sketch of a small example graph over nodes A, B, C, D, and E;
garbled in transit]

can be represented as:

[multi-list diagram of [down pointer][atom][right pointer] cells holding
the nodes' data; garbled in transit]


Essentially, the Atom flag denotes whether the node has any out-edges. If
it has no out-edge, Atom is 0 and the down pointer points to an auxiliary
structure which holds the node's data (hence, the data can be of arbitrary
size).

If the node has some out-degree, then those nodes are added to a new
sublist which starts from the node which spawns them. The node's down
pointer points to the head of the new sublist.

Essentially, a sublist has all the nodes directly spawning from the
head node of the sublist.

This approach has multiple advantages in terms of memory and
efficiency. Also, isolating subgraphs based on some criteria is
pretty efficient, which is good for many analytics-based operations.

Access time is bounded as well.

Thoughts/Comments/Feedback please?


Your second drawing didn't really make any sense to me. :(

I do think it would be most productive to focus on what the API for dealing 
with graph data would look like before trying to handle the storage aspect. The 
storage is potentially dirt-simple, as others have shown. The only challenge 
would be efficiency, but it's impossible to discuss efficiency without some 
clue of how the data will be accessed. Frankly, for the first round of this I 
think it would be best if the storage really was just some raw tables. Once 
something is available people will start figuring out how to use it, and where 
the API needs to be improved.
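
(A sketch of the raw-tables idea, with invented names; storage plus one
classic traversal:)

    -- dirt-simple storage: one table of nodes, one of directed edges
    CREATE TABLE graph_node (id int PRIMARY KEY, payload text);
    CREATE TABLE graph_edge (
        src int REFERENCES graph_node (id),
        dst int REFERENCES graph_node (id),
        PRIMARY KEY (src, dst));

    -- reachability from node 1; UNION (not UNION ALL) makes the
    -- recursion safe against cycles
    WITH RECURSIVE reach(id) AS (
        SELECT 1
        UNION
        SELECT e.dst FROM graph_edge e JOIN reach r ON e.src = r.id
    )
    SELECT id FROM reach;
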
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



Re: [HACKERS] Graph datatype addition

2013-05-08 Thread Atri Sharma
On Thu, May 2, 2013 at 7:58 AM, Atri Sharma  wrote:
>
>
> Sent from my iPad
>
> On 02-May-2013, at 4:33, Misa Simic  wrote:
>
>
>
> On Wednesday, May 1, 2013, Atri Sharma wrote:
>>
>> Hi all,
>>
>> Please find a probable prototype for the same:
>>
>> struct GraphNode
>> {
>> Oid NodeOid;// Oid of the row which is the node here. We will
>> store an identifier to it here rather than the complete row(with data)
>> itself.
>> AdjacencyList *list;   // Pointer to the node's adjacency list.
>> };
>>
>> struct AdjacencyList
>> {
>>   Oid[] neighbours_list;
>> };
>>
>> struct AdjacencyList is probably the 'hottest' data structure in our
>> entire implementation. We can think of making a cache of recently
>> accessed struct AdjacencyList instances, or the AdjacencyList(s) of
>> the neighbours of the recently accessed nodes, because, they are most
>> likely to be accessed in near future. Advice here, please?
>>
>> So.
>>
>> struct AdjacencyCache
>> {
>>  Oid[] cache_values;
>> };
>>
>> push and pop functions for AdjacencyCache follow.
>>
>> We need a replacement and invalidation algorithm for the cache. I feel
>> a version of LRU should be good here.
>>
>> I have not given a prototype for operations and algorithm implementations.
>>
>> I feel,as suggested by Peter and Jaime, we can look at pgRouting code
>> for algorithm implementations.
>>
>> Florian's concerns are mitigated here to some extent,IMO. Since the
>> nodes and linkings are loosely coupled, and not represented as a
>> single representation, updating or changing of any part or adding a
>> new edge is no longer an expensive operation, as it only requires a
>> lookup of GraphNode and then its AdjacencyList. If we use the cache as
>> well, it will further reduce the lookup costs.
>>
>> I have not yet thought of the user visible layer as suggested by Jim.
>> Probably. once we are ok with the internal layer, we can move to the
>> user visible layer.
>>
>> Advice/Comments/Feedback please?
>>
>
> Honestly - I think I don't understand the proposal...
>
> Datatypes - are about values - what will be stored in that column in a
> table
>
> Datatype - can't have any clue about "rows"
>
> How I understand what you described - you can achieve the same with pure SQL
> - the structs are equivalent to graph tables... Instead of Oid, the column
> will store PKs of the nodes table...
>
>
> Yes, I agree.I need to think more.
>
> Let me get back with a deeper proposal.
>
> Regards,
>
> Atri

Hi all,

In continuation of the above discussion, I have been thinking about the
design of the data type. I have been thinking on the lines of making a
multi dimensional data structure for the same:

Specifically, I have been thinking about making multi lists for
representing data. After some research, I think that the following
data structure may help:

Each node will be represented as:

[Down Pointer][Atom][Right Pointer]

Suppose a graph is like (sorry for the bad drawing):

[ASCII sketch of a small example graph over nodes A, B, C, D, and E;
garbled in transit]

can be represented as:

[multi-list diagram of [down pointer][atom][right pointer] cells holding
the nodes' data; garbled in transit]


Essentially, the Atom flag denotes whether the node has any out-edges. If
it has no out-edge, Atom is 0 and the down pointer points to an auxiliary
structure which holds the node's data (hence, the data can be of arbitrary
size).

If the node has some out-degree, then those nodes are added to a new
sublist which starts from the node which spawns them. The node's down
pointer points to the head of the new sublist.

Essentially, a sublist has all the nodes directly spawning from the
head node of the sublist.

This approach has multiple advantages in terms of memory and
efficiency. Also, isolating subgraphs based on some criteria is
pretty efficient, which is good for many analytics-based operations.

Access time is bounded as well.

Thoughts/Comments/Feedback please?

Regards,

Atri

--
Regards,

Atri
l'apprenant




[HACKERS] Taking the "varattno" in "args" (where part of a query)

2013-05-08 Thread carla celiberti
Hi everyone,
I'm working on a project and one of my tasks is to extract the "where"
part of queries. I'm programming inside the function "exec_simple_query"
in "postgres.c". I searched inside the source code and, as I understand it,
I need to take the "varattno" in "args" in "qpqual"; is that right? If yes,
how do I save it so that I have it in that function? If no, what is the
easiest way to get it?

Example: For the query: select firstname from friend where age=33
The attributes of the table friend are: firstname, lastname, city, state, age.
I need to get the number 5, because "age" is the fifth attribute of "friend".

Thanks
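
(If it helps: a column's varattno matches its attnum in pg_attribute,
which can be checked from SQL, assuming the example table exists:)

    -- "age" is the fifth attribute of "friend", so this returns 5
    SELECT attnum FROM pg_attribute
     WHERE attrelid = 'friend'::regclass AND attname = 'age';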

Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Stephen Frost
* Ryan Kelly (rpkell...@gmail.com) wrote:
> COPY ... RETURNING would certainly be useful to apply additional
> transformations to the data before finally sending it to its ultimate
> destination.

If we really think that COPY ... RETURNING is only going to be used in a
CTE or similar, then we could always only support that and forgo any
changes to the FE/BE protocol to support it.  Or, at least, take the
simplest approach to supporting it, which would involve caching the data
entirely before sending it back to the client (isn't that what we do on
a big INSERT ... VALUES ... RETURNING anyway?  people can transfer in
blocks if they want to with INSERT .. VALUES or COPY .. RETURNING).

Thanks,

Stephen




Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Ryan Kelly
On Wed, May 05/08/13, 2013 at 10:55:40AM -0700, David Fetter wrote:
> On Wed, May 08, 2013 at 01:16:14PM -0400, Tom Lane wrote:
> > Heikki Linnakangas  writes:
> > > On 08.05.2013 19:44, Tom Lane wrote:
> > >> No there isn't; what you suggest would require FE/BE protocol
> > >> extensions, making it several orders of magnitude more work than the
> > >> other thing.
> > 
> > > I'd imagine that the flow would go something like this:
> > 
> > > BE                          FE
> > 
> > > CopyInResponse
> > >   CopyData
> > >   CopyData
> > >   ...
> > >   CopyDone
> > > RowDescription
> > > DataRow
> > > DataRow
> > > CommandComplete
> > 
> > That would require the backend to buffer the entire query response,
> > which isn't a great idea.  I would expect that such an operation would
> > need to interleave CopyData to the backend with DataRow responses.  Such
> > a thing could possibly be built on COPY_BOTH mode, but it would be a lot
> > of work (at both ends) for extremely debatable value.
> > 
> > The general idea of COPY is to load data as fast as possible,
> 
> With utmost respect, that is one of several use cases, and any change
> would need to keep that use case unburdened.  A sometimes overlapping
> set of use cases move data in and out of the database in a simple
> manner.  In some of these, people might wish to trade some performance
> for the feature.

99% of my uses at work for COPY are as a general data import and export
facility. I often find myself loading CSV files into our database for
analysis and further cleanup, and then use COPY to output queries as CSV
files for consumption by other members of the business.

The recent work for (PRE|POST)PROCESSOR options to COPY is indicative of
the fact that users are not merely using COPY to "load data as fast as
possible".

Other discussions around a COMPRESSED option are more than just a
performance enhancement, in my view, as I often receive files
compressed, and decompressing the data is just another step standing
between me and importing the data into the database.
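
(FWIW, the PROGRAM option that is new in 9.3 covers part of the
decompression case, assuming it is available; table and paths here
are made up, and this form requires superuser:)

    -- decompress on the fly while loading
    COPY staging FROM PROGRAM 'gzip -dc /tmp/data.csv.gz' WITH (FORMAT csv);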

Additionally, once I have the data imported, I often take many steps to
cleanup and format the data, prior to applying actual typing to a table
(which invariably fails due to invalid dates, and other nonsense).

COPY ... RETURNING would certainly be useful to apply additional
transformations to the data before finally sending it to its ultimate
destination.

> A particular example would be one where there are several tables to be
> loaded, some with generated columns that the future ones would depend
> on.  Yes, it's possible (kinda) to do this with the FDW machinery, but
> the burden is much higher as it requires DDL permission in general
> each time.

I find using the FDW machinery to perform many queries to be much slower
than importing the data once and then running my queries. There is also
no ability to use indexes.

> > so weighing it down with processing options seems like a pretty
> > dubious idea even if the implementation were easy.
> 
> Totally agreed that the "fast load/unload" code path must not be
> affected by any such changes.

Agreed here as well.

-Ryan P. Kelly





Re: [HACKERS] Terminology issue: suffix tree

2013-05-08 Thread Heikki Linnakangas

On 08.05.2013 15:49, Alexander Korotkov wrote:

On Wed, May 8, 2013 at 3:50 PM, Heikki Linnakangas
wrote:


Yeah. The data structure in contrib/unaccent seems to be a plain old trie,
rather than a radix trie, though. According to wikipedia at least, the
difference is that in a radix tree, the edges are labeled with sequences of
elements, rather than single elements. Want to patch that too?


Agreed, trie is the most fitting term here. Patch is attached.


Ok, applied.

- Heikki




Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix permission tests for views/tables proven empty by constraint

2013-05-08 Thread Andres Freund
On 2013-05-08 12:30:31 -0400, Tom Lane wrote:
> Greg Stark  writes:
> > That's kind of dismaying. ORMs have a tendency to create queries like
> > this and people may have even written such queries by hand and tested
> > them to determine that postgres was able to exclude the useless
> > relation. To have them install a security update and discover that
> > something they had previously tested no longer worked would be
> > annoying.
> 
> Turns out to be more to this than I realized before.  In an example
> such as the one I showed
> 
> select * from
>   ((select f1 as x from t1 offset 0)
>union all
>(select f2 as x from t2 offset 0)) ss
> where false;
> 
> where an appendrel subselect member can be proven empty on the basis
> of outer-query clauses alone, *we don't even plan that subquery*.
> The fix I had in mind for this fails to capture table references from
> such a subquery.

> We could extend setrefs.c to dig into unplanned subqueries and grab RTEs
> out of them, but that would not be a complete fix.  In particular, RTEs
> would not get made for inheritance children of parent tables mentioned
> in the query, since inheritance expansion is done by the planner.  Now,
> that wouldn't affect permissions checks because no extra permissions
> checks are done on inheritance children, but it would affect the locking
> behavior that Andres was worried about.

I first thought that was fair enough, since I thought that in most if not
all cases where locking plays a user-visible role the parent relation
would get locked anyway when a child relation gets locked. Turns out,
we do it only the other way round, i.e. lock child relations when we
lock a parent relation, even for most DDL on child relations.

I am not sure if it's really problematic, but it seems to allow scenarios
like:

S1: BEGIN;
S1: SELECT * FROM ((SELECT * FROM parent OFFSET 0) UNION ALL (SELECT * FROM 
parent OFFSET 0)) f WHERE false;
-- parent is locked now, children are not
S2: BEGIN;
S2: ALTER TABLE child_1 DROP CONSTRAINT foo;
S1: SELECT * FROM parent WHERE ...
-- blocks, waiting for S1 since child_1 is locked.

This seems like somewhat confusing behaviour, although it has gone
unnoticed so far, since one normally expects that a previous lock allows
you to continue working with a relation.
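
One way to see what S1 actually holds at that point is to look at
pg_locks from a third session, e.g.:

    SELECT c.relname, l.mode, l.granted
    FROM pg_locks l
    JOIN pg_class c ON c.oid = l.relation
    WHERE l.pid <> pg_backend_pid();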

But I guess this is better fixed by making all DDL on child relations
also lock their parent relation? That seems like a good idea anyway.

I am not at all convinced that this must be fixed, but not convinced of
the contrary either. I just wanted to point this out since I am not sure
there aren't any more problematic cases.

> I think the only reliable way to make this optimization fully
> transparent would be to go ahead and plan every subquery, even when we
> know we'll discard the planning results immediately.  That seems like
> quite a lot of overkill.  I'm not really sure I buy Greg's argument
> that people might be depending on the performance benefits of not
> planning such subqueries, but I guess it's not impossible either.

I didn't understand Greg's argument as being based on performance but as
being worried about the changed locking and such from a functional
perspective. Greg?

I don't really buy the performance argument either, but I agree that we
shouldn't do all this in the back branches as the "bug" isn't really bad
and it has some potential for introducing problems.

> I'm also now pretty firmly in the camp of "let's not try at all to fix
> this in the back branches".

+1 independent of where this goes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread David Fetter
On Wed, May 08, 2013 at 01:16:14PM -0400, Tom Lane wrote:
> Heikki Linnakangas  writes:
> > On 08.05.2013 19:44, Tom Lane wrote:
> >> No there isn't; what you suggest would require FE/BE protocol
> >> extensions, making it several orders of magnitude more work than the
> >> other thing.
> 
> > I'd imagine that the flow would go something like this:
> 
> > BE                  FE
> 
> > CopyInResponse
> >                     CopyData
> >                     CopyData
> >                     ...
> >                     CopyDone
> > RowDescription
> > DataRow
> > DataRow
> > CommandComplete
> 
> That would require the backend to buffer the entire query response,
> which isn't a great idea.  I would expect that such an operation would
> need to interleave CopyData to the backend with DataRow responses.  Such
> a thing could possibly be built on COPY_BOTH mode, but it would be a lot
> of work (at both ends) for extremely debatable value.
> 
> The general idea of COPY is to load data as fast as possible,

With utmost respect, that is one of several use cases, and any change
would need to keep that use case unburdened.  A sometimes overlapping
set of use cases moves data in and out of the database in a simple
manner.  In some of these, people might wish to trade some performance
for the feature.

A particular example would be one where there are several tables to be
loaded, some with generated columns that the future ones would depend
on.  Yes, it's possible (kinda) to do this with the FDW machinery, but
the burden is much higher as it requires DDL permission in general
each time.
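
Today that takes a staging table plus INSERT ... RETURNING in a CTE; a
rough sketch with invented names, where the second load needs the ids
generated by the first:

    WITH loaded AS (
        INSERT INTO parents (name)
        SELECT name FROM staging_parents
        RETURNING id, name
    )
    INSERT INTO children (parent_id, item)
    SELECT l.id, s.item
    FROM staging_children s
    JOIN loaded l ON l.name = s.name;

A COPY ... RETURNING could let the first load come straight from the file.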

> so weighing it down with processing options seems like a pretty
> dubious idea even if the implementation were easy.

Totally agreed that the "fast load/unload" code path must not be
affected by any such changes.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Jonathan S. Katz
On May 8, 2013, at 1:16 PM, Tom Lane wrote:

> Heikki Linnakangas  writes:
>> On 08.05.2013 19:44, Tom Lane wrote:
>>> No there isn't; what you suggest would require FE/BE protocol
>>> extensions, making it several orders of magnitude more work than the
>>> other thing.
> 
>> I'd imagine that the flow would go something like this:
> 
>> BE                   FE
> 
>> CopyInResponse
>>                      CopyData
>>                      CopyData
>>                      ...
>>                      CopyDone
>> RowDescription
>> DataRow
>> DataRow
>> CommandComplete
> 
> That would require the backend to buffer the entire query response,
> which isn't a great idea.  I would expect that such an operation would
> need to interleave CopyData to the backend with DataRow responses.  Such
> a thing could possibly be built on COPY_BOTH mode, but it would be a lot
> of work (at both ends) for extremely debatable value.
> 
> The general idea of COPY is to load data as fast as possible, so weighing
> it down with processing options seems like a pretty dubious idea even if
> the implementation were easy.

There are cases where I indeed want to load data very quickly, but then
want to perform an operation on it immediately afterward, e.g. removing
bad data that was just added by that copy.  For instance, I have this
scenario:

WITH new_data AS (
    COPY FROM ...
    RETURNING id, field_to_check
)
DELETE FROM table
USING new_data
WHERE
    table.id = new_data.id AND
    new_data.field_to_check ~* 'bad data';

Now I can take care of that all in one step, and I know I'm only removing
rows I just added.  This comes up when I am importing external files from
other sources where I may not necessarily want all of the rows, or where
some of the rows contain bad data.

This also presumes that COPY works in a CTE, which I'm not sure it does (and I 
will do the TIAS test after I hit send on this message).
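
Data-modifying CTEs with INSERT/UPDATE/DELETE ... RETURNING do work since
9.1; e.g. the usual row-moving idiom (names invented):

    WITH moved AS (
        DELETE FROM events
        WHERE created < now() - interval '1 month'
        RETURNING *
    )
    INSERT INTO events_archive SELECT * FROM moved;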

Jonathan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Andrew Dunstan


On 05/08/2013 01:16 PM, Tom Lane wrote:


That would require the backend to buffer the entire query response,
which isn't a great idea.  I would expect that such an operation would
need to interleave CopyData to the backend with DataRow responses.  Such
a thing could possibly be built on COPY_BOTH mode, but it would be a lot
of work (at both ends) for extremely debatable value.

The general idea of COPY is to load data as fast as possible, so weighing
it down with processing options seems like a pretty dubious idea even if
the implementation were easy.





That was my reaction. I would look at this very skeptically.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Dimitri Fontaine
Karol Trzcionka  writes:
> as a continuation of my proposal expanding RETURNING syntax by

What about implementing support for OLD/NEW in per-statement triggers? I
guess you would expose the data via a SRF.
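
A purely hypothetical sketch of what that could look like, borrowing the
SQL-standard transition table syntax (none of this exists today; all
names are invented):

    CREATE TRIGGER summarize_changes
        AFTER UPDATE ON accounts
        REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
        FOR EACH STATEMENT
        EXECUTE PROCEDURE log_account_changes();

The trigger function would then read old_rows/new_rows much like SRF
results exposing the affected rows.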

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] improving PL/Python builds on OS X

2013-05-08 Thread Dave Page
On Wed, May 8, 2013 at 5:34 PM, Peter Eisentraut  wrote:
> On 5/8/13 11:51 AM, Dave Page wrote:
>> This patch seems to have broken the build for our installers for 9.3.
>> Because we need a consistent build of the PL interpreters on all the
>> platforms we support, we use the ActiveState distributions of Perl,
>> Python and TCL (we can't rely on vendor supplied packages, because
>> their versions vary between different Linux distros and different OS X
>> versions). However, ActivePython doesn't include a shared library,
>> which this change seems to require.
>>
>> Can that requirement be reverted?
>
> There was no change in this regard.  A shared library was always
> required on OS X.

It's failing on Linux. Even worse, it configures fine and then builds
without error. There is a message spewed out by configure, but it
doesn't contain the words warning or error. Given that I explicitly
said I wanted Python support when I ran configure, it should certainly
fail with an error at configure time. We only noticed this was a
problem when the QA guys started diving into more detailed testing, as
we don't watch for every message in the 50+ MB of logs our automated
build systems generate.

> ActivePython does include a shared library.  I just tried it and it
> builds fine.



--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Tom Lane
Heikki Linnakangas  writes:
> On 08.05.2013 19:44, Tom Lane wrote:
>> No there isn't; what you suggest would require FE/BE protocol
>> extensions, making it several orders of magnitude more work than the
>> other thing.

> I'd imagine that the flow would go something like this:

> BE                    FE
 
> CopyInResponse
>                       CopyData
>                       CopyData
>                       ...
>                       CopyDone
> RowDescription
> DataRow
> DataRow
> CommandComplete

That would require the backend to buffer the entire query response,
which isn't a great idea.  I would expect that such an operation would
need to interleave CopyData to the backend with DataRow responses.  Such
a thing could possibly be built on COPY_BOTH mode, but it would be a lot
of work (at both ends) for extremely debatable value.

The general idea of COPY is to load data as fast as possible, so weighing
it down with processing options seems like a pretty dubious idea even if
the implementation were easy.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Heikki Linnakangas

On 08.05.2013 19:44, Tom Lane wrote:

Karol Trzcionka  writes:

as a continuation of my proposal expanding RETURNING syntax by
AFTER/BEFORE, there can be enough time to implement RETURNING for COPY.


No there isn't; what you suggest would require FE/BE protocol
extensions, making it several orders of magnitude more work than the
other thing.


I'd imagine that the flow would go something like this:

BE                      FE

CopyInResponse
                        CopyData
                        CopyData
                        ...
                        CopyDone
RowDescription
DataRow
DataRow
CommandComplete

A well-written client should be able to handle that. But if one expects 
the backend to always send a CommandComplete after CopyDone, it will get 
confused. Implementing that doesn't seem too difficult to me.


I agree that this is much more work than the UPDATE RETURNING 
BEFORE/AFTER, though. Not sure if that's a good or a bad thing.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Tom Lane
Karol Trzcionka  writes:
> as a continuation of my proposal expanding RETURNING syntax by
> AFTER/BEFORE, there can be enough time to implement RETURNING for COPY.

No there isn't; what you suggest would require FE/BE protocol
extensions, making it several orders of magnitude more work than the
other thing.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Pavel Stehule
Hello

Why? What is the motivation? What is the use case?

Regards

Pavel


2013/5/8 Karol Trzcionka 

> Hello,
> as a continuation of my proposal expanding RETURNING syntax by
> AFTER/BEFORE, there can be enough time to implement RETURNING for COPY.
> I'd like to hear your opinion on that. My draft idea is:
> COPY FROM ... RETURNING table_name.* -> returns all values copied to the
> table after all triggers fire
> COPY FROM ... RETURNING BEFORE.* -> returns all values copied to the
> table before triggers fire (all raw values from the file)
> COPY TO ... RETURNING table_name.* -> returns all values from the table
> COPY TO ... RETURNING AFTER.*/BEFORE.* -> the same
> What do you think?
> Regards,
> Karol Trzcionka
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: [HACKERS] improving PL/Python builds on OS X

2013-05-08 Thread Peter Eisentraut
On 5/8/13 11:51 AM, Dave Page wrote:
> This patch seems to have broken the build for our installers for 9.3.
> Because we need a consistent build of the PL interpreters on all the
> platforms we support, we use the ActiveState distributions of Perl,
> Python and TCL (we can't rely on vendor supplied packages, because
> their versions vary between different Linux distros and different OS X
> versions). However, ActivePython doesn't include a shared library,
> which this change seems to require.
> 
> Can that requirement be reverted?

There was no change in this regard.  A shared library was always
required on OS X.

ActivePython does include a shared library.  I just tried it and it
builds fine.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Fix permission tests for views/tables proven empty by constraint

2013-05-08 Thread Tom Lane
Greg Stark  writes:
> That's kind of dismaying. ORMs have a tendency to create queries like
> this and people may have even written such queries by hand and tested
> them to determine that postgres was able to exclude the useless
> relation. To have them install a security update and discover that
> something they had previously tested no longer worked would be
> annoying.

It turns out there is more to this than I realized before.  In an
example such as the one I showed

select * from
  ((select f1 as x from t1 offset 0)
   union all
   (select f2 as x from t2 offset 0)) ss
where false;

where an appendrel subselect member can be proven empty on the basis
of outer-query clauses alone, *we don't even plan that subquery*.
The fix I had in mind for this fails to capture table references from
such a subquery.

We could extend setrefs.c to dig into unplanned subqueries and grab RTEs
out of them, but that would not be a complete fix.  In particular, RTEs
would not get made for inheritance children of parent tables mentioned
in the query, since inheritance expansion is done by the planner.  Now,
that wouldn't affect permissions checks because no extra permissions
checks are done on inheritance children, but it would affect the locking
behavior that Andres was worried about.

I think the only reliable way to make this optimization fully
transparent would be to go ahead and plan every subquery, even when we
know we'll discard the planning results immediately.  That seems like
quite a lot of overkill.  I'm not really sure I buy Greg's argument
that people might be depending on the performance benefits of not
planning such subqueries, but I guess it's not impossible either.

My inclination is to go ahead and write the extra code to dig RTEs out
of unplanned subqueries, and not worry about failing to lock inheritance
children in them.

I'm also now pretty firmly in the camp of "let's not try at all to fix
this in the back branches".

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] XLogFlush invoked about twice as much after 9.2 group commit enhancement

2013-05-08 Thread Jeff Janes
On Tue, May 7, 2013 at 9:23 PM, Peter Geoghegan  wrote:

> On Tue, May 7, 2013 at 12:48 PM, Jeff Janes  wrote:
> > Anyway, I don't see this behavior change when turning on wal_debug and
> > looking in the logfiles for 'xlog flush request' messages.
>
> That could have everything to do with the hardware you're using. In
> general, the higher the cost of an fsync, the more useful it is to
> amortize that cost among concurrently committing transactions.
>

True, but that is going to happen with a proportional increase in
throughput, which he reported not seeing.  At least originally.  I'm not
sure what to think now.

Cheers,

Jeff


Re: [HACKERS] improving PL/Python builds on OS X

2013-05-08 Thread Dave Page
On Fri, Dec 21, 2012 at 5:45 PM, Peter Eisentraut  wrote:
> The PL/Python build on OS X is currently hardcoded to use the system
> Python install.  If you try to override this when running configure, you
> get a mysterious mix-and-match build.  If you want to build against your
> own Python build, or MacPorts or Homebrew, you can't.
>
> This is straightforward to fix.  In configure, besides checking Python
> include and library paths, we can also check whether it's a framework
> build and the right parameters for that.  The attached patch does that
> and does the job for me.  Please test it.
>
> One constraint, which is explained in the comment in
> src/pl/plpython/Makefile is that in Python <2.5, there is no official
> way to detect either framework builds or shared libpython builds, so we
> can't support those versions on OS X, at least without more hardcoding
> of things.  I'd rather phase some of that out, but if someone needs to
> continue to use Python 2.4 or earlier on OS X, let me know.  (Or a more
> proper fix would be to split DLSUFFIX into two variables, but that seems
> like more work than it's worth right now.)

This patch seems to have broken the build for our installers for 9.3.
Because we need a consistent build of the PL interpreters on all the
platforms we support, we use the ActiveState distributions of Perl,
Python and TCL (we can't rely on vendor supplied packages, because
their versions vary between different Linux distros and different OS X
versions). However, ActivePython doesn't include a shared library,
which this change seems to require.

Can that requirement be reverted?

--
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] RETURNING syntax for COPY

2013-05-08 Thread Karol Trzcionka
Hello,
as a continuation of my proposal expanding RETURNING syntax by
AFTER/BEFORE, there can be enough time to implement RETURNING for COPY.
I'd like to hear your opinion on that. My draft idea is:
COPY FROM ... RETURNING table_name.* -> returns all values copied to the
table after all triggers fire
COPY FROM ... RETURNING BEFORE.* -> returns all values copied to the
table before triggers fire (all raw values from the file)
COPY TO ... RETURNING table_name.* -> returns all values from the table
COPY TO ... RETURNING AFTER.*/BEFORE.* -> the same
What do you think?
Regards,
Karol Trzcionka


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix permission tests for views/tables proven empty by constraint

2013-05-08 Thread Tom Lane
Andres Freund  writes:
> On 2013-05-07 21:45:02 -0400, Tom Lane wrote:
>> Well, it might fail to report a permissions violation when the
>> not-allowed-to-be-accessed relation could be proven to yield no rows.

> Couldn't it also cause tables not to be locked that ought to be? That
> seems to be the nastier part to me.

In ordinary immediate execution the parser or planner would have
obtained the relevant table lock.  I think what you say is possible if a
prepared plan is re-executed, but TBH it doesn't sound like much of an
issue to me.
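
To illustrate the re-execution case with the example from upthread (a
sketch only):

    PREPARE q AS
    SELECT * FROM
      ((SELECT f1 AS x FROM t1 OFFSET 0)
       UNION ALL
       (SELECT f2 AS x FROM t2 OFFSET 0)) ss
    WHERE false;

    EXECUTE q;  -- re-execution only locks relations the stored plan
                -- still references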

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Terminology issue: suffix tree

2013-05-08 Thread Alexander Korotkov
On Wed, May 8, 2013 at 3:50 PM, Heikki Linnakangas
wrote:

> On 06.05.2013 14:10, Alexander Korotkov wrote:
>
>> On Sat, May 4, 2013 at 10:27 PM, Alexander Korotkov wrote:
>>
>>> In a suffix tree we insert every suffix of the source string into the
>>> tree.
>>>
>>> http://en.wikipedia.org/wiki/Suffix_tree
>>> Actually the opclass implements a radix tree (patricia tree).
>>> http://en.wikipedia.org/wiki/Radix_tree
>>> Likely we need a patch to rename it in all the places it is mentioned.
>>>
>>
>> Patch is attached.
>>
>
> Thanks, committed.


Thanks!


>  Apparently, we have the same issue in contrib/unaccent.
>>
>
> Yeah. The data structure in contrib/unaccent seems to be a plain old trie,
> rather than a radix trie, though. According to wikipedia at least, the
> difference is that in a radix tree, the edges are labeled with sequences of
> elements, rather than single elements. Want to patch that too?


Agreed, trie is the most fitting term here. Patch is attached.

--
With best regards,
Alexander Korotkov.


unaccent-suffix-fix.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Terminology issue: suffix tree

2013-05-08 Thread Heikki Linnakangas

On 06.05.2013 14:10, Alexander Korotkov wrote:

On Sat, May 4, 2013 at 10:27 PM, Alexander Korotkov wrote:

In a suffix tree we insert every suffix of the source string into the tree.
http://en.wikipedia.org/wiki/Suffix_tree
Actually the opclass implements a radix tree (patricia tree).
http://en.wikipedia.org/wiki/Radix_tree
Likely we need a patch to rename it in all the places it is mentioned.


Patch is attached.


Thanks, committed.


Apparently, we have the same issue in contrib/unaccent.


Yeah. The data structure in contrib/unaccent seems to be a plain old 
trie, rather than a radix trie, though. According to wikipedia at least, 
the difference is that in a radix tree, the edges are labeled with 
sequences of elements, rather than single elements. Want to patch that too?
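
For example, for the keys "team" and "test", roughly:

    trie:   t - e - a - m            radix tree:   "te" - "am"
                 \- s - t                                \- "st"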


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cube extension improvement, GSoC

2013-05-08 Thread Alexander Korotkov
On Sat, May 4, 2013 at 11:19 PM, Stas Kelvich  wrote:

> > I think we have at least 3 data types more or less similar to cube.
> > 1) array of ranges
> > 2) range of arrays
> > 3) 2d arrays
> > Semantically cube is closest to an array of ranges. However an array
> > of ranges has huge storage overhead.
> > Also we can declare cube as a domain over 2d arrays and declare
> > operations on that domain.
>
> But what should we do when arrays in different records have different
> numbers of elements?
>

We can face exactly the same situation with cube.

test=# create table cube_test (v cube);
CREATE TABLE

test=# insert into cube_test values (cube(array[1,2])),
(cube(array[1,2,3]));
INSERT 0 2

In order to force all cubes to have the same number of dimensions, an
explicit CHECK constraint on the table is required.
As I remember, cube treats absent dimensions as zeros.
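
A sketch of such a constraint, using the existing cube_dim() function
(the constraint name is made up):

    ALTER TABLE cube_test
        ADD CONSTRAINT cube_test_dims CHECK (cube_dim(v) = 3);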

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GSOC Student Project Idea

2013-05-08 Thread Heikki Linnakangas

On 24.04.2013 22:10, Michael Schuh wrote:

Thank you both for the very helpful feedback. Perhaps the scope of this
project (the application's "completeness criteria") is better framed as
feasibility prototyping of the global/distance-based index strategy, with
B+-tree and/or GiST extension possibilities.


For GSoC, we'd really like to see some code that can be committed as a
result. Prototyping is important, but if that's all you do during the
summer, the work is likely to go to waste if no one works actively on the
prototype afterwards.


At this point, I think we need a more concrete plan for how this would be
implemented. The idea of using a regular B-tree for this, with some
functions to do the partition mapping, might work. However, that would be
a clunky interface - I don't think that would be accepted into 
PostgreSQL. So I don't think that makes a good GSoC project.


If you think this can be done with the existing GiST or SP-GiST APIs, 
I'd like to see a more concrete plan on how that would work. What 
datatype would this be for? How would the partitioning be done? If the 
APIs need to be extended, what would the extensions look like? The 
summer is short, so there isn't much time for exploration - we need to 
have a pretty good idea of what the result will look like, right now.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix permission tests for views/tables proven empty by constraint

2013-05-08 Thread Andres Freund
On 2013-05-07 21:45:02 -0400, Tom Lane wrote:
> Greg Stark  writes:
> > If we just reverted your fix and didn't fix it in 9.2 that would also
> > fix the crash right? The bug was only that it leaked the fact that the
> > view was provably empty from the definition?
> 
> Well, it might fail to report a permissions violation when the
> not-allowed-to-be-accessed relation could be proven to yield no rows.
> I agree that it's a bit hard to call that a security issue as long as
> you assume that the attacker has access to the system catalogs; and
> even if you don't assume that, being able to discern that there's a
> check constraint on some table doesn't seem like a big leakage.

Couldn't it also cause tables not to be locked that ought to be? That
seems to be the nastier part to me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GSOC Student Project Idea

2013-05-08 Thread Heikki Linnakangas

On 24.04.2013 14:31, Florian Pflug wrote:

On Apr 23, 2013, at 23:25, Alexander Korotkov
wrote:

I've taken a brief look at the paper and implementation. As far as I
can see, iDistance implements a global building strategy: for example,
it selects some point, calculates distances from the selected point to
all points in the dataset, etc. So it uses the whole dataset at the
same time.

However, you can try to implement global index building in GiST or
SP-GiST. In this case I think you should carefully estimate what you
can do within a single GSoC project. You would need to extend the
GiST or SP-GiST interface and write a completely new implementation
of the tree building algorithm. The question of how exactly to extend
the GiST or SP-GiST interface for this case could turn out to be very
hard even theoretically.


+1. That seemed to be a major roadblock to me too when I read the
paper.

You could work around that by making partition identification a
separate step. You'd have a function

idist_analyze(cfg name, table name, field name)

which'd identify suitable partitions for the data distribution in
table.field and store them somewhere. Such a set of pre-identified
partitions would be akin to a tsearch configuration, i.e. all other
parts of the iDistance machinery would use it to map points to index
keys and queries to ranges of those keys. You'll want to look at how
tsearch handles that, and check if the method can indeed be applied
to iDistance.


You could perform that step as part of the index build. Before the index 
build starts to add tuples to the index, it could scan a random sample 
of the heap and identify the partitions based on that.


If you need to store the metadata, like a map of partitions, it becomes 
difficult to cajole this into a normal GiST or SP-GiST opclass. The API 
doesn't have any support for storing such metadata.



In a first cut, you'd probably only allow inserts into the index which
don't change the maximum distances from the partition centers that
idist_analyze() found.


That seems like a pretty serious restriction. I'd try to write it so 
that you can insert any value, but if the new values are very different 
from any existing values, it would be OK for the index quality to 
degrade. For example, you could simply add any out-of-bounds values to a 
separate branch in the index, which would have no particular structure 
and would just have to be scanned on every query. You can probably do 
better than that, but that would be a trivial way to deal with it.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] about index inheritance

2013-05-08 Thread Vincenzo Melandri
On Tue, May 7, 2013 at 11:55 PM, Robert Haas  wrote:

> On Mon, May 6, 2013 at 9:30 AM, Vincenzo Melandri 
> wrote:
> > Hi guys,
> >
> > My first post here :)
> > I stumbled into the same problem as this guy
> > http://www.postgresql.org/message-id/4be2835a.5020...@cybertec.at
> > , so since I have some spare time recently, I've set up the development
> > environment for postgresql and I think I may be able to contribute the
> > feature of index inheritance, which is currently unsupported but listed
> > in TODOs.
> >
> > I've spent some time reading the docs and I took a look at the code. Is
> > anybody out there working on this already? I don't want to overlap
> > someone else's effort, plus I'll gladly take any advice or join the
> > community efforts, if any, 'cause this feature seems pretty huge to me
> > at first glance.
>
> This is a really hard problem.  If you pick this as your first project
> hacking on PostgreSQL, you will almost certainly fail.
>
>
Thank you very much, I guessed that already -.-
Still, I have needed this at my office for a long time, struggled with it
many times, and had to come up with some "exotic" solutions...
Now I have spare time between projects, so I can work on it full-time. At
least it's worth a try, isn't it?

Anyway, I'm working to better understand the problem, trying to identify
at least the main points involved.
At the moment I'm figuring out how the inheritance mechanism works for
relations (in tablecmds.c). Then I'll figure out how indexes work.

I guess you have discussed this plenty of times already in the past, but
I didn't find it in the archives. Any hints for old discussions?
I'll try to come up with a list of potential things to do, for you guys
to validate and discuss.

PS: I wrote my last mail from an address with which I had not subscribed
to the list, and still the message got through. Odd.
-- 
Vincenzo.
http://www.linkedin.com/pub/vincenzo-melandri/14/16/730


Re: [HACKERS] Add some regression tests for SEQUENCE

2013-05-08 Thread Fabien COELHO


I have provided an updated patch as per Fabien's recent response on the
Commitfest site. Any and all feedback is appreciated.


Review:

This patch works for me.

It adds valuable sequence test cases, especially trying corner cases with 
expected errors and permission denials.


I suggest accepting it.

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to add regression tests for SCHEMA

2013-05-08 Thread Fabien COELHO


Dear Robins,


Here is an updated patch that uses different schema / role names for
different tests (as per commitfest site feedback).


Short review about this version of the patch:

This patch works for me.

This test is a good thing and allows schemas to be thoroughly tested,
including corner cases which must fail because of errors or permissions.


Two remarks:

 - test 2 bis: why the name 'pg_asdf'? Why not 'pg_schema_sch',
   to be homogeneous with the other tests?

 - test 3: why not WHERE schema_name = 'schema_sch3' instead of two
   negative comparisons (see the sketch below)? ISTM that if for some
   reason a new schema name is added in the future, the test will fail.
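
A sketch of the suggested form, assuming the test reads
information_schema.schemata:

    SELECT schema_name FROM information_schema.schemata
    WHERE schema_name = 'schema_sch3';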

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers